# troubleshooting
s
I am running the Spark command below in cluster mode. The last step, copying files from staging to the output directory, is taking too long, and it copies one file at a time. Any suggestions on how to improve the performance? For 8000 files it is taking more than 10 hours just for that last step from staging to the output directory.
spark-submit \
  --class org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand \
  --master yarn \
  --deploy-mode cluster \
  --conf spark.yarn.am.waitTime=1000s \
  --conf spark.sql.parquet.fs.optimized.committer.optimization-enabled=true \
  --conf parquet.enable.summary-metadata=false \
  --conf spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version=2 \
  --conf spark.sql.hive.convertMetastoreParquet.mergeSchema=false \
  --conf spark.sql.shuffle.partitions=2000 \
  --conf "spark.driver.extraJavaOptions=-Dplugins.dir=${PINOT_DISTRIBUTION_DIR}/plugins" \
  --conf "spark.driver.extraClassPath=pinot-batch-ingestion-spark-2.4-${PINOT_VERSION}-SNAPSHOT-shaded.jar:pinot-all-${PINOT_VERSION}-SNAPSHOT-jar-with-dependencies.jar:pinot-s3-${PINOT_VERSION}-SNAPSHOT-shaded.jar:pinot-parquet-${PINOT_VERSION}-SNAPSHOT-shaded.jar" \
  --conf "spark.executor.extraClassPath=pinot-batch-ingestion-spark-2.4-${PINOT_VERSION}-SNAPSHOT-shaded.jar:pinot-all-${PINOT_VERSION}-SNAPSHOT-jar-with-dependencies.jar:pinot-s3-${PINOT_VERSION}-SNAPSHOT-shaded.jar:pinot-parquet-${PINOT_VERSION}-SNAPSHOT-shaded.jar" \
  --jars "${PINOT_DISTRIBUTION_DIR}/lib/pinot-all-${PINOT_VERSION}-SNAPSHOT-jar-with-dependencies.jar,${PINOT_DISTRIBUTION_DIR}/plugins-external/pinot-batch-ingestion/pinot-batch-ingestion-spark-2.4/pinot-batch-ingestion-spark-2.4-${PINOT_VERSION}-SNAPSHOT-shaded.jar,${PINOT_DISTRIBUTION_DIR}/plugins/pinot-file-system/pinot-s3/pinot-s3-${PINOT_VERSION}-SNAPSHOT-shaded.jar,${PINOT_DISTRIBUTION_DIR}/plugins/pinot-input-format/pinot-parquet/pinot-parquet-${PINOT_VERSION}-SNAPSHOT-shaded.jar" \
  --files s3://roku-dea-dev/sand-box/suraj/spark_job_spec_offlinebookingnarrow_perf.yaml \
  local://pinot-all-${PINOT_VERSION}-SNAPSHOT-jar-with-dependencies.jar \
  -jobSpecFile spark_job_spec_offlinebookingnarrow_perf.yaml
m
@Kartik Khare
k
Hi, it seems like you are running the job on the default 2 Spark executors. You can increase that. Secondly, 10 hours is way too long. Can you check the logs or paste them here, so we can confirm the job is not simply running forever and retrying some failing code?
s
Hi @Kartik Khare, there were 7000 executors, and I also passed --conf spark.executor.instances=10000, but the issue is the same: it copies files from staging to the output directory one file at a time. I can see the target directory being updated. That copy step is what takes the time; processing from source to staging takes only 15 minutes. I am not sure where the job that copies the files from staging to the batch_output folder runs, so I cannot provide its logs. As seen, it is at 5.7 hours and still running.
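For scale, the figures quoted above (8000 files, more than 10 hours) work out to several seconds per file when the copy is strictly sequential. The sketch below does that arithmetic and shows, purely as an illustration, how a one-by-one copy loop could be fanned out over a thread pool. It is not Pinot's code: `seconds_per_file` and `copy_all` are hypothetical helper names, and `shutil.copy2` on a local filesystem stands in for whatever object-store copy call the real job uses.

```python
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def seconds_per_file(total_hours: float, file_count: int) -> float:
    """Average wall-clock seconds spent per file in a sequential copy."""
    return total_hours * 3600 / file_count


def copy_all(staging_dir: str, output_dir: str, workers: int = 32) -> int:
    """Copy every file in staging_dir to output_dir using a thread pool.

    Local-filesystem stand-in (assumption): a real job would call the
    object store's copy operation here instead of shutil.copy2.
    """
    src = Path(staging_dir)
    dst = Path(output_dir)
    dst.mkdir(parents=True, exist_ok=True)
    files = [p for p in src.iterdir() if p.is_file()]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() waits for all copies and re-raises the first copy error.
        list(pool.map(lambda p: shutil.copy2(p, dst / p.name), files))
    return len(files)


# Figures from the thread: 8000 files in (at least) 10 hours.
print(seconds_per_file(10, 8000))  # 4.5
```

Whether a parallel copy like this is possible in the ingestion job depends on where the staging-to-output move actually runs, which is the open question in this thread.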
k
Let's move this discussion to DM