# troubleshooting
s
Hey folks 👋, I'm having some issues with the Spark Batch Ingestion job when moving from `--master local --deploy-mode client` to `--master yarn --deploy-mode cluster` (as suggested here for production environments). I would greatly appreciate some guidance from others who have successfully configured this Spark job. Details in thread 🧵
Following this thread, I am able to successfully `spark-submit` locally using the following command:
```
sudo spark-submit --verbose \
--class org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand \
--master local --deploy-mode client \
--conf spark.local.dir=/mnt \
--conf "spark.driver.extraJavaOptions=-Dplugins.dir=/mnt/pinot/apache-pinot-0.11.0-SNAPSHOT-bin/plugins -Dplugins.include=pinot-s3,pinot-parquet -Dlog4j2.configurationFile=/mnt/pinot/apache-pinot-0.11.0-SNAPSHOT-bin/conf/pinot-ingestion-job-log4j2.xml" \
--conf "spark.driver.extraClassPath=/mnt/pinot/apache-pinot-0.11.0-SNAPSHOT-bin/plugins-external/pinot-batch-ingestion/pinot-batch-ingestion-spark/pinot-batch-ingestion-spark-0.11.0-SNAPSHOT-shaded.jar:/mnt/pinot/apache-pinot-0.11.0-SNAPSHOT-bin/lib/pinot-all-0.11.0-SNAPSHOT-jar-with-dependencies.jar:/mnt/pinot/apache-pinot-0.11.0-SNAPSHOT-bin/plugins/pinot-file-system/pinot-s3/pinot-s3-0.11.0-SNAPSHOT-shaded.jar:/mnt/pinot/apache-pinot-0.11.0-SNAPSHOT-bin/plugins/pinot-input-format/pinot-parquet/pinot-parquet-0.11.0-SNAPSHOT-shaded.jar" \
/mnt/pinot/apache-pinot-0.11.0-SNAPSHOT-bin/lib/pinot-all-0.11.0-SNAPSHOT-jar-with-dependencies.jar \
-jobSpecFile /mnt/pinot/daily_channel_user_metrics_20220502.yaml
```
The problem occurs when switching to `--master yarn`, whether using `--deploy-mode client` or `--deploy-mode cluster`.
To rule out issues with `yarn` being misconfigured on my EMR cluster, I successfully ran this example:
```
spark-submit --master yarn --deploy-mode cluster --class "org.apache.spark.examples.JavaSparkPi" /usr/lib/spark/examples/jars/spark-examples.jar
```
My current WIP command is this:
```
sudo spark-submit --verbose \
--class org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand \
--master yarn --deploy-mode cluster \
--conf spark.local.dir=/mnt \
--conf "spark.driver.extraJavaOptions=-Dplugins.dir=/mnt/pinot/apache-pinot-0.11.0-SNAPSHOT-bin/plugins -Dplugins.include=pinot-s3,pinot-parquet -Dlog4j2.configurationFile=/mnt/pinot/apache-pinot-0.11.0-SNAPSHOT-bin/conf/pinot-ingestion-job-log4j2.xml" \
--conf "spark.driver.extraClassPath=/mnt/pinot/apache-pinot-0.11.0-SNAPSHOT-bin/plugins-external/pinot-batch-ingestion/pinot-batch-ingestion-spark/pinot-batch-ingestion-spark-0.11.0-SNAPSHOT-shaded.jar:/mnt/pinot/apache-pinot-0.11.0-SNAPSHOT-bin/lib/pinot-all-0.11.0-SNAPSHOT-jar-with-dependencies.jar:/mnt/pinot/apache-pinot-0.11.0-SNAPSHOT-bin/plugins/pinot-file-system/pinot-s3/pinot-s3-0.11.0-SNAPSHOT-shaded.jar:/mnt/pinot/apache-pinot-0.11.0-SNAPSHOT-bin/plugins/pinot-input-format/pinot-parquet/pinot-parquet-0.11.0-SNAPSHOT-shaded.jar" \
--conf "spark.executor.extraJavaOptions=-Dplugins.dir=/mnt/pinot/apache-pinot-0.11.0-SNAPSHOT-bin/plugins -Dplugins.include=pinot-s3,pinot-parquet -Dlog4j2.configurationFile=/mnt/pinot/apache-pinot-0.11.0-SNAPSHOT-bin/conf/pinot-ingestion-job-log4j2.xml" \
--conf "spark.executor.extraClassPath=/mnt/pinot/apache-pinot-0.11.0-SNAPSHOT-bin/plugins-external/pinot-batch-ingestion/pinot-batch-ingestion-spark/pinot-batch-ingestion-spark-0.11.0-SNAPSHOT-shaded.jar:/mnt/pinot/apache-pinot-0.11.0-SNAPSHOT-bin/lib/pinot-all-0.11.0-SNAPSHOT-jar-with-dependencies.jar:/mnt/pinot/apache-pinot-0.11.0-SNAPSHOT-bin/plugins/pinot-file-system/pinot-s3/pinot-s3-0.11.0-SNAPSHOT-shaded.jar:/mnt/pinot/apache-pinot-0.11.0-SNAPSHOT-bin/plugins/pinot-input-format/pinot-parquet/pinot-parquet-0.11.0-SNAPSHOT-shaded.jar" \
--files /mnt/pinot/daily_channel_user_metrics_20220502.yaml \
/mnt/pinot/apache-pinot-0.11.0-SNAPSHOT-bin/lib/pinot-all-0.11.0-SNAPSHOT-jar-with-dependencies.jar \
-jobSpecFile daily_channel_user_metrics_20220502.yaml
```
which gets stuck when trying to add executor tasks, complaining that the `ApplicationMaster` has not yet registered:
```
2022/05/27 16:03:34.967 INFO [DAGScheduler] [dag-scheduler-event-loop] Submitting 1000 missing tasks from ResultStage 0 (ParallelCollectionRDD[0] at parallelize at SparkSegmentGenerationJobRunner.java:237) (first 15 tasks are for partitions Vector(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14))
2022/05/27 16:03:34.968 INFO [YarnClusterScheduler] [dag-scheduler-event-loop] Adding task set 0.0 with 1000 tasks
2022/05/27 16:03:39.866 WARN [YarnSchedulerBackend$YarnSchedulerEndpoint] [dispatcher-event-loop-2] Attempted to request executors before the AM has registered!
2022/05/27 16:03:39.867 WARN [ExecutorAllocationManager] [spark-dynamic-executor-allocation] Unable to reach the cluster manager to request 11 total executors!
```
It eventually times out and fails.
I'm thinking it is most likely an issue with providing all dependencies/jars, but the log messages I'm seeing have not been super helpful. I'm not seeing any obvious `java.lang.ClassNotFoundException` errors for Pinot libs. It's unclear to me how the driver is able to execute a portion of the main class (it lists the S3 files and tries to start tasks) while the ApplicationMaster fails to boot and register properly.
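One thing worth checking (assuming YARN log aggregation is enabled on the cluster) is the AM container's own logs; its stderr usually holds the underlying startup exception:
```
# Fetch the aggregated container logs for the failed application.
# <application_id> is a placeholder for the ID printed by spark-submit.
yarn logs -applicationId <application_id> | less
```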
x
The currently suggested approach is to copy the Pinot jars onto the Spark classpath if you can add jars when creating a Spark cluster, since the executor worker nodes may not have the corresponding jars when you submit the job.
Alternatively, you can build a fat jar that contains the necessary dependencies; that will also help.
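As a rough illustration (assuming a Maven build; any uber-jar tool would work), the fat jar could be produced with the Maven Shade plugin, merging `META-INF/services` so service-loader lookups in the shaded jar still resolve:
```
<!-- Sketch: build a fat jar with the Maven Shade plugin. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>3.2.4</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
      <configuration>
        <transformers>
          <!-- Merge META-INF/services entries from all dependencies. -->
          <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
        </transformers>
      </configuration>
    </execution>
  </executions>
</plugin>
```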
k
Hi! Just mention all the jars from `spark.driver.extraClassPath` in the `--jars` argument as well. That will solve the issue.
Also remove `spark.driver.extraJavaOptions`.
s
Thanks for the responses 🙇, will give it a try.
When removing `spark.driver.extraJavaOptions`, I appear to lose stdout (I think due to the log4j config file). stderr is logging this on the new attempt:
```
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.hadoop.yarn.api.records.Resource.newInstance(JJII)Lorg/apache/hadoop/yarn/api/records/Resource;
	at org.apache.spark.deploy.yarn.YarnAllocator.<init>(YarnAllocator.scala:153)
	at org.apache.spark.deploy.yarn.YarnRMClient.createAllocator(YarnRMClient.scala:84)
	at org.apache.spark.deploy.yarn.ApplicationMaster.createAllocator(ApplicationMaster.scala:438)
	at org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:485)
	at org.apache.spark.deploy.yarn.ApplicationMaster.org$apache$spark$deploy$yarn$ApplicationMaster$$runImpl(ApplicationMaster.scala:308)
	at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply$mcV$sp(ApplicationMaster.scala:248)
	at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply(ApplicationMaster.scala:248)
	at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply(ApplicationMaster.scala:248)
	at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:783)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1926)
	at org.apache.spark.deploy.yarn.ApplicationMaster.doAsUser(ApplicationMaster.scala:782)
	at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:247)
	at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:807)
	at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
```
I passed in the jars like so:
```
sudo spark-submit --verbose \
--class org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand \
--master yarn --deploy-mode cluster \
--conf spark.local.dir=/mnt \
--conf "spark.driver.extraClassPath=/mnt/pinot/apache-pinot-0.11.0-SNAPSHOT-bin/plugins-external/pinot-batch-ingestion/pinot-batch-ingestion-spark/pinot-batch-ingestion-spark-0.11.0-SNAPSHOT-shaded.jar:/mnt/pinot/apache-pinot-0.11.0-SNAPSHOT-bin/lib/pinot-all-0.11.0-SNAPSHOT-jar-with-dependencies.jar:/mnt/pinot/apache-pinot-0.11.0-SNAPSHOT-bin/plugins/pinot-file-system/pinot-s3/pinot-s3-0.11.0-SNAPSHOT-shaded.jar:/mnt/pinot/apache-pinot-0.11.0-SNAPSHOT-bin/plugins/pinot-input-format/pinot-parquet/pinot-parquet-0.11.0-SNAPSHOT-shaded.jar" \
--conf "spark.executor.extraClassPath=/mnt/pinot/apache-pinot-0.11.0-SNAPSHOT-bin/plugins-external/pinot-batch-ingestion/pinot-batch-ingestion-spark/pinot-batch-ingestion-spark-0.11.0-SNAPSHOT-shaded.jar:/mnt/pinot/apache-pinot-0.11.0-SNAPSHOT-bin/lib/pinot-all-0.11.0-SNAPSHOT-jar-with-dependencies.jar:/mnt/pinot/apache-pinot-0.11.0-SNAPSHOT-bin/plugins/pinot-file-system/pinot-s3/pinot-s3-0.11.0-SNAPSHOT-shaded.jar:/mnt/pinot/apache-pinot-0.11.0-SNAPSHOT-bin/plugins/pinot-input-format/pinot-parquet/pinot-parquet-0.11.0-SNAPSHOT-shaded.jar" \
--files /mnt/pinot/daily_channel_user_metrics_20220502.yaml \
--jars /mnt/pinot/apache-pinot-0.11.0-SNAPSHOT-bin/plugins-external/pinot-batch-ingestion/pinot-batch-ingestion-spark/pinot-batch-ingestion-spark-0.11.0-SNAPSHOT-shaded.jar,/mnt/pinot/apache-pinot-0.11.0-SNAPSHOT-bin/lib/pinot-all-0.11.0-SNAPSHOT-jar-with-dependencies.jar,/mnt/pinot/apache-pinot-0.11.0-SNAPSHOT-bin/plugins/pinot-file-system/pinot-s3/pinot-s3-0.11.0-SNAPSHOT-shaded.jar,/mnt/pinot/apache-pinot-0.11.0-SNAPSHOT-bin/plugins/pinot-input-format/pinot-parquet/pinot-parquet-0.11.0-SNAPSHOT-shaded.jar \
/mnt/pinot/apache-pinot-0.11.0-SNAPSHOT-bin/lib/pinot-all-0.11.0-SNAPSHOT-jar-with-dependencies.jar \
-jobSpecFile daily_channel_user_metrics_20220502.yaml
```
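In case it helps anyone debugging a similar `NoSuchMethodError`: one plausible cause (an assumption at this point, not confirmed) is a shaded jar bundling its own copy of the Hadoop YARN classes and shadowing the cluster's version. That is easy to check by listing the jar contents:
```
# Look for bundled copies of the class from the stack trace above.
unzip -l /mnt/pinot/apache-pinot-0.11.0-SNAPSHOT-bin/lib/pinot-all-0.11.0-SNAPSHOT-jar-with-dependencies.jar \
  | grep 'org/apache/hadoop/yarn/api/records/Resource'
```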
k
Seems like `HADOOP_CLASSPATH` is not set.
s
Not seeing `HADOOP_CLASSPATH` in the logs here, so this seems plausible:
```
YARN executor launch context:
  env:
    CLASSPATH -> ...
    SPARK_YARN_CONTAINER_CORES -> ...
    SPARK_DIST_CLASSPATH -> ...
    SPARK_YARN_STAGING_DIR -> ...
    SPARK_USER -> ...
    JAVA_HOME -> ...
    SPARK_PUBLIC_DNS -> ...
```
I do see that var being set in `/etc/hadoop/conf/hadoop-env.sh`, though.
Running `hadoop classpath` from the EMR master node (where I am submitting the application from) gives me this result:
```
$ hadoop classpath
/etc/hadoop/conf:/usr/lib/hadoop/lib/*:/usr/lib/hadoop/.//*:/usr/lib/hadoop-hdfs/./:/usr/lib/hadoop-hdfs/lib/*:/usr/lib/hadoop-hdfs/.//*:/usr/lib/hadoop-yarn/lib/*:/usr/lib/hadoop-yarn/.//*:/usr/lib/hadoop-mapreduce/lib/*:/usr/lib/hadoop-mapreduce/.//*::/etc/tez/conf:/usr/lib/tez/*:/usr/lib/tez/lib/*:/usr/lib/hadoop-lzo/lib/*:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/ddb/lib/emr-ddb-hadoop.jar:/usr/share/aws/emr/goodies/lib/emr-hadoop-goodies.jar:/usr/share/aws/emr/kinesis/lib/emr-kinesis-hadoop.jar:/usr/share/aws/emr/cloudwatch-sink/lib/*:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*
```
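If the variable really isn't reaching the containers, a sketch of one way to propagate it explicitly, using Spark's standard `spark.yarn.appMasterEnv.*` and `spark.executorEnv.*` conf keys (untested here; the jar/classpath confs from the earlier command are omitted for brevity):
```
# Sketch: export the master node's Hadoop classpath into the YARN
# ApplicationMaster and executor environments, then submit as before.
HADOOP_CP="$(hadoop classpath)"
sudo spark-submit --verbose \
  --class org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand \
  --master yarn --deploy-mode cluster \
  --conf "spark.yarn.appMasterEnv.HADOOP_CLASSPATH=${HADOOP_CP}" \
  --conf "spark.executorEnv.HADOOP_CLASSPATH=${HADOOP_CP}" \
  --files /mnt/pinot/daily_channel_user_metrics_20220502.yaml \
  /mnt/pinot/apache-pinot-0.11.0-SNAPSHOT-bin/lib/pinot-all-0.11.0-SNAPSHOT-jar-with-dependencies.jar \
  -jobSpecFile daily_channel_user_metrics_20220502.yaml
```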
To close the loop in case anyone is referencing this thread, @Kartik was able to identify the issue. I'm using EMR 5.34 with these Spark/Hadoop versions:
```
spark version 2.4.8-amzn-0
Hadoop 2.10.1-amzn-2
```
His guidance on a fix:
so these unexpected hadoop deps might be coming from the `pinot-parquet` plugin (although that uses Hadoop 2.10.1, which is what the EMR cluster is on). Anyways, here's what you can do (see the sketch after this list):
• open `pom.xml` in `pinot -> pinot-plugins -> pinot-input-format -> pinot-parquet`
• change the dependency scope of `hadoop-common` and `hadoop-mapreduce-client-core` from `compile` to `provided`
• recompile the code
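For reference, the scope change would look roughly like this in that `pom.xml` (a sketch; version tags omitted on the assumption they are managed by a parent pom):
```
<!-- pinot-plugins/pinot-input-format/pinot-parquet/pom.xml (sketch) -->
<!-- "provided" keeps these Hadoop classes out of the shaded plugin jar, -->
<!-- so the cluster's own Hadoop classes are used at runtime. -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-common</artifactId>
  <scope>provided</scope>
</dependency>
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-mapreduce-client-core</artifactId>
  <scope>provided</scope>
</dependency>
```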
k
I have also raised a PR so that manual intervention is not needed for this: https://github.com/apache/pinot/pull/8798