# troubleshooting
s
Hey folks 👋, I'm having some issues with the Spark Batch Ingestion job when moving from `--master local --deploy-mode client` to `--master yarn --deploy-mode cluster` (as suggested here for production environments). I would greatly appreciate some guidance from others who have successfully configured this Spark job. Details in thread 🧵
Following this thread, I am able to successfully `spark-submit` locally using the following command:
```
sudo spark-submit --verbose \
--class org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand \
--master local --deploy-mode client \
--conf spark.local.dir=/mnt \
--conf "spark.driver.extraJavaOptions=-Dplugins.dir=/mnt/pinot/apache-pinot-0.11.0-SNAPSHOT-bin/plugins -Dplugins.include=pinot-s3,pinot-parquet -Dlog4j2.configurationFile=/mnt/pinot/apache-pinot-0.11.0-SNAPSHOT-bin/conf/pinot-ingestion-job-log4j2.xml" \
--conf "spark.driver.extraClassPath=/mnt/pinot/apache-pinot-0.11.0-SNAPSHOT-bin/plugins-external/pinot-batch-ingestion/pinot-batch-ingestion-spark/pinot-batch-ingestion-spark-0.11.0-SNAPSHOT-shaded.jar:/mnt/pinot/apache-pinot-0.11.0-SNAPSHOT-bin/lib/pinot-all-0.11.0-SNAPSHOT-jar-with-dependencies.jar:/mnt/pinot/apache-pinot-0.11.0-SNAPSHOT-bin/plugins/pinot-file-system/pinot-s3/pinot-s3-0.11.0-SNAPSHOT-shaded.jar:/mnt/pinot/apache-pinot-0.11.0-SNAPSHOT-bin/plugins/pinot-input-format/pinot-parquet/pinot-parquet-0.11.0-SNAPSHOT-shaded.jar" \
/mnt/pinot/apache-pinot-0.11.0-SNAPSHOT-bin/lib/pinot-all-0.11.0-SNAPSHOT-jar-with-dependencies.jar \
-jobSpecFile /mnt/pinot/daily_channel_user_metrics_20220502.yaml
```
The problem occurs when switching to `--master yarn`, whether using `--deploy-mode client` or `--deploy-mode cluster`.
To rule out issues with `yarn` being misconfigured on my EMR cluster, I successfully ran this example:
```
spark-submit --master yarn --deploy-mode cluster --class "org.apache.spark.examples.JavaSparkPi" /usr/lib/spark/examples/jars/spark-examples.jar
```
My current WIP command is this:
```
sudo spark-submit --verbose \
--class org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand \
--master yarn --deploy-mode cluster \
--conf spark.local.dir=/mnt \
--conf "spark.driver.extraJavaOptions=-Dplugins.dir=/mnt/pinot/apache-pinot-0.11.0-SNAPSHOT-bin/plugins -Dplugins.include=pinot-s3,pinot-parquet -Dlog4j2.configurationFile=/mnt/pinot/apache-pinot-0.11.0-SNAPSHOT-bin/conf/pinot-ingestion-job-log4j2.xml" \
--conf "spark.driver.extraClassPath=/mnt/pinot/apache-pinot-0.11.0-SNAPSHOT-bin/plugins-external/pinot-batch-ingestion/pinot-batch-ingestion-spark/pinot-batch-ingestion-spark-0.11.0-SNAPSHOT-shaded.jar:/mnt/pinot/apache-pinot-0.11.0-SNAPSHOT-bin/lib/pinot-all-0.11.0-SNAPSHOT-jar-with-dependencies.jar:/mnt/pinot/apache-pinot-0.11.0-SNAPSHOT-bin/plugins/pinot-file-system/pinot-s3/pinot-s3-0.11.0-SNAPSHOT-shaded.jar:/mnt/pinot/apache-pinot-0.11.0-SNAPSHOT-bin/plugins/pinot-input-format/pinot-parquet/pinot-parquet-0.11.0-SNAPSHOT-shaded.jar" \
--conf "spark.executor.extraJavaOptions=-Dplugins.dir=/mnt/pinot/apache-pinot-0.11.0-SNAPSHOT-bin/plugins -Dplugins.include=pinot-s3,pinot-parquet -Dlog4j2.configurationFile=/mnt/pinot/apache-pinot-0.11.0-SNAPSHOT-bin/conf/pinot-ingestion-job-log4j2.xml" \
--conf "spark.executor.extraClassPath=/mnt/pinot/apache-pinot-0.11.0-SNAPSHOT-bin/plugins-external/pinot-batch-ingestion/pinot-batch-ingestion-spark/pinot-batch-ingestion-spark-0.11.0-SNAPSHOT-shaded.jar:/mnt/pinot/apache-pinot-0.11.0-SNAPSHOT-bin/lib/pinot-all-0.11.0-SNAPSHOT-jar-with-dependencies.jar:/mnt/pinot/apache-pinot-0.11.0-SNAPSHOT-bin/plugins/pinot-file-system/pinot-s3/pinot-s3-0.11.0-SNAPSHOT-shaded.jar:/mnt/pinot/apache-pinot-0.11.0-SNAPSHOT-bin/plugins/pinot-input-format/pinot-parquet/pinot-parquet-0.11.0-SNAPSHOT-shaded.jar" \
--files /mnt/pinot/daily_channel_user_metrics_20220502.yaml \
/mnt/pinot/apache-pinot-0.11.0-SNAPSHOT-bin/lib/pinot-all-0.11.0-SNAPSHOT-jar-with-dependencies.jar \
-jobSpecFile daily_channel_user_metrics_20220502.yaml
```
which gets stuck when trying to add executor tasks, complaining that the `ApplicationMaster` has not yet registered:
```
2022/05/27 16:03:34.967 INFO [DAGScheduler] [dag-scheduler-event-loop] Submitting 1000 missing tasks from ResultStage 0 (ParallelCollectionRDD[0] at parallelize at SparkSegmentGenerationJobRunner.java:237) (first 15 tasks are for partitions Vector(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14))
2022/05/27 16:03:34.968 INFO [YarnClusterScheduler] [dag-scheduler-event-loop] Adding task set 0.0 with 1000 tasks
2022/05/27 16:03:39.866 WARN [YarnSchedulerBackend$YarnSchedulerEndpoint] [dispatcher-event-loop-2] Attempted to request executors before the AM has registered!
2022/05/27 16:03:39.867 WARN [ExecutorAllocationManager] [spark-dynamic-executor-allocation] Unable to reach the cluster manager to request 11 total executors!
```
It eventually times out and fails.
I'm thinking it is most likely an issue with providing all dependencies/jars, but the log messages I'm seeing have not been super helpful. I'm not seeing any obvious `java.lang.ClassNotFoundException` errors for Pinot libs. It's unclear to me how the driver is able to execute a portion of the main class (it lists the S3 files and tries to start tasks) while the ApplicationMaster fails to boot and register properly.
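One thing worth checking (assuming YARN log aggregation is enabled on the cluster) is the AM container's own logs; its stderr usually holds the underlying startup exception:
```
# Fetch the aggregated container logs for the failed application.
# <application_id> is a placeholder for the ID printed by spark-submit.
yarn logs -applicationId <application_id> | less
```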
x
The currently suggested approach is to copy the Pinot jars onto the Spark classpath if you can add jars when creating a Spark cluster, since the executor worker nodes may not have the corresponding jars when you submit the job.
Alternatively, you can build a fat jar that contains the necessary dependencies; that will also help.
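As a rough illustration (assuming a Maven build; any uber-jar tool would work), the fat jar could be produced with the Maven Shade plugin, merging `META-INF/services` so service-loader lookups in the shaded jar still resolve:
```
<!-- Sketch: build a fat jar with the Maven Shade plugin. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>3.2.4</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
      <configuration>
        <transformers>
          <!-- Merge META-INF/services entries from all dependencies. -->
          <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
        </transformers>
      </configuration>
    </execution>
  </executions>
</plugin>
```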
k
Hi! Just mention all the jars from `spark.driver.extraClassPath` in the `--jars` argument as well. That will solve the issue.
Also remove `spark.driver.extraJavaOptions`.
s
Thanks for the responses 🙇, will give it a try.
When removing `spark.driver.extraJavaOptions`, I appear to lose stdout (I think due to the log4j config file). stderr is logging this on the new attempt:
```
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.hadoop.yarn.api.records.Resource.newInstance(JJII)Lorg/apache/hadoop/yarn/api/records/Resource;
	at org.apache.spark.deploy.yarn.YarnAllocator.<init>(YarnAllocator.scala:153)
	at org.apache.spark.deploy.yarn.YarnRMClient.createAllocator(YarnRMClient.scala:84)
	at org.apache.spark.deploy.yarn.ApplicationMaster.createAllocator(ApplicationMaster.scala:438)
	at org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:485)
	at org.apache.spark.deploy.yarn.ApplicationMaster.org$apache$spark$deploy$yarn$ApplicationMaster$$runImpl(ApplicationMaster.scala:308)
	at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply$mcV$sp(ApplicationMaster.scala:248)
	at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply(ApplicationMaster.scala:248)
	at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply(ApplicationMaster.scala:248)
	at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:783)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1926)
	at org.apache.spark.deploy.yarn.ApplicationMaster.doAsUser(ApplicationMaster.scala:782)
	at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:247)
	at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:807)
	at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
```
I passed in the jars like so:
```
sudo spark-submit --verbose \
--class org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand \
--master yarn --deploy-mode cluster \
--conf spark.local.dir=/mnt \
--conf "spark.driver.extraClassPath=/mnt/pinot/apache-pinot-0.11.0-SNAPSHOT-bin/plugins-external/pinot-batch-ingestion/pinot-batch-ingestion-spark/pinot-batch-ingestion-spark-0.11.0-SNAPSHOT-shaded.jar:/mnt/pinot/apache-pinot-0.11.0-SNAPSHOT-bin/lib/pinot-all-0.11.0-SNAPSHOT-jar-with-dependencies.jar:/mnt/pinot/apache-pinot-0.11.0-SNAPSHOT-bin/plugins/pinot-file-system/pinot-s3/pinot-s3-0.11.0-SNAPSHOT-shaded.jar:/mnt/pinot/apache-pinot-0.11.0-SNAPSHOT-bin/plugins/pinot-input-format/pinot-parquet/pinot-parquet-0.11.0-SNAPSHOT-shaded.jar" \
--conf "spark.executor.extraClassPath=/mnt/pinot/apache-pinot-0.11.0-SNAPSHOT-bin/plugins-external/pinot-batch-ingestion/pinot-batch-ingestion-spark/pinot-batch-ingestion-spark-0.11.0-SNAPSHOT-shaded.jar:/mnt/pinot/apache-pinot-0.11.0-SNAPSHOT-bin/lib/pinot-all-0.11.0-SNAPSHOT-jar-with-dependencies.jar:/mnt/pinot/apache-pinot-0.11.0-SNAPSHOT-bin/plugins/pinot-file-system/pinot-s3/pinot-s3-0.11.0-SNAPSHOT-shaded.jar:/mnt/pinot/apache-pinot-0.11.0-SNAPSHOT-bin/plugins/pinot-input-format/pinot-parquet/pinot-parquet-0.11.0-SNAPSHOT-shaded.jar" \
--files /mnt/pinot/daily_channel_user_metrics_20220502.yaml \
--jars /mnt/pinot/apache-pinot-0.11.0-SNAPSHOT-bin/plugins-external/pinot-batch-ingestion/pinot-batch-ingestion-spark/pinot-batch-ingestion-spark-0.11.0-SNAPSHOT-shaded.jar,/mnt/pinot/apache-pinot-0.11.0-SNAPSHOT-bin/lib/pinot-all-0.11.0-SNAPSHOT-jar-with-dependencies.jar,/mnt/pinot/apache-pinot-0.11.0-SNAPSHOT-bin/plugins/pinot-file-system/pinot-s3/pinot-s3-0.11.0-SNAPSHOT-shaded.jar,/mnt/pinot/apache-pinot-0.11.0-SNAPSHOT-bin/plugins/pinot-input-format/pinot-parquet/pinot-parquet-0.11.0-SNAPSHOT-shaded.jar \
/mnt/pinot/apache-pinot-0.11.0-SNAPSHOT-bin/lib/pinot-all-0.11.0-SNAPSHOT-jar-with-dependencies.jar \
-jobSpecFile daily_channel_user_metrics_20220502.yaml
```
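In case it helps anyone debugging a similar `NoSuchMethodError`: one plausible cause (an assumption at this point, not confirmed) is a shaded jar bundling its own copy of the Hadoop YARN classes and shadowing the cluster's version. That is easy to check by listing the jar contents:
```
# Look for bundled copies of the class from the stack trace above.
unzip -l /mnt/pinot/apache-pinot-0.11.0-SNAPSHOT-bin/lib/pinot-all-0.11.0-SNAPSHOT-jar-with-dependencies.jar \
  | grep 'org/apache/hadoop/yarn/api/records/Resource'
```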
k
Seems like `HADOOP_CLASSPATH` is not set.
s
Not seeing `HADOOP_CLASSPATH` in the logs here, so this seems plausible:
```
YARN executor launch context:
  env:
    CLASSPATH -> ...
    SPARK_YARN_CONTAINER_CORES -> ...
    SPARK_DIST_CLASSPATH -> ...
    SPARK_YARN_STAGING_DIR -> ...
    SPARK_USER -> ...
    JAVA_HOME -> ...
    SPARK_PUBLIC_DNS -> ...
```
I do see that var being set in `/etc/hadoop/conf/hadoop-env.sh`, though.
Running `hadoop classpath` from the EMR master node (where I am submitting the application from) gives me this result:
```
$ hadoop classpath
/etc/hadoop/conf:/usr/lib/hadoop/lib/*:/usr/lib/hadoop/.//*:/usr/lib/hadoop-hdfs/./:/usr/lib/hadoop-hdfs/lib/*:/usr/lib/hadoop-hdfs/.//*:/usr/lib/hadoop-yarn/lib/*:/usr/lib/hadoop-yarn/.//*:/usr/lib/hadoop-mapreduce/lib/*:/usr/lib/hadoop-mapreduce/.//*::/etc/tez/conf:/usr/lib/tez/*:/usr/lib/tez/lib/*:/usr/lib/hadoop-lzo/lib/*:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/ddb/lib/emr-ddb-hadoop.jar:/usr/share/aws/emr/goodies/lib/emr-hadoop-goodies.jar:/usr/share/aws/emr/kinesis/lib/emr-kinesis-hadoop.jar:/usr/share/aws/emr/cloudwatch-sink/lib/*:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*
```
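If the variable really isn't reaching the containers, a sketch of one way to propagate it explicitly, using Spark's standard `spark.yarn.appMasterEnv.*` and `spark.executorEnv.*` conf keys (untested here; the jar/classpath confs from the earlier command are omitted for brevity):
```
# Sketch: export the master node's Hadoop classpath into the YARN
# ApplicationMaster and executor environments, then submit as before.
HADOOP_CP="$(hadoop classpath)"
sudo spark-submit --verbose \
  --class org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand \
  --master yarn --deploy-mode cluster \
  --conf "spark.yarn.appMasterEnv.HADOOP_CLASSPATH=${HADOOP_CP}" \
  --conf "spark.executorEnv.HADOOP_CLASSPATH=${HADOOP_CP}" \
  --files /mnt/pinot/daily_channel_user_metrics_20220502.yaml \
  /mnt/pinot/apache-pinot-0.11.0-SNAPSHOT-bin/lib/pinot-all-0.11.0-SNAPSHOT-jar-with-dependencies.jar \
  -jobSpecFile daily_channel_user_metrics_20220502.yaml
```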
To close the loop in case anyone is referencing this thread, @Kartik was able to identify the issue. I'm using EMR 5.34 with these Spark/Hadoop versions:
```
spark version 2.4.8-amzn-0
Hadoop 2.10.1-amzn-2
```
His guidance on a fix:
so these unexpected hadoop deps might be coming from the `pinot-parquet` plugin (although that uses Hadoop 2.10.1, which is what the EMR cluster is on). Anyways, here's what you can do (see the sketch after this list):
• open `pom.xml` in `pinot -> pinot-plugins -> pinot-input-format -> pinot-parquet`
• change the dependency scope of `hadoop-common` and `hadoop-mapreduce-client-core` from `compile` to `provided`
• recompile the code
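For reference, the scope change would look roughly like this in that `pom.xml` (a sketch; version tags omitted on the assumption they are managed by a parent pom):
```
<!-- pinot-plugins/pinot-input-format/pinot-parquet/pom.xml (sketch) -->
<!-- "provided" keeps these Hadoop classes out of the shaded plugin jar, -->
<!-- so the cluster's own Hadoop classes are used at runtime. -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-common</artifactId>
  <scope>provided</scope>
</dependency>
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-mapreduce-client-core</artifactId>
  <scope>provided</scope>
</dependency>
```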
k
I have also raised a PR so that manual intervention is not needed for this: https://github.com/apache/pinot/pull/8798