Nikhil
04/14/2022, 8:34 PM
sudo spark-submit --class org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand --master local --deploy-mode client --conf spark.local.dir=/mnt --conf "spark.driver.extraJavaOptions=-Dplugins.dir=/mnt/pinot/apache-pinot-incubating-0.7.1-bin/plugins -Dplugins.include=pinot-s3,pinot-parquet -Dlog4j2.configurationFile=/mnt/pinot/apache-pinot-incubating-0.7.1-bin/conf/pinot-ingestion-job-log4j2.xml" --conf "spark.driver.extraClassPath=/mnt/pinot/apache-pinot-incubating-0.7.1-bin/plugins/pinot-batch-ingestion/pinot-batch-ingestion-spark/pinot-batch-ingestion-spark-0.7.1-shaded.jar:/mnt/pinot/apache-pinot-incubating-0.7.1-bin/lib/pinot-all-0.7.1-jar-with-dependencies.jar:/mnt/pinot/apache-pinot-incubating-0.7.1-bin/plugins/pinot-file-system/pinot-s3/pinot-s3-0.7.1-shaded.jar:/mnt/pinot/apache-pinot-incubating-0.7.1-bin/plugins/pinot-input-format/pinot-parquet/pinot-parquet-0.7.1-shaded.jar" /mnt/pinot/apache-pinot-incubating-0.7.1-bin/lib/pinot-all-0.7.1-jar-with-dependencies.jar -jobSpecFile /mnt/pinot/spark_job_spec_v8.yaml
The ingestion spec used is this:
executionFrameworkSpec:
  name: 'spark'
  segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.spark.SparkSegmentGenerationJobRunner'
  extraConfigs:
    stagingDir: s3://nikhil-dw-dev/pinot/staging/
    dependencyJarDir: 's3://nikhil-dw-dev/pinot/apache-pinot-incubating-0.7.1-bin/plugins'
jobType: SegmentCreation
inputDirURI: 's3://nikhil-dw-dev/pinot/pinot_input/'
includeFileNamePattern: 'glob:**/*.parquet'
outputDirURI: 's3://nikhil-dw-dev/pinot/pinot_output3/'
overwriteOutput: true
pinotFSSpecs:
  - className: org.apache.pinot.plugin.filesystem.S3PinotFS
    scheme: s3
    configs:
      region: us-east-1
recordReaderSpec:
  dataFormat: 'parquet'
  className: 'org.apache.pinot.plugin.inputformat.parquet.ParquetRecordReader'
tableSpec:
  tableName: 'students'
  schemaURI: 's3://nikhil-dw-dev/pinot/students_schema.json'
  tableConfigURI: 's3://nikhil-dw-dev/pinot/students_table.json'
But when running this in cluster mode, I get a class-not-found error. The plugins.dir is available on all the EMR nodes, and we can see that the plugins are loaded successfully. I have tried passing the S3 location as well as the /mnt path, and both fail with the same error. I looked at these two previous posts [1] and [2], and they did not help in resolving it.
Here is the error:
22/04/14 07:06:44 INFO PluginManager: Plugins root dir is [/mnt/pinot/apache-pinot-incubating-0.7.1-bin/plugins]
22/04/14 07:06:44 INFO PluginManager: Trying to load plugins: [[pinot-s3, pinot-parquet]]
22/04/14 07:06:44 INFO PluginManager: Trying to load plugin [pinot-s3] from location [/mnt/pinot/apache-pinot-incubating-0.7.1-bin/plugins/pinot-file-system/pinot-s3]
22/04/14 07:06:44 INFO PluginManager: Successfully loaded plugin [pinot-s3] from jar file [/mnt/pinot/apache-pinot-incubating-0.7.1-bin/plugins/pinot-file-system/pinot-s3/pinot-s3-0.7.1-shaded.jar]
22/04/14 07:06:44 INFO PluginManager: Successfully Loaded plugin [pinot-s3] from dir [/mnt/pinot/apache-pinot-incubating-0.7.1-bin/plugins/pinot-file-system/pinot-s3]
22/04/14 07:06:44 INFO PluginManager: Trying to load plugin [pinot-parquet] from location [/mnt/pinot/apache-pinot-incubating-0.7.1-bin/plugins/pinot-input-format/pinot-parquet]
22/04/14 07:06:44 INFO PluginManager: Successfully loaded plugin [pinot-parquet] from jar file [/mnt/pinot/apache-pinot-incubating-0.7.1-bin/plugins/pinot-input-format/pinot-parquet/pinot-parquet-0.7.1-shaded.jar]
22/04/14 07:06:44 INFO PluginManager: Successfully Loaded plugin [pinot-parquet] from dir [/mnt/pinot/apache-pinot-incubating-0.7.1-bin/plugins/pinot-input-format/pinot-parquet]
22/04/14 07:06:45 ERROR LaunchDataIngestionJobCommand: Got exception to generate IngestionJobSpec for data ingestion job -
Can't construct a java object for tag:yaml.org,2002:org.apache.pinot.spi.ingestion.batch.spec.SegmentGenerationJobSpec; exception=Class not found: org.apache.pinot.spi.ingestion.batch.spec.SegmentGenerationJobSpec
in 'string', line 1, column 1:
executionFrameworkSpec:
^
Will thread the different commands used to submit this job.
Thank you for your help 🙇
Nikhil
04/14/2022, 8:35 PM
sudo spark-submit --class org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand --deploy-mode cluster --jars /mnt/pinot/apache-pinot-incubating-0.7.1-bin/lib/pinot-all-0.7.1-jar-with-dependencies.jar,/mnt/pinot/apache-pinot-incubating-0.7.1-bin/plugins/pinot-batch-ingestion/pinot-batch-ingestion-spark/pinot-batch-ingestion-spark-0.7.1-shaded.jar,/mnt/pinot/apache-pinot-incubating-0.7.1-bin/plugins/pinot-file-system/pinot-s3/pinot-s3-0.7.1-shaded.jar,/mnt/pinot/apache-pinot-incubating-0.7.1-bin/plugins/pinot-input-format/pinot-parquet/pinot-parquet-0.7.1-shaded.jar --files "/mnt/pinot/spark_job_spec_v8.yaml" --conf "spark.driver.extraJavaOptions=-Dplugins.dir=/mnt/pinot/apache-pinot-incubating-0.7.1-bin/plugins -Dplugins.include=pinot-s3,pinot-parquet -Dlog4j2.configurationFile=/mnt/pinot/apache-pinot-incubating-0.7.1-bin/conf/pinot-ingestion-job-log4j2.xml" --conf "spark.driver.extraClassPath=/mnt/pinot/apache-pinot-incubating-0.7.1-bin/plugins/pinot-batch-ingestion/pinot-batch-ingestion-spark/pinot-batch-ingestion-spark-0.7.1-shaded.jar:/mnt/pinot/apache-pinot-incubating-0.7.1-bin/lib/pinot-all-0.7.1-jar-with-dependencies.jar:/mnt/pinot/apache-pinot-incubating-0.7.1-bin/plugins/pinot-file-system/pinot-s3/pinot-s3-0.7.1-shaded.jar:/mnt/pinot/apache-pinot-incubating-0.7.1-bin/plugins/pinot-input-format/pinot-parquet/pinot-parquet-0.7.1-shaded.jar" --conf "spark.executor.extraClassPath=/mnt/pinot/apache-pinot-incubating-0.7.1-bin/plugins/pinot-batch-ingestion/pinot-batch-ingestion-spark/pinot-batch-ingestion-spark-0.7.1-shaded.jar:/mnt/pinot/apache-pinot-incubating-0.7.1-bin/lib/pinot-all-0.7.1-jar-with-dependencies.jar:/mnt/pinot/apache-pinot-incubating-0.7.1-bin/plugins/pinot-file-system/pinot-s3/pinot-s3-0.7.1-shaded.jar:/mnt/pinot/apache-pinot-incubating-0.7.1-bin/plugins/pinot-input-format/pinot-parquet/pinot-parquet-0.7.1-shaded.jar" s3://nikhil-dw-dev/pinot/apache-pinot-incubating-0.7.1-bin/lib/pinot-all-0.7.1-jar-with-dependencies.jar -jobSpecFile spark_job_spec_v8.yaml
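Editor's note: the cluster-mode command above lists the same four jars three times, comma-separated for --jars and colon-separated for the two extraClassPath settings. A minimal sketch of building both forms from one list, so they cannot drift apart, might look like this. The helper names join_by and count_entries are hypothetical (not part of Spark or Pinot), and using one jar order for all three options is an assumption.

```shell
#!/bin/sh
# Sketch only: prints the jar-related spark-submit options, it does not run Spark.
PINOT_HOME=/mnt/pinot/apache-pinot-incubating-0.7.1-bin

# The four jars from the command above, held as positional parameters.
set -- \
  "$PINOT_HOME/lib/pinot-all-0.7.1-jar-with-dependencies.jar" \
  "$PINOT_HOME/plugins/pinot-batch-ingestion/pinot-batch-ingestion-spark/pinot-batch-ingestion-spark-0.7.1-shaded.jar" \
  "$PINOT_HOME/plugins/pinot-file-system/pinot-s3/pinot-s3-0.7.1-shaded.jar" \
  "$PINOT_HOME/plugins/pinot-input-format/pinot-parquet/pinot-parquet-0.7.1-shaded.jar"

# join_by SEP ITEM...: hypothetical helper, joins items with a one-character separator.
join_by() {
  sep=$1
  shift
  ( IFS=$sep; printf '%s\n' "$*" )
}

# count_entries CLASSPATH: hypothetical helper, counts the ':'-separated entries.
count_entries() {
  printf '%s\n' "$1" | tr ':' '\n' | wc -l | tr -d ' '
}

JARS_CSV=$(join_by , "$@")         # --jars expects a comma-separated list
CLASSPATH_COLON=$(join_by : "$@")  # extraClassPath is a ':'-separated classpath

echo "--jars $JARS_CSV"
echo "--conf spark.driver.extraClassPath=$CLASSPATH_COLON"
echo "--conf spark.executor.extraClassPath=$CLASSPATH_COLON"
echo "classpath entries: $(count_entries "$CLASSPATH_COLON")"
```

Because ':' is the classpath separator on Linux, this sketch assumes local paths. A URI such as s3://bucket/a.jar contains a colon of its own, so count_entries reports two entries for it rather than one. Whether that plays any part in the error in this thread is not established here.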
And this command points to the jars on S3:
sudo spark-submit --class org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand --deploy-mode cluster --jars s3://nikhil-dw-dev/pinot/apache-pinot-incubating-0.7.1-bin/lib/pinot-all-0.7.1-jar-with-dependencies.jar,s3://nikhil-dw-dev/pinot/apache-pinot-incubating-0.7.1-bin/plugins/pinot-batch-ingestion/pinot-batch-ingestion-spark/pinot-batch-ingestion-spark-0.7.1-shaded.jar,s3://nikhil-dw-dev/pinot/apache-pinot-incubating-0.7.1-bin/plugins/pinot-file-system/pinot-s3/pinot-s3-0.7.1-shaded.jar,s3://nikhil-dw-dev/pinot/apache-pinot-incubating-0.7.1-bin/plugins/pinot-input-format/pinot-parquet/pinot-parquet-0.7.1-shaded.jar --files "/mnt/pinot/spark_job_spec_v8.yaml" --conf "spark.driver.extraJavaOptions=-Dplugins.dir=/mnt/pinot/apache-pinot-incubating-0.7.1-bin/plugins -Dplugins.include=pinot-s3,pinot-parquet -Dlog4j2.configurationFile=s3://nikhil-dw-dev/pinot/apache-pinot-incubating-0.7.1-bin/conf/pinot-ingestion-job-log4j2.xml" --conf "spark.driver.extraClassPath=s3://nikhil-dw-dev/pinot/apache-pinot-incubating-0.7.1-bin/plugins/pinot-batch-ingestion/pinot-batch-ingestion-spark/pinot-batch-ingestion-spark-0.7.1-shaded.jar:s3://nikhil-dw-dev/pinot/apache-pinot-incubating-0.7.1-bin/lib/pinot-all-0.7.1-jar-with-dependencies.jar:s3://nikhil-dw-dev/pinot/apache-pinot-incubating-0.7.1-bin/plugins/pinot-file-system/pinot-s3/pinot-s3-0.7.1-shaded.jar:s3://nikhil-dw-dev/pinot/apache-pinot-incubating-0.7.1-bin/plugins/pinot-input-format/pinot-parquet/pinot-parquet-0.7.1-shaded.jar" --conf "spark.executor.extraClassPath=s3://nikhil-dw-dev/pinot/apache-pinot-incubating-0.7.1-bin/plugins/pinot-batch-ingestion/pinot-batch-ingestion-spark/pinot-batch-ingestion-spark-0.7.1-shaded.jar:s3://nikhil-dw-dev/pinot/apache-pinot-incubating-0.7.1-bin/lib/pinot-all-0.7.1-jar-with-dependencies.jar:s3://nikhil-dw-dev/pinot/apache-pinot-incubating-0.7.1-bin/plugins/pinot-file-system/pinot-s3/pinot-s3-0.7.1-shaded.jar:s3://nikhil-dw-dev/pinot/apache-pinot-incubating-0.7.1-bin/plugins/pinot-input-format/pinot-parquet/pinot-parquet-0.7.1-shaded.jar" s3://nikhil-dw-dev/pinot/apache-pinot-incubating-0.7.1-bin/lib/pinot-all-0.7.1-jar-with-dependencies.jar -jobSpecFile spark_job_spec_v8.yaml
Both are failing.
Nikhil
04/14/2022, 8:36 PM
Xiaoman Dong
04/15/2022, 10:11 PM
Can't construct a java object for tag:yaml.org,2002:org.apache.pinot.spi.ingestion.batch.spec.SegmentGenerationJobSpec; exception=Class not found: org.apache.pinot.spi.ingestion.batch.spec.SegmentGenerationJobSpec
in 'string', line 1, column 1:
executionFrameworkSpec:
Xiaoman Dong
04/15/2022, 10:16 PM
Xiaoman Dong
04/15/2022, 10:18 PM
Xiaoman Dong
04/15/2022, 10:18 PM
mvn clean install -DskipTests -Pbin-dist -T 4 -Djdk.version=8
Nikhil
04/19/2022, 2:38 AM
22/04/19 02:36:04 INFO PluginManager: Plugins root dir is [/mnt/pinot/apache-pinot-incubating-0.7.1-bin/plugins]
22/04/19 02:36:04 INFO PluginManager: Trying to load plugins: [[pinot-s3, pinot-parquet]]
22/04/19 02:36:04 INFO PluginManager: Trying to load plugin [pinot-s3] from location [/mnt/pinot/apache-pinot-incubating-0.7.1-bin/plugins/pinot-file-system/pinot-s3]
22/04/19 02:36:04 INFO PluginManager: Successfully loaded plugin [pinot-s3] from jar file [/mnt/pinot/apache-pinot-incubating-0.7.1-bin/plugins/pinot-file-system/pinot-s3/pinot-s3-0.7.1-shaded.jar]
22/04/19 02:36:04 INFO PluginManager: Successfully Loaded plugin [pinot-s3] from dir [/mnt/pinot/apache-pinot-incubating-0.7.1-bin/plugins/pinot-file-system/pinot-s3]
22/04/19 02:36:04 INFO PluginManager: Trying to load plugin [pinot-parquet] from location [/mnt/pinot/apache-pinot-incubating-0.7.1-bin/plugins/pinot-input-format/pinot-parquet]
22/04/19 02:36:04 INFO PluginManager: Successfully loaded plugin [pinot-parquet] from jar file [/mnt/pinot/apache-pinot-incubating-0.7.1-bin/plugins/pinot-input-format/pinot-parquet/pinot-parquet-0.7.1-shaded.jar]
22/04/19 02:36:04 INFO PluginManager: Successfully Loaded plugin [pinot-parquet] from dir [/mnt/pinot/apache-pinot-incubating-0.7.1-bin/plugins/pinot-input-format/pinot-parquet]
22/04/19 02:36:06 ERROR LaunchDataIngestionJobCommand: Got exception to generate IngestionJobSpec for data ingestion job -
Can't construct a java object for tag:yaml.org,2002:org.apache.pinot.spi.ingestion.batch.spec.SegmentGenerationJobSpec; exception=Class not found: org.apache.pinot.spi.ingestion.batch.spec.SegmentGenerationJobSpec
in 'string', line 1, column 1:
executionFrameworkSpec:
^
at org.yaml.snakeyaml.constructor.Constructor$ConstructYamlObject.construct(Constructor.java:345)
at org.yaml.snakeyaml.constructor.BaseConstructor.constructObject(BaseConstructor.java:182)
at org.yaml.snakeyaml.constructor.BaseConstructor.constructDocument(BaseConstructor.java:141)
at org.yaml.snakeyaml.constructor.BaseConstructor.getSingleData(BaseConstructor.java:127)
at org.yaml.snakeyaml.Yaml.loadFromReader(Yaml.java:450)
at org.yaml.snakeyaml.Yaml.loadAs(Yaml.java:427)
at org.apache.pinot.spi.ingestion.batch.IngestionJobLauncher.getSegmentGenerationJobSpec(IngestionJobLauncher.java:90)
at org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand.execute(LaunchDataIngestionJobCommand.java:118)
at org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand.main(LaunchDataIngestionJobCommand.java:67)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:688)
Caused by: org.yaml.snakeyaml.error.YAMLException: Class not found: org.apache.pinot.spi.ingestion.batch.spec.SegmentGenerationJobSpec
at org.yaml.snakeyaml.constructor.Constructor.getClassForNode(Constructor.java:650)
at org.yaml.snakeyaml.constructor.Constructor$ConstructYamlObject.getConstructor(Constructor.java:331)
at org.yaml.snakeyaml.constructor.Constructor$ConstructYamlObject.construct(Constructor.java:341)
... 13 more
22/04/19 02:36:06 ERROR LaunchDataIngestionJobCommand: Exception caught:
Can't construct a java object for tag:yaml.org,2002:org.apache.pinot.spi.ingestion.batch.spec.SegmentGenerationJobSpec; exception=Class not found: org.apache.pinot.spi.ingestion.batch.spec.SegmentGenerationJobSpec
in 'string', line 1, column 1:
executionFrameworkSpec:
^
at org.yaml.snakeyaml.constructor.Constructor$ConstructYamlObject.construct(Constructor.java:345)
at org.yaml.snakeyaml.constructor.BaseConstructor.constructObject(BaseConstructor.java:182)
at org.yaml.snakeyaml.constructor.BaseConstructor.constructDocument(BaseConstructor.java:141)
at org.yaml.snakeyaml.constructor.BaseConstructor.getSingleData(BaseConstructor.java:127)
at org.yaml.snakeyaml.Yaml.loadFromReader(Yaml.java:450)
at org.yaml.snakeyaml.Yaml.loadAs(Yaml.java:427)
at org.apache.pinot.spi.ingestion.batch.IngestionJobLauncher.getSegmentGenerationJobSpec(IngestionJobLauncher.java:90)
at org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand.execute(LaunchDataIngestionJobCommand.java:118)
at org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand.main(LaunchDataIngestionJobCommand.java:67)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:688)
Caused by: org.yaml.snakeyaml.error.YAMLException: Class not found: org.apache.pinot.spi.ingestion.batch.spec.SegmentGenerationJobSpec
at org.yaml.snakeyaml.constructor.Constructor.getClassForNode(Constructor.java:650)
at org.yaml.snakeyaml.constructor.Constructor$ConstructYamlObject.getConstructor(Constructor.java:331)
at org.yaml.snakeyaml.constructor.Constructor$ConstructYamlObject.construct(Constructor.java:341)
... 13 more
22/04/19 02:36:06 ERROR ApplicationMaster: Uncaught exception:
java.lang.IllegalStateException: User did not initialize spark context!
at org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:489)
at org.apache.spark.deploy.yarn.ApplicationMaster.org$apache$spark$deploy$yarn$ApplicationMaster$$runImpl(ApplicationMaster.scala:308)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply$mcV$sp(ApplicationMaster.scala:248)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply(ApplicationMaster.scala:248)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply(ApplicationMaster.scala:248)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:783)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1926)
at org.apache.spark.deploy.yarn.ApplicationMaster.doAsUser(ApplicationMaster.scala:782)
at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:247)
at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:807)
at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
22/04/19 02:36:06 INFO ShutdownHookManager: Shutdown hook called
I'm gonna build Pinot 0.10.0 with Java 8 and try that out next.
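Editor's note: one quick check that fits this kind of "Class not found" error is whether the class named in the stack trace is present in the jar being submitted. The sketch below assumes unzip is installed on the node and uses the pinot-all jar path from the commands in this thread. The helper names class_to_entry and jar_has_class are hypothetical. Finding the class in the jar would not by itself prove that the class loader used during YAML parsing can see it, so treat the result as one data point.

```shell
#!/bin/sh
# Sketch only: looks for a class file inside a jar; assumes `unzip` is available.

# class_to_entry FQCN: hypothetical helper, maps a class name to its jar entry path.
class_to_entry() {
  printf '%s.class\n' "$(printf '%s' "$1" | tr '.' '/')"
}

# jar_has_class JAR FQCN: hypothetical helper, succeeds if the jar listing contains the entry.
jar_has_class() {
  unzip -l "$1" 2>/dev/null | grep -qF "$(class_to_entry "$2")"
}

CLASS=org.apache.pinot.spi.ingestion.batch.spec.SegmentGenerationJobSpec
JAR=/mnt/pinot/apache-pinot-incubating-0.7.1-bin/lib/pinot-all-0.7.1-jar-with-dependencies.jar

if [ -f "$JAR" ]; then
  if jar_has_class "$JAR" "$CLASS"; then
    echo "found $(class_to_entry "$CLASS") in $JAR"
  else
    echo "missing $(class_to_entry "$CLASS") in $JAR"
  fi
else
  echo "jar not present on this node: $JAR"
fi
```

Running the same check on a core or task node, not only on the master, would show whether the jar is actually at that path where the driver runs in cluster mode.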