# troubleshooting
n
👋 hi folks, I'm looking to understand how to get the Pinot ingestion job working on EMR Spark 2.4 in cluster mode. I'm using Pinot 0.7.1 since the EMR cluster I'm working with runs Java 8. The following spark-submit works successfully and the Pinot segments are generated when running in client mode. Here is the command to start it on the master node:
Copy code
sudo spark-submit \
  --class org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand \
  --master local \
  --deploy-mode client \
  --conf spark.local.dir=/mnt \
  --conf "spark.driver.extraJavaOptions=-Dplugins.dir=/mnt/pinot/apache-pinot-incubating-0.7.1-bin/plugins -Dplugins.include=pinot-s3,pinot-parquet -Dlog4j2.configurationFile=/mnt/pinot/apache-pinot-incubating-0.7.1-bin/conf/pinot-ingestion-job-log4j2.xml" \
  --conf "spark.driver.extraClassPath=/mnt/pinot/apache-pinot-incubating-0.7.1-bin/plugins/pinot-batch-ingestion/pinot-batch-ingestion-spark/pinot-batch-ingestion-spark-0.7.1-shaded.jar:/mnt/pinot/apache-pinot-incubating-0.7.1-bin/lib/pinot-all-0.7.1-jar-with-dependencies.jar:/mnt/pinot/apache-pinot-incubating-0.7.1-bin/plugins/pinot-file-system/pinot-s3/pinot-s3-0.7.1-shaded.jar:/mnt/pinot/apache-pinot-incubating-0.7.1-bin/plugins/pinot-input-format/pinot-parquet/pinot-parquet-0.7.1-shaded.jar" \
  /mnt/pinot/apache-pinot-incubating-0.7.1-bin/lib/pinot-all-0.7.1-jar-with-dependencies.jar \
  -jobSpecFile /mnt/pinot/spark_job_spec_v8.yaml
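As an intermediate step, here is a sketch (not verified on this cluster) of the same submit against YARN in client mode, so the driver stays on the master node while executors already run on the worker nodes. It reuses the /mnt paths, which the thread says exist on every EMR node:

```
# Sketch only: identical jars/paths to the working command above, but on YARN.
CP=/mnt/pinot/apache-pinot-incubating-0.7.1-bin/plugins/pinot-batch-ingestion/pinot-batch-ingestion-spark/pinot-batch-ingestion-spark-0.7.1-shaded.jar:/mnt/pinot/apache-pinot-incubating-0.7.1-bin/lib/pinot-all-0.7.1-jar-with-dependencies.jar:/mnt/pinot/apache-pinot-incubating-0.7.1-bin/plugins/pinot-file-system/pinot-s3/pinot-s3-0.7.1-shaded.jar:/mnt/pinot/apache-pinot-incubating-0.7.1-bin/plugins/pinot-input-format/pinot-parquet/pinot-parquet-0.7.1-shaded.jar

sudo spark-submit \
  --class org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand \
  --master yarn \
  --deploy-mode client \
  --conf spark.local.dir=/mnt \
  --conf "spark.driver.extraJavaOptions=-Dplugins.dir=/mnt/pinot/apache-pinot-incubating-0.7.1-bin/plugins -Dplugins.include=pinot-s3,pinot-parquet -Dlog4j2.configurationFile=/mnt/pinot/apache-pinot-incubating-0.7.1-bin/conf/pinot-ingestion-job-log4j2.xml" \
  --conf "spark.driver.extraClassPath=${CP}" \
  --conf "spark.executor.extraClassPath=${CP}" \
  /mnt/pinot/apache-pinot-incubating-0.7.1-bin/lib/pinot-all-0.7.1-jar-with-dependencies.jar \
  -jobSpecFile /mnt/pinot/spark_job_spec_v8.yaml
```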
The ingestion spec used is this:
Copy code
executionFrameworkSpec:
  name: 'spark'
  segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.spark.SparkSegmentGenerationJobRunner'
  extraConfigs:
    stagingDir: 's3://nikhil-dw-dev/pinot/staging/'
    dependencyJarDir: 's3://nikhil-dw-dev/pinot/apache-pinot-incubating-0.7.1-bin/plugins'
jobType: SegmentCreation
inputDirURI: 's3://nikhil-dw-dev/pinot/pinot_input/'
includeFileNamePattern: 'glob:**/*.parquet'
outputDirURI: 's3://nikhil-dw-dev/pinot/pinot_output3/'
overwriteOutput: true
pinotFSSpecs:
  -
    className: org.apache.pinot.plugin.filesystem.S3PinotFS
    scheme: s3
    configs:
      region: us-east-1
recordReaderSpec:
  dataFormat: 'parquet'
  className: 'org.apache.pinot.plugin.inputformat.parquet.ParquetRecordReader'
tableSpec:
  tableName: 'students'
  schemaURI: 's3://nikhil-dw-dev/pinot/students_schema.json'
  tableConfigURI: 's3://nikhil-dw-dev/pinot/students_table.json'
But when running this in cluster mode, I get a class-not-found issue. The plugins.dir is available on all the EMR nodes, and we can see that the plugins are getting loaded successfully. I have tried passing the S3 location as well as the /mnt path, and both fail with the same error. I looked at two previous posts, [1] and [2], and they did not help in resolving it. Here is the error:
Copy code
22/04/14 07:06:44 INFO PluginManager: Plugins root dir is [/mnt/pinot/apache-pinot-incubating-0.7.1-bin/plugins]
22/04/14 07:06:44 INFO PluginManager: Trying to load plugins: [[pinot-s3, pinot-parquet]]
22/04/14 07:06:44 INFO PluginManager: Trying to load plugin [pinot-s3] from location [/mnt/pinot/apache-pinot-incubating-0.7.1-bin/plugins/pinot-file-system/pinot-s3]
22/04/14 07:06:44 INFO PluginManager: Successfully loaded plugin [pinot-s3] from jar file [/mnt/pinot/apache-pinot-incubating-0.7.1-bin/plugins/pinot-file-system/pinot-s3/pinot-s3-0.7.1-shaded.jar]
22/04/14 07:06:44 INFO PluginManager: Successfully Loaded plugin [pinot-s3] from dir [/mnt/pinot/apache-pinot-incubating-0.7.1-bin/plugins/pinot-file-system/pinot-s3]
22/04/14 07:06:44 INFO PluginManager: Trying to load plugin [pinot-parquet] from location [/mnt/pinot/apache-pinot-incubating-0.7.1-bin/plugins/pinot-input-format/pinot-parquet]
22/04/14 07:06:44 INFO PluginManager: Successfully loaded plugin [pinot-parquet] from jar file [/mnt/pinot/apache-pinot-incubating-0.7.1-bin/plugins/pinot-input-format/pinot-parquet/pinot-parquet-0.7.1-shaded.jar]
22/04/14 07:06:44 INFO PluginManager: Successfully Loaded plugin [pinot-parquet] from dir [/mnt/pinot/apache-pinot-incubating-0.7.1-bin/plugins/pinot-input-format/pinot-parquet]
22/04/14 07:06:45 ERROR LaunchDataIngestionJobCommand: Got exception to generate IngestionJobSpec for data ingestion job - 
Can't construct a java object for tag:yaml.org,2002:org.apache.pinot.spi.ingestion.batch.spec.SegmentGenerationJobSpec; exception=Class not found: org.apache.pinot.spi.ingestion.batch.spec.SegmentGenerationJobSpec
 in 'string', line 1, column 1:
    executionFrameworkSpec:
    ^
I'll thread the different commands used to submit this job. Thank you for your help 🙇
This command has the jars pointing to a local path:
Copy code
sudo spark-submit \
  --class org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand \
  --deploy-mode cluster \
  --jars /mnt/pinot/apache-pinot-incubating-0.7.1-bin/lib/pinot-all-0.7.1-jar-with-dependencies.jar,/mnt/pinot/apache-pinot-incubating-0.7.1-bin/plugins/pinot-batch-ingestion/pinot-batch-ingestion-spark/pinot-batch-ingestion-spark-0.7.1-shaded.jar,/mnt/pinot/apache-pinot-incubating-0.7.1-bin/plugins/pinot-file-system/pinot-s3/pinot-s3-0.7.1-shaded.jar,/mnt/pinot/apache-pinot-incubating-0.7.1-bin/plugins/pinot-input-format/pinot-parquet/pinot-parquet-0.7.1-shaded.jar \
  --files "/mnt/pinot/spark_job_spec_v8.yaml" \
  --conf "spark.driver.extraJavaOptions=-Dplugins.dir=/mnt/pinot/apache-pinot-incubating-0.7.1-bin/plugins -Dplugins.include=pinot-s3,pinot-parquet -Dlog4j2.configurationFile=/mnt/pinot/apache-pinot-incubating-0.7.1-bin/conf/pinot-ingestion-job-log4j2.xml" \
  --conf "spark.driver.extraClassPath=/mnt/pinot/apache-pinot-incubating-0.7.1-bin/plugins/pinot-batch-ingestion/pinot-batch-ingestion-spark/pinot-batch-ingestion-spark-0.7.1-shaded.jar:/mnt/pinot/apache-pinot-incubating-0.7.1-bin/lib/pinot-all-0.7.1-jar-with-dependencies.jar:/mnt/pinot/apache-pinot-incubating-0.7.1-bin/plugins/pinot-file-system/pinot-s3/pinot-s3-0.7.1-shaded.jar:/mnt/pinot/apache-pinot-incubating-0.7.1-bin/plugins/pinot-input-format/pinot-parquet/pinot-parquet-0.7.1-shaded.jar" \
  --conf "spark.executor.extraClassPath=/mnt/pinot/apache-pinot-incubating-0.7.1-bin/plugins/pinot-batch-ingestion/pinot-batch-ingestion-spark/pinot-batch-ingestion-spark-0.7.1-shaded.jar:/mnt/pinot/apache-pinot-incubating-0.7.1-bin/lib/pinot-all-0.7.1-jar-with-dependencies.jar:/mnt/pinot/apache-pinot-incubating-0.7.1-bin/plugins/pinot-file-system/pinot-s3/pinot-s3-0.7.1-shaded.jar:/mnt/pinot/apache-pinot-incubating-0.7.1-bin/plugins/pinot-input-format/pinot-parquet/pinot-parquet-0.7.1-shaded.jar" \
  s3://nikhil-dw-dev/pinot/apache-pinot-incubating-0.7.1-bin/lib/pinot-all-0.7.1-jar-with-dependencies.jar \
  -jobSpecFile spark_job_spec_v8.yaml
And this command points to the jars on S3:
Copy code
sudo spark-submit \
  --class org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand \
  --deploy-mode cluster \
  --jars s3://nikhil-dw-dev/pinot/apache-pinot-incubating-0.7.1-bin/lib/pinot-all-0.7.1-jar-with-dependencies.jar,s3://nikhil-dw-dev/pinot/apache-pinot-incubating-0.7.1-bin/plugins/pinot-batch-ingestion/pinot-batch-ingestion-spark/pinot-batch-ingestion-spark-0.7.1-shaded.jar,s3://nikhil-dw-dev/pinot/apache-pinot-incubating-0.7.1-bin/plugins/pinot-file-system/pinot-s3/pinot-s3-0.7.1-shaded.jar,s3://nikhil-dw-dev/pinot/apache-pinot-incubating-0.7.1-bin/plugins/pinot-input-format/pinot-parquet/pinot-parquet-0.7.1-shaded.jar \
  --files "/mnt/pinot/spark_job_spec_v8.yaml" \
  --conf "spark.driver.extraJavaOptions=-Dplugins.dir=/mnt/pinot/apache-pinot-incubating-0.7.1-bin/plugins -Dplugins.include=pinot-s3,pinot-parquet -Dlog4j2.configurationFile=s3://nikhil-dw-dev/pinot/apache-pinot-incubating-0.7.1-bin/conf/pinot-ingestion-job-log4j2.xml" \
  --conf "spark.driver.extraClassPath=s3://nikhil-dw-dev/pinot/apache-pinot-incubating-0.7.1-bin/plugins/pinot-batch-ingestion/pinot-batch-ingestion-spark/pinot-batch-ingestion-spark-0.7.1-shaded.jar:s3://nikhil-dw-dev/pinot/apache-pinot-incubating-0.7.1-bin/lib/pinot-all-0.7.1-jar-with-dependencies.jar:s3://nikhil-dw-dev/pinot/apache-pinot-incubating-0.7.1-bin/plugins/pinot-file-system/pinot-s3/pinot-s3-0.7.1-shaded.jar:s3://nikhil-dw-dev/pinot/apache-pinot-incubating-0.7.1-bin/plugins/pinot-input-format/pinot-parquet/pinot-parquet-0.7.1-shaded.jar" \
  --conf "spark.executor.extraClassPath=s3://nikhil-dw-dev/pinot/apache-pinot-incubating-0.7.1-bin/plugins/pinot-batch-ingestion/pinot-batch-ingestion-spark/pinot-batch-ingestion-spark-0.7.1-shaded.jar:s3://nikhil-dw-dev/pinot/apache-pinot-incubating-0.7.1-bin/lib/pinot-all-0.7.1-jar-with-dependencies.jar:s3://nikhil-dw-dev/pinot/apache-pinot-incubating-0.7.1-bin/plugins/pinot-file-system/pinot-s3/pinot-s3-0.7.1-shaded.jar:s3://nikhil-dw-dev/pinot/apache-pinot-incubating-0.7.1-bin/plugins/pinot-input-format/pinot-parquet/pinot-parquet-0.7.1-shaded.jar" \
  s3://nikhil-dw-dev/pinot/apache-pinot-incubating-0.7.1-bin/lib/pinot-all-0.7.1-jar-with-dependencies.jar \
  -jobSpecFile spark_job_spec_v8.yaml
Both are failing. I tried including the jars in both spark.driver.extraClassPath and spark.executor.extraClassPath; neither helped.
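One more thing that might be worth trying (a sketch under assumptions, not a verified fix): in YARN cluster mode the driver runs inside the ApplicationMaster container on an arbitrary node, and extraClassPath entries are plain JVM classpath entries, so s3:// URLs there cannot resolve. Jars shipped with --jars (and files shipped with --files) are localized into each container's working directory under their base names, so those base names can go on the classpath directly:

```
# Sketch only: jar locations and the /mnt plugins path are taken from the commands
# above; whether this resolves the class-not-found has not been verified here.
sudo spark-submit \
  --class org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand \
  --master yarn \
  --deploy-mode cluster \
  --jars /mnt/pinot/apache-pinot-incubating-0.7.1-bin/lib/pinot-all-0.7.1-jar-with-dependencies.jar,/mnt/pinot/apache-pinot-incubating-0.7.1-bin/plugins/pinot-batch-ingestion/pinot-batch-ingestion-spark/pinot-batch-ingestion-spark-0.7.1-shaded.jar,/mnt/pinot/apache-pinot-incubating-0.7.1-bin/plugins/pinot-file-system/pinot-s3/pinot-s3-0.7.1-shaded.jar,/mnt/pinot/apache-pinot-incubating-0.7.1-bin/plugins/pinot-input-format/pinot-parquet/pinot-parquet-0.7.1-shaded.jar \
  --files /mnt/pinot/spark_job_spec_v8.yaml \
  --conf "spark.driver.extraJavaOptions=-Dplugins.dir=/mnt/pinot/apache-pinot-incubating-0.7.1-bin/plugins -Dplugins.include=pinot-s3,pinot-parquet -Dlog4j2.configurationFile=/mnt/pinot/apache-pinot-incubating-0.7.1-bin/conf/pinot-ingestion-job-log4j2.xml" \
  --conf "spark.driver.extraClassPath=pinot-all-0.7.1-jar-with-dependencies.jar:pinot-batch-ingestion-spark-0.7.1-shaded.jar:pinot-s3-0.7.1-shaded.jar:pinot-parquet-0.7.1-shaded.jar" \
  --conf "spark.executor.extraClassPath=pinot-all-0.7.1-jar-with-dependencies.jar:pinot-batch-ingestion-spark-0.7.1-shaded.jar:pinot-s3-0.7.1-shaded.jar:pinot-parquet-0.7.1-shaded.jar" \
  /mnt/pinot/apache-pinot-incubating-0.7.1-bin/lib/pinot-all-0.7.1-jar-with-dependencies.jar \
  -jobSpecFile spark_job_spec_v8.yaml
```

The -jobSpecFile argument keeps the bare file name because --files localizes the spec into the same container working directory.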
x
Can you include a bit more text after the exception log below? It may help with debugging.
Copy code
Can't construct a java object for tag:yaml.org,2002:org.apache.pinot.spi.ingestion.batch.spec.SegmentGenerationJobSpec; exception=Class not found: org.apache.pinot.spi.ingestion.batch.spec.SegmentGenerationJobSpec
 in 'string', line 1, column 1:
    executionFrameworkSpec:
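If the full driver-side output isn't handy, one way to pull it in cluster mode is via YARN's aggregated logs. This is a sketch with assumptions: the application id is a placeholder, and it assumes log aggregation is enabled on the EMR cluster:

```
# Fetch the aggregated container logs for the failed run; in cluster mode the driver
# log lives in the ApplicationMaster container. <application-id> is the id printed
# by spark-submit / shown in the YARN ResourceManager UI.
yarn logs -applicationId <application-id> > ingestion-job.log
grep -n -E "LaunchDataIngestionJobCommand|SegmentGenerationJobSpec|ClassNotFound" ingestion-job.log
```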
Let's try the latest release since 0.7.1 is very old. There's a command in the PR to build it yourself: https://github.com/apache/pinot/pull/6424
mvn clean install -DskipTests -Pbin-dist -T 4  -Djdk.version=8
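For context, a rough sketch of the full build flow, assuming Java 8 and Maven are installed; the tag name and the output location are assumptions to verify against the checkout:

```
# Sketch only: clone, check out the release tag (name assumed), and build the
# binary distribution with Java 8.
git clone https://github.com/apache/pinot.git
cd pinot
git checkout release-0.10.0   # assumed tag name for the 0.10.0 release
mvn clean install -DskipTests -Pbin-dist -T 4 -Djdk.version=8

# The -Pbin-dist profile has typically produced the tarball under
# pinot-distribution/target/ (exact name varies by version); verify before
# copying it to the EMR nodes.
ls pinot-distribution/target/apache-pinot-*-bin.tar.gz
```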
n
@User thank you for replying. Here are more of the logs from the Spark job run:
Copy code
22/04/19 02:36:04 INFO PluginManager: Plugins root dir is [/mnt/pinot/apache-pinot-incubating-0.7.1-bin/plugins]
22/04/19 02:36:04 INFO PluginManager: Trying to load plugins: [[pinot-s3, pinot-parquet]]
22/04/19 02:36:04 INFO PluginManager: Trying to load plugin [pinot-s3] from location [/mnt/pinot/apache-pinot-incubating-0.7.1-bin/plugins/pinot-file-system/pinot-s3]
22/04/19 02:36:04 INFO PluginManager: Successfully loaded plugin [pinot-s3] from jar file [/mnt/pinot/apache-pinot-incubating-0.7.1-bin/plugins/pinot-file-system/pinot-s3/pinot-s3-0.7.1-shaded.jar]
22/04/19 02:36:04 INFO PluginManager: Successfully Loaded plugin [pinot-s3] from dir [/mnt/pinot/apache-pinot-incubating-0.7.1-bin/plugins/pinot-file-system/pinot-s3]
22/04/19 02:36:04 INFO PluginManager: Trying to load plugin [pinot-parquet] from location [/mnt/pinot/apache-pinot-incubating-0.7.1-bin/plugins/pinot-input-format/pinot-parquet]
22/04/19 02:36:04 INFO PluginManager: Successfully loaded plugin [pinot-parquet] from jar file [/mnt/pinot/apache-pinot-incubating-0.7.1-bin/plugins/pinot-input-format/pinot-parquet/pinot-parquet-0.7.1-shaded.jar]
22/04/19 02:36:04 INFO PluginManager: Successfully Loaded plugin [pinot-parquet] from dir [/mnt/pinot/apache-pinot-incubating-0.7.1-bin/plugins/pinot-input-format/pinot-parquet]
22/04/19 02:36:06 ERROR LaunchDataIngestionJobCommand: Got exception to generate IngestionJobSpec for data ingestion job - 
Can't construct a java object for tag:yaml.org,2002:org.apache.pinot.spi.ingestion.batch.spec.SegmentGenerationJobSpec; exception=Class not found: org.apache.pinot.spi.ingestion.batch.spec.SegmentGenerationJobSpec
 in 'string', line 1, column 1:
    executionFrameworkSpec:
    ^

	at org.yaml.snakeyaml.constructor.Constructor$ConstructYamlObject.construct(Constructor.java:345)
	at org.yaml.snakeyaml.constructor.BaseConstructor.constructObject(BaseConstructor.java:182)
	at org.yaml.snakeyaml.constructor.BaseConstructor.constructDocument(BaseConstructor.java:141)
	at org.yaml.snakeyaml.constructor.BaseConstructor.getSingleData(BaseConstructor.java:127)
	at org.yaml.snakeyaml.Yaml.loadFromReader(Yaml.java:450)
	at org.yaml.snakeyaml.Yaml.loadAs(Yaml.java:427)
	at org.apache.pinot.spi.ingestion.batch.IngestionJobLauncher.getSegmentGenerationJobSpec(IngestionJobLauncher.java:90)
	at org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand.execute(LaunchDataIngestionJobCommand.java:118)
	at org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand.main(LaunchDataIngestionJobCommand.java:67)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:688)
Caused by: org.yaml.snakeyaml.error.YAMLException: Class not found: org.apache.pinot.spi.ingestion.batch.spec.SegmentGenerationJobSpec
	at org.yaml.snakeyaml.constructor.Constructor.getClassForNode(Constructor.java:650)
	at org.yaml.snakeyaml.constructor.Constructor$ConstructYamlObject.getConstructor(Constructor.java:331)
	at org.yaml.snakeyaml.constructor.Constructor$ConstructYamlObject.construct(Constructor.java:341)
	... 13 more
22/04/19 02:36:06 ERROR LaunchDataIngestionJobCommand: Exception caught: 
Can't construct a java object for tag:yaml.org,2002:org.apache.pinot.spi.ingestion.batch.spec.SegmentGenerationJobSpec; exception=Class not found: org.apache.pinot.spi.ingestion.batch.spec.SegmentGenerationJobSpec
 in 'string', line 1, column 1:
    executionFrameworkSpec:
    ^

	at org.yaml.snakeyaml.constructor.Constructor$ConstructYamlObject.construct(Constructor.java:345)
	at org.yaml.snakeyaml.constructor.BaseConstructor.constructObject(BaseConstructor.java:182)
	at org.yaml.snakeyaml.constructor.BaseConstructor.constructDocument(BaseConstructor.java:141)
	at org.yaml.snakeyaml.constructor.BaseConstructor.getSingleData(BaseConstructor.java:127)
	at org.yaml.snakeyaml.Yaml.loadFromReader(Yaml.java:450)
	at org.yaml.snakeyaml.Yaml.loadAs(Yaml.java:427)
	at org.apache.pinot.spi.ingestion.batch.IngestionJobLauncher.getSegmentGenerationJobSpec(IngestionJobLauncher.java:90)
	at org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand.execute(LaunchDataIngestionJobCommand.java:118)
	at org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand.main(LaunchDataIngestionJobCommand.java:67)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:688)
Caused by: org.yaml.snakeyaml.error.YAMLException: Class not found: org.apache.pinot.spi.ingestion.batch.spec.SegmentGenerationJobSpec
	at org.yaml.snakeyaml.constructor.Constructor.getClassForNode(Constructor.java:650)
	at org.yaml.snakeyaml.constructor.Constructor$ConstructYamlObject.getConstructor(Constructor.java:331)
	at org.yaml.snakeyaml.constructor.Constructor$ConstructYamlObject.construct(Constructor.java:341)
	... 13 more
22/04/19 02:36:06 ERROR ApplicationMaster: Uncaught exception: 
java.lang.IllegalStateException: User did not initialize spark context!
	at org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:489)
	at org.apache.spark.deploy.yarn.ApplicationMaster.org$apache$spark$deploy$yarn$ApplicationMaster$$runImpl(ApplicationMaster.scala:308)
	at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply$mcV$sp(ApplicationMaster.scala:248)
	at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply(ApplicationMaster.scala:248)
	at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply(ApplicationMaster.scala:248)
	at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:783)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1926)
	at org.apache.spark.deploy.yarn.ApplicationMaster.doAsUser(ApplicationMaster.scala:782)
	at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:247)
	at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:807)
	at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
22/04/19 02:36:06 INFO ShutdownHookManager: Shutdown hook called
I'm going to build Pinot 0.10.0 with Java 8 and try that next.
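For when the 0.10.0 build is in place, a sketch of how the earlier cluster-mode submit would change. The directory and jar names below are assumptions until checked against the unpacked distribution (post-incubation releases dropped "incubating" from the artifact names):

```
# Sketch only: parameterize the distribution path and version so the spark-submit
# from the thread can be reused with minimal edits. Whether to use full /mnt paths
# or --jars-localized base names on the classpath is a separate question from the
# version bump.
PINOT_DIST=/mnt/pinot/apache-pinot-0.10.0-bin   # assumed unpack location and name
PINOT_VERSION=0.10.0

CP=${PINOT_DIST}/plugins/pinot-batch-ingestion/pinot-batch-ingestion-spark/pinot-batch-ingestion-spark-${PINOT_VERSION}-shaded.jar:${PINOT_DIST}/lib/pinot-all-${PINOT_VERSION}-jar-with-dependencies.jar:${PINOT_DIST}/plugins/pinot-file-system/pinot-s3/pinot-s3-${PINOT_VERSION}-shaded.jar:${PINOT_DIST}/plugins/pinot-input-format/pinot-parquet/pinot-parquet-${PINOT_VERSION}-shaded.jar

sudo spark-submit \
  --class org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand \
  --master yarn \
  --deploy-mode cluster \
  --jars $(echo "${CP}" | tr ':' ',') \
  --files /mnt/pinot/spark_job_spec_v8.yaml \
  --conf "spark.driver.extraJavaOptions=-Dplugins.dir=${PINOT_DIST}/plugins -Dplugins.include=pinot-s3,pinot-parquet" \
  --conf "spark.driver.extraClassPath=${CP}" \
  --conf "spark.executor.extraClassPath=${CP}" \
  ${PINOT_DIST}/lib/pinot-all-${PINOT_VERSION}-jar-with-dependencies.jar \
  -jobSpecFile spark_job_spec_v8.yaml
```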