Nisheet (08/20/2021, 3:18 PM):
mvn install package -DskipTests -DskipITs -Pbin-dist -Drat.ignoreErrors=true -Djdk.version=8 -Dspark.version=2.4.5
While running the task I am getting this error:
21/08/20 15:11:24 ERROR LaunchDataIngestionJobCommand: Exception caught:
Can't construct a java object for tag:yaml.org,2002:org.apache.pinot.spi.ingestion.batch.spec.SegmentGenerationJobSpec; exception=Class not found: org.apache.pinot.spi.ingestion.batch.spec.SegmentGenerationJobSpec
in 'string', line 1, column 1:
executionFrameworkSpec:
^
at org.yaml.snakeyaml.constructor.Constructor$ConstructYamlObject.construct(Constructor.java:349)
at org.yaml.snakeyaml.constructor.BaseConstructor.constructObject(BaseConstructor.java:182)
at org.yaml.snakeyaml.constructor.BaseConstructor.constructDocument(BaseConstructor.java:141)
at org.yaml.snakeyaml.constructor.BaseConstructor.getSingleData(BaseConstructor.java:127)
at org.yaml.snakeyaml.Yaml.loadFromReader(Yaml.java:450)
at org.yaml.snakeyaml.Yaml.loadAs(Yaml.java:427)
at org.apache.pinot.spi.ingestion.batch.IngestionJobLauncher.getSegmentGenerationJobSpec(IngestionJobLauncher.java:94)
I have loaded all the jars in the Spark classpath. Any idea how to resolve this?

Nisheet (08/20/2021, 3:18 PM):
executionFrameworkSpec:
  name: 'spark'
  segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.spark.SparkSegmentGenerationJobRunner'
  segmentTarPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.spark.SparkSegmentTarPushJobRunner'
  segmentUriPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.spark.SparkSegmentUriPushJobRunner'
  extraConfigs:
    stagingDir: 's3://<bucket>/<staging_path>/'
jobType: SegmentCreationAndTarPush
inputDirURI: 's3://<bucket>/<parquet_file_path>/'
includeFileNamePattern: 'glob:**/*.parquet'
outputDirURI: 's3://<bucket>/<path>/'
overwriteOutput: true
pinotFSSpecs:
  - scheme: s3
    className: org.apache.pinot.plugin.filesystem.S3PinotFS
    configs:
      region: '<region>'
recordReaderSpec:
  dataFormat: 'parquet'
  className: 'org.apache.pinot.plugin.inputformat.parquet.ParquetRecordReader'
tableSpec:
  tableName: '<pinot_table_name>'
pinotClusterSpecs:
  - controllerURI: '<controller_uri>'
pushJobSpec:
  pushParallelism: 2
  pushAttempts: 2
  pushRetryIntervalMillis: 1000
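For background on the error above (not from the thread itself): SnakeYAML's Constructor resolves the root tag with Class.forName, which looks the class up through the calling class's defining classloader, not the thread context classloader that Spark populates from --jars. A minimal stdlib-only probe (the Pinot class name is taken from the stack trace; everything else is an illustrative sketch) shows the distinction:

```java
// Probe whether a class is visible to the caller's loader, only to the
// thread context loader, or to neither.
public class ClassLoaderProbe {

    static String probe(String className) {
        try {
            // Same lookup SnakeYAML's Constructor.getClassForName performs.
            Class.forName(className);
            return "visible to caller's loader";
        } catch (ClassNotFoundException e) {
            try {
                // Retry with the thread context loader, which spark-submit
                // populates from --jars.
                Class.forName(className, false,
                        Thread.currentThread().getContextClassLoader());
                return "visible only to context loader";
            } catch (ClassNotFoundException e2) {
                return "not found";
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(probe(
            "org.apache.pinot.spi.ingestion.batch.spec.SegmentGenerationJobSpec"));
    }
}
```

If such a probe reported "visible only to context loader" inside the driver, the jar would be present on the context classpath but invisible to the loader SnakeYAML resolves against, which would match the symptom of Class.forName succeeding in a notebook while the job launcher fails.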
Nisheet (08/20/2021, 3:23 PM):
Class.forName("org.apache.pinot.spi.ingestion.batch.spec.SegmentGenerationJobSpec")
I ran this in a notebook and the command succeeded.
Cluster spark conf:
spark.driver.extraClassPath=/home/yarn/pinot/plugins/pinot-batch-ingestion/pinot-batch-ingestion-spark/pinot-batch-ingestion-spark-0.8.0-shaded.jar:/home/yarn/pinot/lib/pinot-all-0.8.0-jar-with-dependencies.jar:/home/yarn/pinot/plugins/pinot-file-system/pinot-s3/pinot-s3-0.8.0-shaded.jar:/home/yarn/pinot/plugins/pinot-input-format/pinot-parquet/pinot-parquet-0.8.0-shaded.jar:/home/yarn/pinot/pinot-spi-0.8.0-SNAPSHOT.jar
spark.driver.extraJavaOptions=-Dplugins.dir=/home/yarn/pinot/plugins -Dplugins.include=pinot-s3,pinot-parquet
In the logs I can see the jars being loaded:
21/08/20 15:10:25 INFO DriverCorral: Successfully attached library s3a://<jar_path>/pinot-all-0.8.0-SNAPSHOT-jar-with-dependencies.jar to Spark
.......
.......
21/08/20 15:11:21 WARN SparkContext: The jar /local_disk0/tmp/addedFile4953461200729207388pinot_all_0_8_0_SNAPSHOT_jar_with_dependencies-2dd68.jar has been added already. Overwriting of added jars is not supported in the current version.
And the plugins are also loaded successfully:
21/08/20 15:11:23 INFO PluginManager: Plugins root dir is [/home/yarn/pinot/plugins]
21/08/20 15:11:23 INFO PluginManager: Trying to load plugins: [[pinot-s3, pinot-parquet]]
21/08/20 15:11:23 INFO PluginManager: Trying to load plugin [pinot-parquet] from location [/home/yarn/pinot/plugins/pinot-input-format/pinot-parquet]
21/08/20 15:11:23 INFO PluginManager: Successfully loaded plugin [pinot-parquet] from jar files: [file:/home/yarn/pinot/plugins/pinot-input-format/pinot-parquet/pinot-parquet-0.8.0-SNAPSHOT-shaded.jar]
21/08/20 15:11:23 INFO PluginManager: Successfully Loaded plugin [pinot-parquet] from dir [/home/yarn/pinot/plugins/pinot-input-format/pinot-parquet]
21/08/20 15:11:23 INFO PluginManager: Trying to load plugin [pinot-s3] from location [/home/yarn/pinot/plugins/pinot-file-system/pinot-s3]
21/08/20 15:11:23 INFO PluginManager: Successfully loaded plugin [pinot-s3] from jar files: [file:/home/yarn/pinot/plugins/pinot-file-system/pinot-s3/pinot-s3-0.8.0-SNAPSHOT-shaded.jar]
21/08/20 15:11:23 INFO PluginManager: Successfully Loaded plugin [pinot-s3] from dir [/home/yarn/pinot/plugins/pinot-file-system/pinot-s3]
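The PluginManager lines above suggest each plugin directory is loaded through its own classloader. A rough stdlib-only sketch of that kind of isolation (the class name and layout here are assumptions for illustration, not Pinot's actual implementation):

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class PluginLoaderSketch {

    // Build a URLClassLoader over every jar in a plugin directory.
    // Each plugin gets its own loader, so a class loaded for pinot-s3
    // is not automatically visible from pinot-parquet, or vice versa.
    static URLClassLoader loaderFor(Path pluginDir) {
        List<URL> urls = new ArrayList<>();
        try (DirectoryStream<Path> jars = Files.newDirectoryStream(pluginDir, "*.jar")) {
            for (Path jar : jars) {
                urls.add(jar.toUri().toURL());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The parent defaults to the application classloader, so plugin
        // classes can see the SPI, but sibling plugin loaders cannot see
        // each other's classes.
        return new URLClassLoader(urls.toArray(new URL[0]));
    }
}
```

This isolation is one reason a class can show up as "successfully loaded" in a plugin loader yet still be invisible to code that resolves classes through a different loader.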
Mayank
Nisheet (08/20/2021, 3:49 PM):
Caused by: org.yaml.snakeyaml.error.YAMLException: Class not found: org.apache.pinot.spi.ingestion.batch.spec.SegmentGenerationJobSpec
at org.yaml.snakeyaml.constructor.Constructor.getClassForNode(Constructor.java:660)
at org.yaml.snakeyaml.constructor.Constructor$ConstructYamlObject.getConstructor(Constructor.java:335)
The line of code where it breaks also suggests the same:
try {
  cl = getClassForName(name);
} catch (ClassNotFoundException e) {
  throw new YAMLException("Class not found: " + name);
}

protected Class<?> getClassForName(String name) throws ClassNotFoundException {
  return Class.forName(name);
}
This doesn't look like a yaml file issue. I cross-verified the yaml file and it looks fine. I have attached the yaml file for cross verification.

Nisheet (08/20/2021, 3:52 PM):
21/08/20 15:11:24 ERROR LaunchDataIngestionJobCommand: Got exception to generate IngestionJobSpec for data ingestion job -
Can't construct a java object for tag:yaml.org,2002:org.apache.pinot.spi.ingestion.batch.spec.SegmentGenerationJobSpec; exception=Class not found: org.apache.pinot.spi.ingestion.batch.spec.SegmentGenerationJobSpec
in 'string', line 1, column 1:
executionFrameworkSpec:
^
at org.yaml.snakeyaml.constructor.Constructor$ConstructYamlObject.construct(Constructor.java:349)
at org.yaml.snakeyaml.constructor.BaseConstructor.constructObject(BaseConstructor.java:182)
at org.yaml.snakeyaml.constructor.BaseConstructor.constructDocument(BaseConstructor.java:141)
at org.yaml.snakeyaml.constructor.BaseConstructor.getSingleData(BaseConstructor.java:127)
at org.yaml.snakeyaml.Yaml.loadFromReader(Yaml.java:450)
at org.yaml.snakeyaml.Yaml.loadAs(Yaml.java:427)
at org.apache.pinot.spi.ingestion.batch.IngestionJobLauncher.getSegmentGenerationJobSpec(IngestionJobLauncher.java:94)
at org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand.execute(LaunchDataIngestionJobCommand.java:125)
at org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand.main(LaunchDataIngestionJobCommand.java:74)
at lined4163e7e07a34bf996c1eea078dbedab25.$read$$iw$$iw$$iw$$iw$$iw$$iw.<init>(command--1:1)
at lined4163e7e07a34bf996c1eea078dbedab25.$read$$iw$$iw$$iw$$iw$$iw.<init>(command--1:44)
at lined4163e7e07a34bf996c1eea078dbedab25.$read$$iw$$iw$$iw$$iw.<init>(command--1:46)
at lined4163e7e07a34bf996c1eea078dbedab25.$read$$iw$$iw$$iw.<init>(command--1:48)
at lined4163e7e07a34bf996c1eea078dbedab25.$read$$iw$$iw.<init>(command--1:50)
at lined4163e7e07a34bf996c1eea078dbedab25.$read$$iw.<init>(command--1:52)
at lined4163e7e07a34bf996c1eea078dbedab25.$read.<init>(command--1:54)
at lined4163e7e07a34bf996c1eea078dbedab25.$read$.<init>(command--1:58)
at lined4163e7e07a34bf996c1eea078dbedab25.$read$.<clinit>(command--1)
at lined4163e7e07a34bf996c1eea078dbedab25.$eval$.$print$lzycompute(<notebook>:7)
at lined4163e7e07a34bf996c1eea078dbedab25.$eval$.$print(<notebook>:6)
at lined4163e7e07a34bf996c1eea078dbedab25.$eval.$print(<notebook>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:793)
at scala.tools.nsc.interpreter.IMain$Request.loadAndRun(IMain.scala:1054)
at scala.tools.nsc.interpreter.IMain$WrappedRequest$$anonfun$loadAndRunReq$1.apply(IMain.scala:645)
at scala.tools.nsc.interpreter.IMain$WrappedRequest$$anonfun$loadAndRunReq$1.apply(IMain.scala:644)
at scala.reflect.internal.util.ScalaClassLoader$class.asContext(ScalaClassLoader.scala:31)
at scala.reflect.internal.util.AbstractFileClassLoader.asContext(AbstractFileClassLoader.scala:19)
at scala.tools.nsc.interpreter.IMain$WrappedRequest.loadAndRunReq(IMain.scala:644)
at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:576)
at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:572)
at com.databricks.backend.daemon.driver.DriverILoop.execute(DriverILoop.scala:215)
at com.databricks.backend.daemon.driver.ScalaDriverLocal$$anonfun$repl$1.apply$mcV$sp(ScalaDriverLocal.scala:202)
at com.databricks.backend.daemon.driver.ScalaDriverLocal$$anonfun$repl$1.apply(ScalaDriverLocal.scala:202)
at com.databricks.backend.daemon.driver.ScalaDriverLocal$$anonfun$repl$1.apply(ScalaDriverLocal.scala:202)
at com.databricks.backend.daemon.driver.DriverLocal$TrapExitInternal$.trapExit(DriverLocal.scala:714)
at com.databricks.backend.daemon.driver.DriverLocal$TrapExit$.apply(DriverLocal.scala:667)
at com.databricks.backend.daemon.driver.ScalaDriverLocal.repl(ScalaDriverLocal.scala:202)
at com.databricks.backend.daemon.driver.DriverLocal$$anonfun$execute$9.apply(DriverLocal.scala:396)
at com.databricks.backend.daemon.driver.DriverLocal$$anonfun$execute$9.apply(DriverLocal.scala:373)
at com.databricks.logging.UsageLogging$$anonfun$withAttributionContext$1.apply(UsageLogging.scala:238)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
at com.databricks.logging.UsageLogging$class.withAttributionContext(UsageLogging.scala:233)
at com.databricks.backend.daemon.driver.DriverLocal.withAttributionContext(DriverLocal.scala:49)
at com.databricks.logging.UsageLogging$class.withAttributionTags(UsageLogging.scala:275)
at com.databricks.backend.daemon.driver.DriverLocal.withAttributionTags(DriverLocal.scala:49)
at com.databricks.backend.daemon.driver.DriverLocal.execute(DriverLocal.scala:373)
at com.databricks.backend.daemon.driver.DriverWrapper$$anonfun$tryExecutingCommand$2.apply(DriverWrapper.scala:644)
at com.databricks.backend.daemon.driver.DriverWrapper$$anonfun$tryExecutingCommand$2.apply(DriverWrapper.scala:644)
at scala.util.Try$.apply(Try.scala:192)
at com.databricks.backend.daemon.driver.DriverWrapper.tryExecutingCommand(DriverWrapper.scala:639)
at com.databricks.backend.daemon.driver.DriverWrapper.getCommandOutputAndError(DriverWrapper.scala:485)
at com.databricks.backend.daemon.driver.DriverWrapper.executeCommand(DriverWrapper.scala:597)
at com.databricks.backend.daemon.driver.DriverWrapper.runInnerLoop(DriverWrapper.scala:390)
at com.databricks.backend.daemon.driver.DriverWrapper.runInner(DriverWrapper.scala:337)
at com.databricks.backend.daemon.driver.DriverWrapper.run(DriverWrapper.scala:219)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.yaml.snakeyaml.error.YAMLException: Class not found: org.apache.pinot.spi.ingestion.batch.spec.SegmentGenerationJobSpec
at org.yaml.snakeyaml.constructor.Constructor.getClassForNode(Constructor.java:660)
at org.yaml.snakeyaml.constructor.Constructor$ConstructYamlObject.getConstructor(Constructor.java:335)
at org.yaml.snakeyaml.constructor.Constructor$ConstructYamlObject.construct(Constructor.java:345)
... 59 more
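One generic workaround for this class of failure (a sketch, not a fix the thread confirms): override the protected getClassForName hook shown in the excerpt quoted earlier, so tag resolution goes through an explicit classloader such as Spark's context loader. A stand-in base class is used below to keep the example stdlib-only; with the real dependency you would extend org.yaml.snakeyaml.constructor.Constructor the same way.

```java
// Stand-in for SnakeYAML's Constructor, which exposes the same protected hook
// (see the excerpt quoted earlier in the thread).
class ConstructorStub {
    protected Class<?> getClassForName(String name) throws ClassNotFoundException {
        return Class.forName(name); // resolves against the caller's loader
    }
}

// Resolve tags against an explicitly chosen classloader instead.
class LoaderAwareConstructor extends ConstructorStub {
    private final ClassLoader loader;

    LoaderAwareConstructor(ClassLoader loader) {
        this.loader = loader;
    }

    @Override
    protected Class<?> getClassForName(String name) throws ClassNotFoundException {
        return Class.forName(name, true, loader);
    }
}

public class LoaderAwareDemo {
    static boolean canResolve(LoaderAwareConstructor c, String name) {
        try {
            return c.getClassForName(name) != null;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // E.g. the thread context loader, which spark-submit --jars populates.
        LoaderAwareConstructor c =
            new LoaderAwareConstructor(Thread.currentThread().getContextClassLoader());
        System.out.println(canResolve(c, "java.lang.String"));
    }
}
```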
Neha Pawar
Nisheet (08/20/2021, 3:57 PM):
executionFrameworkSpec. I have attached the yaml file as well in the message above for cross reference.

Bruce Ritchie (08/20/2021, 7:21 PM)

Mayank
Kulbir Nijjer (08/20/2021, 8:01 PM):
You can set dependencyJarDir in extraConfigs, where you will need to first copy all the jars present in the <PINOT_HOME>/plugins folder to S3:
extraConfigs:
  # stagingDir is used in distributed filesystems to host all the segments, then move this directory entirely to the output directory.
  stagingDir: 's3://<somepath>/'
  dependencyJarDir: 's3://<somePath>/plugins'
and that should address the CNFE. This is a known issue; we are debugging why the jars are not getting deployed to the YARN nodes, so meanwhile the above could be a short-term workaround.

Mayank
Nisheet (08/23/2021, 6:08 AM):
Tried with --jars during spark-submit and by providing the plugin path in the yaml file. I am still getting the same
Class not found: org.apache.pinot.spi.ingestion.batch.spec.SegmentGenerationJobSpec
Chethan UK (08/23/2021, 2:25 PM)

Nisheet (08/23/2021, 3:27 PM)

Kulbir Nijjer (08/23/2021, 3:45 PM)

Nisheet (08/23/2021, 3:46 PM):
spark-submit \
  --class org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand \
  --jars /home/yarn/pinot/plugins/pinot-batch-ingestion/pinot-batch-ingestion-spark/pinot-batch-ingestion-spark-0.8.0-SNAPSHOT-shaded.jar,/home/yarn/pinot/lib/pinot-all-0.8.0-SNAPSHOT-jar-with-dependencies.jar,/home/yarn/pinot/plugins/pinot-file-system/pinot-s3/pinot-s3-0.8.0-SNAPSHOT-shaded.jar,/home/yarn/pinot/plugins/pinot-input-format/pinot-parquet/pinot-parquet-0.8.0-SNAPSHOT-shaded.jar \
  --conf "spark.driver.extraJavaOptions=-Dplugins.dir=/home/yarn/pinot/plugins -Dplugins.include=pinot-s3,pinot-parquet" \
  dbfs:/databricks/jars/pinot/lib/pinot-all-0.8.0-SNAPSHOT-jar-with-dependencies.jar \
  -jobSpecFile /tmp/spec_file.yaml
Nisheet (08/23/2021, 5:34 PM):
spark-submit \
--class org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand \
--master "local[2]" \
--deploy-mode client \
--conf "spark.driver.extraJavaOptions=-Dplugins.dir=${PINOT_DISTRIBUTION_DIR}/plugins -Dlog4j2.configurationFile=${PINOT_DISTRIBUTION_DIR}/conf/pinot-ingestion-job-log4j2.xml" \
--jars "${PINOT_DISTRIBUTION_DIR}/lib/pinot-all-${PINOT_VERSION}-SNAPSHOT-jar-with-dependencies.jar,${${PINOT_DISTRIBUTION_DIR}}/plugins/pinot-batch-ingestion/pinot-batch-ingestion-spark/pinot-batch-ingestion-spark-0.8.0-SNAPSHOT-shaded.jar" \
${PINOT_DISTRIBUTION_DIR}/lib/pinot-all-${PINOT_VERSION}-SNAPSHOT-jar-with-dependencies.jar \
-jobSpecFile ${PINOT_DISTRIBUTION_DIR}/examples/batch/airlineStats/sparkIngestionJobSpec.yaml
Error message:
Can't construct a java object for tag:yaml.org,2002:org.apache.pinot.spi.ingestion.batch.spec.SegmentGenerationJobSpec; exception=Class not found: org.apache.pinot.spi.ingestion.batch.spec.SegmentGenerationJobSpec
in 'string', line 21, column 1:
executionFrameworkSpec:
^
at org.yaml.snakeyaml.constructor.Constructor$ConstructYamlObject.construct(Constructor.java:345)
at org.yaml.snakeyaml.constructor.BaseConstructor.constructObject(BaseConstructor.java:182)
Kulbir Nijjer (08/23/2021, 9:16 PM)

Ken Krugler (08/27/2021, 5:34 PM):
21/08/26 19:00:12 ERROR command.LaunchDataIngestionJobCommand: Got exception to kick off standalone data ingestion job -
java.lang.RuntimeException: Failed to create IngestionJobRunner instance for class - org.apache.pinot.plugin.ingestion.batch.hadoop.HadoopSegmentGenerationJobRunner
at org.apache.pinot.spi.ingestion.batch.IngestionJobLauncher.kickoffIngestionJob(IngestionJobLauncher.java:139)
at org.apache.pinot.spi.ingestion.batch.IngestionJobLauncher.runIngestionJob(IngestionJobLauncher.java:103)
at org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand.execute(LaunchDataIngestionJobCommand.java:143)
at org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand.main(LaunchDataIngestionJobCommand.java:74)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:226)
at org.apache.hadoop.util.RunJar.main(RunJar.java:141)
Caused by: java.lang.ClassNotFoundException: org.apache.pinot.plugin.ingestion.batch.hadoop.HadoopSegmentGenerationJobRunner
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
at org.apache.pinot.spi.plugin.PluginClassLoader.loadClass(PluginClassLoader.java:77)
at org.apache.pinot.spi.plugin.PluginManager.createInstance(PluginManager.java:294)
at org.apache.pinot.spi.plugin.PluginManager.createInstance(PluginManager.java:265)
at org.apache.pinot.spi.plugin.PluginManager.createInstance(PluginManager.java:246)
at org.apache.pinot.spi.ingestion.batch.IngestionJobLauncher.kickoffIngestionJob(IngestionJobLauncher.java:137)
... 9 more
Then we defined HADOOP_CLASSPATH to include a bunch of Pinot jars…
export HADOOP_CLASSPATH=${PINOT_DISTRIBUTION_DIR}/lib/pinot-all-${PINOT_VERSION}-jar-with-dependencies.jar:${PINOT_DISTRIBUTION_DIR}/plugins/pinot-batch-ingestion/pinot-batch-ingestion-hadoop/pinot-batch-ingestion-hadoop-${PINOT_VERSION}-shaded.jar:${PINOT_DISTRIBUTION_DIR}/plugins/pinot-file-system/pinot-hdfs/pinot-hdfs-${PINOT_VERSION}-shaded.jar
After that it got past the initial job submission phase, but failed in all the Hadoop map tasks with:
21/08/26 19:08:35 INFO mapreduce.Job: Task Id : attempt_1629495469244_4781_m_000046_0, Status : FAILED
Error: java.lang.RuntimeException: java.lang.ClassNotFoundException: org.apache.pinot.plugin.filesystem.HadoopPinotFS
at org.apache.pinot.spi.filesystem.PinotFSFactory.register(PinotFSFactory.java:56)
at org.apache.pinot.plugin.ingestion.batch.hadoop.HadoopSegmentCreationMapper.setup(HadoopSegmentCreationMapper.java:104)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:793)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1924)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.ClassNotFoundException: org.apache.pinot.plugin.filesystem.HadoopPinotFS
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
at org.apache.pinot.spi.plugin.PluginClassLoader.loadClass(PluginClassLoader.java:77)
at org.apache.pinot.spi.plugin.PluginManager.createInstance(PluginManager.java:294)
at org.apache.pinot.spi.plugin.PluginManager.createInstance(PluginManager.java:265)
at org.apache.pinot.spi.plugin.PluginManager.createInstance(PluginManager.java:246)
at org.apache.pinot.spi.filesystem.PinotFSFactory.register(PinotFSFactory.java:51)
... 9 more