# general
s
Hello everyone, I'm trying to run the Spark batch ingestion job with spark-submit. When I run the command, it isn't able to pick up the plugins and throws the errors below:
2021/07/15 00:07:42.306 ERROR [PluginManager] [main] Failed to load plugin [pinot-avro] from dir [/data_ssd/spark-retry/apache-pinot-incubating-0.7.1-bin/plugins/pinot-input-format/pinot-avro]
java.lang.IllegalArgumentException: object is not an instance of declaring class
	at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:?]
	at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:?]
	at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:?]
	at java.lang.reflect.Method.invoke(Method.java:566) ~[?:?]
	at org.apache.pinot.spi.plugin.PluginClassLoader.<init>(PluginClassLoader.java:50) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6]
	at org.apache.pinot.spi.plugin.PluginManager.createClassLoader(PluginManager.java:196) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6]
	at org.apache.pinot.spi.plugin.PluginManager.load(PluginManager.java:187) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6]
	at org.apache.pinot.spi.plugin.PluginManager.init(PluginManager.java:157) [pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6]
	at org.apache.pinot.spi.plugin.PluginManager.init(PluginManager.java:123) [pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6]
	at org.apache.pinot.spi.plugin.PluginManager.<init>(PluginManager.java:104) [pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6]
	at org.apache.pinot.spi.plugin.PluginManager.<clinit>(PluginManager.java:46) [pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6]
	at org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand.main(LaunchDataIngestionJobCommand.java:54) [pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6]
	at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:?]
	at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:?]
	at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:?]
	at java.lang.reflect.Method.invoke(Method.java:566) ~[?:?]
	at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52) [spark-core_2.11-2.4.6.jar:2.4.6]
	at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845) [spark-core_2.11-2.4.6.jar:2.4.6]
	at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161) [spark-core_2.11-2.4.6.jar:2.4.6]
	at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184) [spark-core_2.11-2.4.6.jar:2.4.6]
	at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86) [spark-core_2.11-2.4.6.jar:2.4.6]
	at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920) [spark-core_2.11-2.4.6.jar:2.4.6]
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929) [spark-core_2.11-2.4.6.jar:2.4.6]
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) [spark-core_2.11-2.4.6.jar:2.4.6]
2021/07/15 00:07:42.338 ERROR [PluginManager] [main] Failed to load plugin [pinot-batch-ingestion-spark] from dir [/data_ssd/spark-retry/apache-pinot-incubating-0.7.1-bin/plugins/pinot-batch-ingestion/pinot-batch-ingestion-spark]
java.lang.IllegalArgumentException: object is not an instance of declaring class
	at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:?]
	at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:?]
	at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:?]
	at java.lang.reflect.Method.invoke(Method.java:566) ~[?:?]
	at org.apache.pinot.spi.plugin.PluginClassLoader.<init>(PluginClassLoader.java:50) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6]
	at org.apache.pinot.spi.plugin.PluginManager.createClassLoader(PluginManager.java:196) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6]
	at org.apache.pinot.spi.plugin.PluginManager.load(PluginManager.java:187) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6]
	at org.apache.pinot.spi.plugin.PluginManager.init(PluginManager.java:157) [pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6]
	at org.apache.pinot.spi.plugin.PluginManager.init(PluginManager.java:123) [pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6]
	at org.apache.pinot.spi.plugin.PluginManager.<init>(PluginManager.java:104) [pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6]
	at org.apache.pinot.spi.plugin.PluginManager.<clinit>(PluginManager.java:46) [pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6]
	at org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand.main(LaunchDataIngestionJobCommand.java:54) [pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6]
	at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:?]
	at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:?]
	at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:?]
	at java.lang.reflect.Method.invoke(Method.java:566) ~[?:?]
	at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52) [spark-core_2.11-2.4.6.jar:2.4.6]
	at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845) [spark-core_2.11-2.4.6.jar:2.4.6]
	at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161) [spark-core_2.11-2.4.6.jar:2.4.6]
	at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184) [spark-core_2.11-2.4.6.jar:2.4.6]
	at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86) [spark-core_2.11-2.4.6.jar:2.4.6]
	at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920) [spark-core_2.11-2.4.6.jar:2.4.6]
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929) [spark-core_2.11-2.4.6.jar:2.4.6]
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) [spark-core_2.11-2.4.6.jar:2.4.6]
Has anyone faced this issue?
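The `jdk.internal.reflect` frames in both traces mean the driver JVM was Java 9 or newer (Java 8 names these frames `sun.reflect`). One plausible cause, not confirmed in this thread, is that Pinot 0.7.1's plugin class loader reflectively calls a `URLClassLoader` method on the system class loader, which on Java 9+ is no longer a `URLClassLoader`; `Method.invoke` then fails with exactly this "object is not an instance of declaring class" message. A minimal sketch for checking which Java version spark-submit would launch, assuming it follows the usual `JAVA_HOME`-then-`PATH` lookup of `spark-class`; the `java_major_version` helper is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the Java major version from a version string.
# Java 8 and earlier report "1.x" (e.g. "1.8.0_292"); Java 9+ reports e.g. "11.0.11".
java_major_version() {
  local v="$1"
  if [[ "$v" == 1.* ]]; then
    v="${v#1.}"          # "1.8.0_292" -> "8.0_292"
  fi
  echo "${v%%[._-]*}"    # keep the text before the first '.', '_' or '-'
}

# spark-class uses $JAVA_HOME/bin/java when JAVA_HOME is set, else java on PATH.
java_bin="${JAVA_HOME:+$JAVA_HOME/bin/}java"
if command -v "$java_bin" >/dev/null 2>&1; then
  ver="$("$java_bin" -version 2>&1 | awk -F'"' '/version/ {print $2; exit}')"
  major="$(java_major_version "$ver")"
  if [[ -n "$major" ]] && (( major >= 9 )); then
    echo "$java_bin is Java $major (Java 9+, consistent with the jdk.internal.reflect frames)"
  else
    echo "$java_bin reports version: ${ver:-unknown}"
  fi
fi
```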
b
Yes. I had to switch to using --jars to make things work:
# --conf expects key=value; JVM flags such as -Dlog4j2.configurationFile go through spark.driver.extraJavaOptions
spark-submit \
  --class org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand \
  --master yarn \
  --conf "spark.driver.extraJavaOptions=-Dlog4j2.configurationFile=${PINOT_DISTRIBUTION_DIR}/conf/pinot-ingestion-job-log4j2.xml" \
  --jars ${PINOT_DISTRIBUTION_DIR}/plugins/pinot-batch-ingestion/pinot-batch-ingestion-spark/pinot-batch-ingestion-spark-${PINOT_VERSION}-shaded.jar,${PINOT_DISTRIBUTION_DIR}/lib/pinot-all-${PINOT_VERSION}-jar-with-dependencies.jar,${PINOT_DISTRIBUTION_DIR}/plugins/pinot-file-system/pinot-s3/pinot-s3-${PINOT_VERSION}-shaded.jar,${PINOT_DISTRIBUTION_DIR}/plugins/pinot-input-format/pinot-parquet/pinot-parquet-${PINOT_VERSION}-shaded.jar \
  local://${PINOT_DISTRIBUTION_DIR}/lib/pinot-all-${PINOT_VERSION}-jar-with-dependencies.jar \
  -jobSpecFile spark_job_spec.yaml
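`--jars` takes a single comma-separated list of paths, which is easy to mistype. A minimal sketch that builds that list, assuming the same `PINOT_DISTRIBUTION_DIR` and `PINOT_VERSION` variables and the plugin layout used in the spark-submit command above; the `pinot_spark_jars` function name is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the comma-separated value for spark-submit --jars.
# Assumes the jar layout of the 0.7.1 binary distribution used in this thread.
pinot_spark_jars() {
  local d="${PINOT_DISTRIBUTION_DIR:?set PINOT_DISTRIBUTION_DIR}"
  local v="${PINOT_VERSION:?set PINOT_VERSION}"
  local jars=(
    "$d/plugins/pinot-batch-ingestion/pinot-batch-ingestion-spark/pinot-batch-ingestion-spark-$v-shaded.jar"
    "$d/lib/pinot-all-$v-jar-with-dependencies.jar"
    "$d/plugins/pinot-file-system/pinot-s3/pinot-s3-$v-shaded.jar"
    "$d/plugins/pinot-input-format/pinot-parquet/pinot-parquet-$v-shaded.jar"
  )
  local jar
  for jar in "${jars[@]}"; do
    # Warn on stderr about missing files instead of failing later inside Spark.
    [[ -f "$jar" ]] || echo "warning: missing $jar" >&2
  done
  # "${jars[*]}" joins the elements with the first character of IFS.
  local IFS=,
  echo "${jars[*]}"
}
```

It could then be passed as `--jars "$(pinot_spark_jars)"`. Adding or dropping a plugin, such as a different input format, becomes a one-line change to the array.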
m
Thanks @User. @User, could you try this and see if it works for you? If it does, we should add it to the FAQ.
s
Yeah, I'm able to run it now. After that, though, I hit a Hadoop verify error. It would be great if the Spark and Hadoop versions were specified somewhere. Thanks, everyone.
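The traces already pin down one version: `spark-core_2.11-2.4.6.jar` is Spark 2.4.6 built for Scala 2.11. A small sketch for collecting version details worth recording with a report like this one: the hypothetical `spark_core_versions` helper splits such a jar name, and the guarded commands print the locally installed versions via `spark-submit --version` (whose banner also reports the Scala and Java versions) and `hadoop version`. It gathers information only; it says nothing about which versions Pinot 0.7.1 supports.

```shell
#!/usr/bin/env bash
# Hypothetical helper: split a jar name such as "spark-core_2.11-2.4.6.jar"
# into its Scala and Spark versions. Returns 1 if the name does not match.
spark_core_versions() {
  local name="${1##*/}"   # drop any leading directory
  local re='^spark-core_([0-9.]+)-([0-9][^/]*)\.jar$'
  if [[ "$name" =~ $re ]]; then
    echo "scala=${BASH_REMATCH[1]} spark=${BASH_REMATCH[2]}"
  else
    return 1
  fi
}

# Print the locally installed versions, if the tools are on PATH.
if command -v spark-submit >/dev/null 2>&1; then
  spark-submit --version 2>&1 | grep -E 'version [0-9]'
fi
if command -v hadoop >/dev/null 2>&1; then
  hadoop version | head -n 1
fi
```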
b
I think the current Spark ingestion approach should (and will) be superseded by an implementation based on https://github.com/apache/incubator-pinot/issues/6610 once that is released. The current approach is ... problematic. Implementation-wise, however, it's pretty interesting; I definitely learnt a few things browsing that code over the last day or so.