# troubleshooting
p
Hello! Do you have any working tutorial for Spark batch loading for the latest version of Pinot? After the migration of the jars to plugins-external, I cannot make it work at all.
k
Hi, can you describe what error you are getting? Also, which Spark and Hadoop versions are you using?
Also, by latest do you mean a master build or 0.10.0?
p
Hi, I'm trying to follow the official Pinot docs at https://docs.pinot.apache.org/users/tutorials/batch-data-ingestion-in-practice#executing-the-job-using-spark (different from https://docs.pinot.apache.org/basics/data-import/batch-ingestion/spark, which is a pain as well 🙂). Spark 2.4.8 with Hadoop 2.7 (official build), running in local mode.

Command:

```
export SPARK_HOME=/spark
export PINOT_ROOT_DIR=/pinot
export PINOT_VERSION=0.10.0
export PINOT_DISTRIBUTION_DIR=$PINOT_ROOT_DIR
cd ${PINOT_DISTRIBUTION_DIR}
${SPARK_HOME}/bin/spark-submit \
  --verbose \
  --class org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand \
  --master "local[2]" \
  --deploy-mode client \
  --conf "spark.driver.extraJavaOptions=-Dplugins.dir=${PINOT_DISTRIBUTION_DIR}/plugins-external" \
  --conf "spark.driver.extraClassPath=${PINOT_DISTRIBUTION_DIR}/lib/pinot-all-${PINOT_VERSION}-jar-with-dependencies.jar" \
  local://${PINOT_DISTRIBUTION_DIR}/lib/pinot-all-${PINOT_VERSION}-jar-with-dependencies.jar \
  -jobSpecFile /app/job-spec-spark.yaml
```

I copied pinot-batch-ingestion-spark-0.10.0-shaded.jar to spark/jars and am getting this error:

```
Exception in thread "main" java.lang.VerifyError: Bad type on operand stack
Exception Details:
  Location:
    org/apache/spark/metrics/sink/MetricsServlet.<init>(Ljava/util/Properties;Lcom/codahale/metrics/MetricRegistry;Lorg/apache/spark/SecurityManager;)V @116: invokevirtual
  Reason:
    Type 'com/codahale/metrics/json/MetricsModule' (current frame, stack[2]) is not assignable to 'shaded/com/fasterxml/jackson/databind/Module'
  Current Frame:
    bci: @116
    flags: { }
    locals: { 'org/apache/spark/metrics/sink/MetricsServlet', 'java/util/Properties', 'com/codahale/metrics/MetricRegistry', 'org/apache/spark/SecurityManager' }
    stack: { 'org/apache/spark/metrics/sink/MetricsServlet', 'shaded/com/fasterxml/jackson/databind/ObjectMapper', 'com/codahale/metrics/json/MetricsModule' }
  Bytecode:
    0000000: 2a2b b500 2a2a 2cb5 002f 2a2d b500 5c2a
    0000010: b700 7e2a 1280 b500 322a 1282 b500 342a
    0000020: 03b5 0037 2a2b 2ab6 0084 b600 8ab5 0039
    0000030: 2ab2 008f 2b2a b600 91b6 008a b600 95bb
    0000040: 0014 592a b700 96b6 009c bb00 1659 2ab7
    0000050: 009d b600 a1b8 00a7 b500 3b2a bb00 7159
    0000060: b700 a8bb 00aa 59b2 00b0 b200 b32a b600
    0000070: b5b7 00b8 b600 bcb5 003e b1
```
If I don't copy pinot-batch-ingestion-spark-0.10.0-shaded.jar to spark/jars, the class is not found…
k
Can you also tell me which Java version you are using?
p
And Pinot is the 0.10.0 official build, on OpenJDK 11.
k
got it
Actually, we recently made the Spark dependency `provided` in our master branch. Is it possible for you to use that 0.11.0-SNAPSHOT shaded Spark jar?
I'll move this conversation to DM; we can sort it out there.
p
This issue is specific to version 0.10.0; the latest version works fine with the following command:

```
export SPARK_HOME=/spark
export PINOT_ROOT_DIR=/pinot
export PINOT_VERSION=0.11.0-SNAPSHOT
export PINOT_DISTRIBUTION_DIR=$PINOT_ROOT_DIR
cd ${PINOT_DISTRIBUTION_DIR}
${SPARK_HOME}/bin/spark-submit \
  --verbose \
  --class org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand \
  --master "local[2]" \
  --deploy-mode client \
  --conf "spark.driver.extraJavaOptions=-Dplugins.dir=${PINOT_DISTRIBUTION_DIR}/plugins-external" \
  --conf "spark.driver.extraClassPath=${PINOT_DISTRIBUTION_DIR}/lib/pinot-all-${PINOT_VERSION}-jar-with-dependencies.jar:/pinot/plugins-external/pinot-batch-ingestion/pinot-batch-ingestion-spark/pinot-batch-ingestion-spark-0.11.0-SNAPSHOT-shaded.jar" \
  local://${PINOT_DISTRIBUTION_DIR}/lib/pinot-all-${PINOT_VERSION}-jar-with-dependencies.jar \
  -jobSpecFile /app/job-spec-spark.yaml
```

Note: this command differs from https://docs.pinot.apache.org/basics/data-import/batch-ingestion/spark! That page should be fixed.
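[For context: the commands above pass a `-jobSpecFile`, whose contents never appear in this thread. A minimal Spark ingestion job spec might look roughly like the sketch below; the table name, input/output directories, staging dir, data format, and controller URI are all placeholder assumptions, not values from the thread.]

```yaml
executionFrameworkSpec:
  name: 'spark'
  segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.spark.SparkSegmentGenerationJobRunner'
  extraConfigs:
    # Hypothetical staging location; must be accessible to all Spark executors.
    stagingDir: 'file:///tmp/pinot-staging'
jobType: SegmentCreationAndTarPush
inputDirURI: 'file:///app/input'        # hypothetical input directory
includeFileNamePattern: 'glob:**/*.csv'
outputDirURI: 'file:///app/segments'    # hypothetical output directory
overwriteOutput: true
pinotFSSpecs:
  - scheme: file
    className: org.apache.pinot.spi.filesystem.LocalPinotFS
recordReaderSpec:
  dataFormat: 'csv'                     # assumed format; adjust to your data
  className: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReader'
tableSpec:
  tableName: 'myTable'                  # hypothetical table name
pinotClusterSpecs:
  - controllerURI: 'http://localhost:9000'  # assumed local controller
```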
k
Thanks for pointing that out! I've updated the documentation.