Hi team, I was trying to do batch ingestion from ...
# general
a
Hi team, I was trying to run batch ingestion from S3 Parquet files into Pinot, but I'm getting this error. Any help or pointers would be appreciated.
java.lang.RuntimeException: Failed to create IngestionJobRunner instance for class - org.apache.pinot.plugin.ingestion.batch.spark.SparkSegmentGenerationJobRunner
        at org.apache.pinot.spi.ingestion.batch.IngestionJobLauncher.kickoffIngestionJob(IngestionJobLauncher.java:145)
        at org.apache.pinot.spi.ingestion.batch.IngestionJobLauncher.runIngestionJob(IngestionJobLauncher.java:121)
        at org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand.execute(LaunchDataIngestionJobCommand.java:130)
        at org.apache.pinot.tools.Command.call(Command.java:33)
        at org.apache.pinot.tools.Command.call(Command.java:29)
        at picocli.CommandLine.executeUserObject(CommandLine.java:1953)
        at picocli.CommandLine.access$1300(CommandLine.java:145)
spark-submit command:
export PINOT_VERSION=0.11.0
export PINOT_DISTRIBUTION_DIR=/workspace/apache-pinot-0.11.0-bin

spark-submit --class org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand --master local --deploy-mode client --conf "spark.driver.extraJavaOptions=-Dplugins.dir=${PINOT_DISTRIBUTION_DIR}/plugins -Dlog4j2.configurationFile=${PINOT_DISTRIBUTION_DIR}/conf/pinot-ingestion-job-log4j2.xml" --conf "spark.driver.extraClassPath=${PINOT_DISTRIBUTION_DIR}/lib/pinot-all-${PINOT_VERSION}-jar-with-dependencies.jar"--jars  ${PINOT_DISTRIBUTION_DIR}/lib/pinot-all-${PINOT_VERSION}-jar-with-dependencies.jar -jobSpecFile /workspace/jupyter_notebooks/_examples/coefficient.yaml
m
@Kartik Khare ^^
t
@Ashish Kumar I think you're missing the additional Spark plugin jars, which need to be added to the driver and executor extra classpath.
E.g.
spark-submit \
--class org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand \
--master local --deploy-mode client \
--conf "spark.driver.extraJavaOptions=-Dplugins.dir=${PINOT_DISTRIBUTION_DIR}/plugins" \
--conf "spark.driver.extraClassPath=${PINOT_DISTRIBUTION_DIR}/plugins-external/pinot-batch-ingestion/pinot-batch-ingestion-spark-2.4/pinot-batch-ingestion-spark-2.4-${PINOT_VERSION}-shaded.jar:${PINOT_DISTRIBUTION_DIR}/lib/pinot-all-${PINOT_VERSION}-jar-with-dependencies.jar" \
--conf "spark.executor.extraClassPath=${PINOT_DISTRIBUTION_DIR}/plugins-external/pinot-batch-ingestion/pinot-batch-ingestion-spark-2.4/pinot-batch-ingestion-spark-2.4-${PINOT_VERSION}-shaded.jar:${PINOT_DISTRIBUTION_DIR}/lib/pinot-all-${PINOT_VERSION}-jar-with-dependencies.jar" \
local://${PINOT_DISTRIBUTION_DIR}/lib/pinot-all-${PINOT_VERSION}-jar-with-dependencies.jar -jobSpecFile /path/to/spark_job_spec.yaml
a
@Tim Santos thanks! I couldn't find pinot-batch-ingestion-spark-2.4-${PINOT_VERSION} in the latest Pinot release (i.e. PINOT_VERSION=0.11.0),
while pinot-batch-ingestion-spark-3.2-0.11.0 is available.
I tried using pinot-batch-ingestion-spark-3.2-0.11.0-* instead, but I still get the same error.
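Since the jar name differs between releases, it may help to list which Spark batch-ingestion jars the distribution actually contains before wiring them into the classpath. A minimal sketch: the function name find_spark_plugin_jars is hypothetical, and it assumes the jars follow the pinot-batch-ingestion-spark-* naming seen in this thread.

```shell
# Hypothetical helper (not part of Pinot): list Spark batch-ingestion
# plugin jars anywhere under a Pinot distribution directory.
# Assumption: jar names start with "pinot-batch-ingestion-spark-".
find_spark_plugin_jars() {
  dist_dir="$1"
  find "$dist_dir" -type f -name 'pinot-batch-ingestion-spark-*.jar' | sort
}

# Only run against the real distribution when the variable is set.
if [ -n "${PINOT_DISTRIBUTION_DIR:-}" ]; then
  find_spark_plugin_jars "${PINOT_DISTRIBUTION_DIR}"
fi
```

Whatever full path this prints is the one to use in spark.driver.extraClassPath and spark.executor.extraClassPath, rather than a guessed file name or a wildcard.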