Hi, I am trying to run spark 3.2 ingestion job wit...
# troubleshooting
e
Hi, I am trying to run spark 3.2 ingestion job with Pinot 0.11.0, but I keep getting this error:
Copy code
Caused by: java.lang.ClassNotFoundException: org.apache.pinot.plugin.ingestion.batch.spark.SparkSegmentGenerationJobRunner
Here is my spark-submit command:
Copy code
spark-submit \
--class org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand \
--master local \
--deploy-mode client \
--conf "spark.driver.extraJavaOptions=-Dplugins.dir=${PINOT_DISTRIBUTION_DIR}/plugins" \
--conf "spark.driver.extraClassPath=${PINOT_DISTRIBUTION_DIR}/plugins-external/pinot-batch-ingestion/pinot-batch-ingestion-spark-3.2/pinot-batch-ingestion-spark-3.2-${PINOT_VERSION}-shaded.jar:${PINOT_DISTRIBUTION_DIR}/lib/pinot-all-${PINOT_VERSION}-jar-with-dependencies.jar:${PINOT_DISTRIBUTION_DIR}/plugins/pinot-file-system/pinot-adls/pinot-adls-${PINOT_VERSION}-shaded.jar:${PINOT_DISTRIBUTION_DIR}/plugins/pinot-input-format/pinot-parquet/pinot-parquet-${PINOT_VERSION}-shaded.jar" \
--conf "spark.executor.extraClassPath=${PINOT_DISTRIBUTION_DIR}/plugins-external/pinot-batch-ingestion/pinot-batch-ingestion-spark-3.2/pinot-batch-ingestion-spark-3.2-${PINOT_VERSION}-shaded.jar:${PINOT_DISTRIBUTION_DIR}/lib/pinot-all-${PINOT_VERSION}-jar-with-dependencies.jar:${PINOT_DISTRIBUTION_DIR}/plugins/pinot-file-system/pinot-adls/pinot-adls-${PINOT_VERSION}-shaded.jar:${PINOT_DISTRIBUTION_DIR}/plugins/pinot-input-format/pinot-parquet/pinot-parquet-${PINOT_VERSION}-shaded.jar" \
local://${PINOT_DISTRIBUTION_DIR}/lib/pinot-all-${PINOT_VERSION}-jar-with-dependencies.jar -jobSpecFile ${PINOT_DISTRIBUTION_DIR}/spark_job_spec.yaml
Here is my spark_job_spec.yaml file:
Copy code
executionFrameworkSpec:
  name: 'spark'
  segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.spark.SparkSegmentGenerationJobRunner'
  segmentTarPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.spark.SparkSegmentTarPushJobRunner'
  segmentUriPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.spark.SparkSegmentUriPushJobRunner'
  segmentMetadataPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.spark.SparkSegmentMetadataPushJobRunner'
  extraConfigs:
    stagingDir: /path/to/staging
jobType: SegmentCreationAndTarPush
inputDirURI: '/path/to/input'
outputDirURI: '/path/to/output'
overwriteOutput: true
pinotFSSpecs:
    - scheme: adl2
      className: org.apache.pinot.plugin.filesystem.ADLSGen2PinotFS
      configs:
        accountName: 'account-name'
        accessKey: 'sharedAccessKey'
        fileSystemName: 'fs-name'
recordReaderSpec:
    dataFormat: 'parquet'
    className: 'org.apache.pinot.plugin.inputformat.parquet.ParquetNativeRecordReader'
tableSpec:
    tableName: 'spire'
pinotClusterSpecs:
    - controllerURI: '<http://50.107.051.240:9000>'
1
n
@Edgaras Kryževičius Spark 3.2 segment generator code is under a different package namespace -
org.apache.pinot.plugin.ingestion.batch.spark3.
So, update the configs in your
spark_job_spec.yaml
file to use the job runners in this namespace. Your spark job is looking for spark 2.4 segment generators.
1