# troubleshooting
e
I am working on the airlineStats example with Pinot 0.11.0 and trying to run a Spark 3.2 ingestion job. The default example works, but when I change inputDirURI to ADLS instead of the local file system and change the pinotFSSpecs scheme, I start getting this error:
```
Caused by: java.lang.IllegalStateException: PinotFS for scheme: abfs has not been initialized
```
This is the Spark command I am running:
```shell
spark-submit \
--class org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand \
--master local \
--deploy-mode client \
--conf "spark.driver.extraJavaOptions=-Dplugins.dir=${PINOT_DISTRIBUTION_DIR}/plugins" \
--conf "spark.driver.extraClassPath=${PINOT_DISTRIBUTION_DIR}/plugins-external/pinot-batch-ingestion/pinot-batch-ingestion-spark-3.2/pinot-batch-ingestion-spark-3.2-${PINOT_VERSION}-shaded.jar:${PINOT_DISTRIBUTION_DIR}/lib/pinot-all-${PINOT_VERSION}-jar-with-dependencies.jar:${PINOT_DISTRIBUTION_DIR}/plugins/pinot-file-system/pinot-adls/pinot-adls-${PINOT_VERSION}-shaded.jar:${PINOT_DISTRIBUTION_DIR}/plugins/pinot-input-format/pinot-parquet/pinot-parquet-${PINOT_VERSION}-shaded.jar" \
--conf "spark.executor.extraClassPath=${PINOT_DISTRIBUTION_DIR}/plugins-external/pinot-batch-ingestion/pinot-batch-ingestion-spark-3.2/pinot-batch-ingestion-spark-3.2-${PINOT_VERSION}-shaded.jar:${PINOT_DISTRIBUTION_DIR}/lib/pinot-all-${PINOT_VERSION}-jar-with-dependencies.jar:${PINOT_DISTRIBUTION_DIR}/plugins/pinot-file-system/pinot-adls/pinot-adls-${PINOT_VERSION}-shaded.jar:${PINOT_DISTRIBUTION_DIR}/plugins/pinot-input-format/pinot-parquet/pinot-parquet-${PINOT_VERSION}-shaded.jar" \
local://${PINOT_DISTRIBUTION_DIR}/lib/pinot-all-${PINOT_VERSION}-jar-with-dependencies.jar -jobSpecFile ${PINOT_DISTRIBUTION_DIR}/SparkIngestionJob.yaml
```
SparkIngestionJob.yaml:
```yaml
executionFrameworkSpec:
  name: 'spark'
  segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.spark3.SparkSegmentGenerationJobRunner'
  segmentTarPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.spark3.SparkSegmentTarPushJobRunner'
  segmentUriPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.spark3.SparkSegmentUriPushJobRunner'
  segmentMetadataPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.spark3.SparkSegmentMetadataPushJobRunner'

  extraConfigs:
    stagingDir: examples/batch/airlineStats/staging

jobType: SegmentCreationAndTarPush

inputDirURI: 'abfs://fs@accountname/...'
includeFileNamePattern: 'glob:**/*.avro'

outputDirURI: 'examples/batch/airlineStats/segments'

overwriteOutput: true

pinotFSSpecs:
  - scheme: adl2
    className: org.apache.pinot.plugin.filesystem.ADLSGen2PinotFS
    configs:
      accountName: '..'
      accessKey: '..'
      fileSystemName: '..'

recordReaderSpec:
  dataFormat: 'avro'
  className: 'org.apache.pinot.plugin.inputformat.avro.AvroRecordReader'

tableSpec:
  tableName: 'airlineStats'
  schemaURI: 'http://20.207.206.121:9000/tables/airlineStats/schema'
  tableConfigURI: 'http://20.207.206.121:9000/tables/airlineStats'

segmentNameGeneratorSpec:
  type: normalizedDate
  configs:
    segment.name.prefix: 'airlineStats_batch'
    exclude.sequence.id: true

pinotClusterSpecs:
  - controllerURI: 'http://20.207.206.121:9000'

pushJobSpec:
  pushParallelism: 2
  pushAttempts: 2
  pushRetryIntervalMillis: 1000
```
I am also attaching my values.yml file, which is used to deploy Pinot with Helm.
I fixed it by changing abfs://fs@accountname/... to adl2://fs@accountname/ in the SparkIngestionJob.yaml file.
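In other words, the scheme of inputDirURI has to match one of the schemes registered under pinotFSSpecs, otherwise the PinotFS lookup fails. A minimal sketch of the two fields aligned (the filesystem and account names here are placeholders, not real values):

```yaml
# The URI scheme used here...
inputDirURI: 'adl2://fs@accountname/...'

pinotFSSpecs:
  # ...must match a scheme registered here; any other scheme fails with
  # "PinotFS for scheme: ... has not been initialized".
  - scheme: adl2
    className: org.apache.pinot.plugin.filesystem.ADLSGen2PinotFS
```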
m
Yea, thanks
n
mm.. what is the difference between abfs and adl2?
e
To my understanding, there is no difference; it's just that adl2 is the scheme defined in the sparkIngestionJobSpec.yaml file.
m
ADLS is Azure Data Lake Storage; ABS is Azure Blob Storage. ADLS internally uses ABS, and we decided to support ADLS for deep store for the guarantees it provides.
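The error in this thread follows from how a scheme-keyed filesystem registry behaves: only schemes listed in pinotFSSpecs get initialized, so a URI with any other scheme cannot be resolved. This is not Pinot's actual implementation, just a hypothetical sketch of that lookup pattern (class and method names are invented for illustration):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a registry mapping URI schemes to filesystem
// implementations, mirroring the behavior behind the error message
// "PinotFS for scheme: abfs has not been initialized".
public class SchemeRegistrySketch {
  private static final Map<String, String> REGISTRY = new HashMap<>();

  // Called once per entry in pinotFSSpecs when the job starts.
  public static void register(String scheme, String className) {
    REGISTRY.put(scheme, className);
  }

  // Called when a URI such as adl2://fs@accountname/... is resolved;
  // an unregistered scheme (e.g. abfs when only adl2 was configured) fails.
  public static String create(String scheme) {
    if (!REGISTRY.containsKey(scheme)) {
      throw new IllegalStateException(
          "PinotFS for scheme: " + scheme + " has not been initialized");
    }
    return REGISTRY.get(scheme);
  }

  public static void main(String[] args) {
    register("adl2", "org.apache.pinot.plugin.filesystem.ADLSGen2PinotFS");
    System.out.println(create("adl2")); // resolves fine
    try {
      create("abfs"); // not registered: same failure as in the thread
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

Renaming the scheme in inputDirURI (or, equivalently, registering the same class under both schemes) makes the lookup succeed.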