# general

Oguzhan Mangir

04/02/2021, 1:53 PM
Can we pass an HDFS path to the jobSpecFile config for reading the job spec, instead of a local path?
${SPARK_HOME}/bin/spark-submit \
  --class org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand \
  --master "local[2]" \
  --deploy-mode client \
  --conf "spark.driver.extraJavaOptions=-Dplugins.dir=${PINOT_DISTRIBUTION_DIR}/plugins -Dlog4j2.configurationFile=${PINOT_DISTRIBUTION_DIR}/conf/pinot-ingestion-job-log4j2.xml" \
  --conf "spark.driver.extraClassPath=${PINOT_DISTRIBUTION_DIR}/lib/pinot-all-${PINOT_VERSION}-jar-with-dependencies.jar" \
  local://${PINOT_DISTRIBUTION_DIR}/lib/pinot-all-${PINOT_VERSION}-jar-with-dependencies.jar \
  -jobSpecFile ${PINOT_DISTRIBUTION_DIR}/examples/batch/airlineStats/sparkIngestionJobSpec.yaml
like:
  -jobSpecFile hdfs://bucket/pinot-specs/sparkIngestionJobSpec.yaml

Mayank

04/02/2021, 3:46 PM
Looking at the code, LaunchDataIngestionJobCommand seems to assume jobSpecFile is local.
Perhaps we can enhance this. Mind filing an issue?

Xiang Fu

04/02/2021, 5:49 PM
It requires more configs to be passed to Pinot to init the HDFS filesystem and then read the config file.
I feel it’s better to wrap it in a script that copies the file from HDFS to local, then runs the job

Mayank

04/02/2021, 9:22 PM
Yeah, agree

Oguzhan Mangir

04/03/2021, 2:26 PM
Agree too, thank you very much
👍 1