Can we pass hdfs path to `jobSpecFile` config for ...
# general
o
Can we pass an HDFS path to the `jobSpecFile` config for reading the job spec, instead of a local path?
```bash
${SPARK_HOME}/bin/spark-submit \
  --class org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand \
  --master "local[2]" \
  --deploy-mode client \
  --conf "spark.driver.extraJavaOptions=-Dplugins.dir=${PINOT_DISTRIBUTION_DIR}/plugins -Dlog4j2.configurationFile=${PINOT_DISTRIBUTION_DIR}/conf/pinot-ingestion-job-log4j2.xml" \
  --conf "spark.driver.extraClassPath=${PINOT_DISTRIBUTION_DIR}/lib/pinot-all-${PINOT_VERSION}-jar-with-dependencies.jar" \
  local://${PINOT_DISTRIBUTION_DIR}/lib/pinot-all-${PINOT_VERSION}-jar-with-dependencies.jar \
  -jobSpecFile ${PINOT_DISTRIBUTION_DIR}/examples/batch/airlineStats/sparkIngestionJobSpec.yaml
```
like:
```bash
-jobSpecFile hdfs://bucket/pinot-specs/sparkIngestionJobSpec.yaml
```
m
Looking at the code, `LaunchDataIngestionJobCommand` seems to assume `jobSpecFile` is local. Perhaps we can enhance this. Mind filing an issue?
x
It would require passing more configs to Pinot to initialize the HDFS filesystem before it could even read the config file. I feel it's better to wrap it in a script that copies the file from HDFS to local and then runs the job.
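For example, a minimal wrapper sketch along those lines, assuming `hdfs dfs` is on the PATH and using a hypothetical HDFS location for the spec:
```bash
#!/usr/bin/env bash
# Copy the job spec from HDFS to a local temp directory,
# then launch the ingestion job pointing at the local copy.
set -euo pipefail

# Hypothetical HDFS location of the job spec; adjust to your cluster.
SPEC_HDFS_PATH="hdfs://bucket/pinot-specs/sparkIngestionJobSpec.yaml"
SPEC_LOCAL_PATH="$(mktemp -d)/sparkIngestionJobSpec.yaml"

# Fetch the spec to local disk.
hdfs dfs -get "${SPEC_HDFS_PATH}" "${SPEC_LOCAL_PATH}"

${SPARK_HOME}/bin/spark-submit \
  --class org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand \
  --master "local[2]" \
  --deploy-mode client \
  --conf "spark.driver.extraJavaOptions=-Dplugins.dir=${PINOT_DISTRIBUTION_DIR}/plugins -Dlog4j2.configurationFile=${PINOT_DISTRIBUTION_DIR}/conf/pinot-ingestion-job-log4j2.xml" \
  --conf "spark.driver.extraClassPath=${PINOT_DISTRIBUTION_DIR}/lib/pinot-all-${PINOT_VERSION}-jar-with-dependencies.jar" \
  local://${PINOT_DISTRIBUTION_DIR}/lib/pinot-all-${PINOT_VERSION}-jar-with-dependencies.jar \
  -jobSpecFile "${SPEC_LOCAL_PATH}"
```
This sidesteps the bootstrapping problem: the filesystem configs needed to read from HDFS live inside the very spec file being fetched.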
m
Yeah, agree
o
agree too, thank you so much