# troubleshooting
g
Hi team, we ran into a number of issues when setting up a Spark ingestion job with YARN. The latest one is that, after the job was submitted to the cluster, the application master reported the following error and no resources could be assigned to the job:
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.hadoop.yarn.api.records.impl.pb.ProtoUtils.convertToProtoFormat(Lorg/apache/hadoop/yarn/api/records/ExecutionType;)Lorg/apache/hadoop/yarn/proto/YarnProtos$ExecutionTypeProto;
	at org.apache.hadoop.yarn.api.records.impl.pb.ExecutionTypeRequestPBImpl.setExecutionType(ExecutionTypeRequestPBImpl.java:73)
We wonder whether Pinot has also introduced this class in its dependencies and whether it conflicts with the library shipped with our Hadoop cluster itself. We are on Spark 2.4.6, Hadoop 2.9.1, and Pinot 0.9.2, and it seems Pinot 0.9.2 is built with Hadoop 2.7.0 and Spark 2.4.0. Have compatible Spark/Hadoop versions for running ingestion jobs been tested?
For reference, here is the command we ran to submit the job:
${SPARK_HOME}/bin/spark-submit   --class org.apache.pinot.tools.admin.PinotAdministrator  \
 --master yarn  \
 --deploy-mode cluster  \
 --conf "spark.driver.extraJavaOptions=-Dplugins.dir=${PINOT_DISTRIBUTION_DIR}/plugins -Dlog4j2.configurationFile=${PINOT_DISTRIBUTION_DIR}/conf/pinot-ingestion-job-log4j2.xml" \
 --conf "spark.driver.extraClassPath=${PINOT_DISTRIBUTION_DIR}/lib/pinot-all-${PINOT_VERSION}-jar-with-dependencies.jar:${PLUGINS_CLASSPATH}" \
 --files /mnt/data/home/grace/sparkIngestionJobSpec.yaml \
${PINOT_DISTRIBUTION_DIR}/lib/pinot-all-${PINOT_VERSION}-jar-with-dependencies.jar \
 LaunchDataIngestionJob  \
 -jobSpecFile   sparkIngestionJobSpec.yaml
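A `NoSuchMethodError` like the one above usually means two different builds of the same class are on the classpath. One quick check is to grep the fat jar's entry listing for the class named in the stack trace; the snippet below uses a hypothetical stand-in listing so the filtering step itself is runnable (against a real install, pipe the `jar tf` output shown in the comment into the same grep):

```shell
#!/bin/sh
# Against a real install, replace the stand-in listing with:
#   jar tf "${PINOT_DISTRIBUTION_DIR}/lib/pinot-all-${PINOT_VERSION}-jar-with-dependencies.jar"
# The two entries below are hypothetical examples of what such a listing holds.
listing='org/apache/hadoop/yarn/api/records/impl/pb/ProtoUtils.class
org/apache/pinot/tools/admin/PinotAdministrator.class'

# Grep for the class named in the NoSuchMethodError stack trace.
printf '%s\n' "$listing" | grep 'yarn/api/records/impl/pb/ProtoUtils'
# A match means the fat jar bundles its own Hadoop YARN classes, which can
# clash with the Hadoop 2.9.1 jars the cluster itself provides.
```

If the class appears both in the fat jar and in the cluster's Hadoop libraries, whichever copy loads first wins, which is consistent with a 2.7.0-built `ProtoUtils` missing a method that the cluster's 2.9.1 code expects.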
m
@User any suggestions?
x
I would suggest packaging all the required Pinot jars together into a single fat jar and using that.
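As a sketch of what that could look like (the merged artifact's name, `pinot-fat.jar`, is illustrative, not from this thread): with everything in one jar, the submit command can drop both `-Dplugins.dir` and the plugin classpath entries. The command string is echoed rather than executed so its shape is visible without a cluster:

```shell
#!/bin/sh
# Hypothetical submit command once everything lives in one merged jar
# (pinot-fat.jar is an assumed name; log4j options omitted for brevity).
FAT_JAR=pinot-fat.jar
SUBMIT="spark-submit --class org.apache.pinot.tools.admin.PinotAdministrator \
--master yarn --deploy-mode cluster \
--files /mnt/data/home/grace/sparkIngestionJobSpec.yaml \
${FAT_JAR} LaunchDataIngestionJob -jobSpecFile sparkIngestionJobSpec.yaml"
echo "$SUBMIT"
```

With a single jar there is no plugins directory to point at, so the `extraJavaOptions`/`extraClassPath` conf lines from the original command go away; if the merge is done at build time, the Maven Shade plugin handles `META-INF/services` merging more robustly than unpacking and re-jarring by hand.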
g
Is
pinot-all-0.9.2-jar-with-dependencies.jar
already a fat jar? Are you suggesting merging all the jars (or selected jars) in the plugins folders with pinot-all-0.9.2-jar-with-dependencies.jar and removing the
spark.driver.extraJavaOptions=-Dplugins.dir=${PINOT_DISTRIBUTION_DIR}/plugins
and
spark.driver.extraClassPath=${PLUGINS_CLASSPATH}
?
For future reference, we were able to resolve this by rebuilding the Pinot binaries against Hadoop 2.9.1 (our prod YARN cluster version) and deploying them to all nodes in the cluster.
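For anyone repeating this, the rebuild can be sketched as below; `-Dhadoop.version` and `-Dspark.version` are assumed property names, so confirm them against the `<properties>` section of Pinot's root `pom.xml` before relying on them:

```shell
#!/bin/sh
# Sketch: rebuild Pinot from source against the cluster's Hadoop version.
# Property names are assumptions -- check Pinot's root pom.xml first.
HADOOP_VERSION=2.9.1   # match the prod YARN cluster
SPARK_VERSION=2.4.6
BUILD_CMD="mvn clean install -DskipTests -Pbin-dist \
-Dhadoop.version=${HADOOP_VERSION} -Dspark.version=${SPARK_VERSION}"
echo "$BUILD_CMD"   # run from the Pinot source checkout, then redeploy to all nodes
```

The resulting distribution then replaces the stock 0.9.2 binaries on every node, which is what resolved the `NoSuchMethodError` above.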
m
Thanks for confirming @User