# troubleshooting
a
We recently decided to upgrade our ingestion jars from v0.11.0-SNAPSHOT to v0.11.0. Since we are using Java 8 and have a hard dependency on Spark 2.4 right now, I had to compile it myself. Now, when running the same Spark ingestion job that works on v0.11.0-SNAPSHOT, we receive the following error with the new jars:
Can't construct a java object for tag:yaml.org,2002:org.apache.pinot.spi.ingestion.batch.spec.SegmentGenerationJobSpec; exception=Class not found: org.apache.pinot.spi.ingestion.batch.spec.SegmentGenerationJobSpec
We have been looking at this for a while now and are out of ideas on where to look next.
These are the lines I had to add back in to get Spark 2.4 working: https://github.com/apache/pinot/commit/8a8bbe072fe4d2faa20fd42a10536770a382e3a5#diff-c003a08155cf1408ff891cb4ab8[…]737aa4eca86c893f6f87a1d8bL95-L102
This is the command I built with:
mvn clean install -DskipTests -Pbin-dist  -Djdk.version=8
The resulting folder structures and jars look as expected.
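For what it's worth, one way to confirm the rebuilt classes still target Java 8 bytecode (using the pinot-all jar path we deploy to, shown in the spark_args further down):
javap -verbose -cp /mnt/pinot/apache-pinot-0.11.0-bin/lib/pinot-all-0.11.0-jar-with-dependencies.jar \
  org.apache.pinot.spi.ingestion.batch.spec.SegmentGenerationJobSpec | grep "major version"
# Java 8 class files report "major version: 52"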
The jobSpecFile YAML is valid, as determined by successfully running the job with v0.11.0-SNAPSHOT.
Here are the spark_args we are passing to the ingestion command:
PINOT_SPARK_ARGS = {
    "spark.driver.extraJavaOptions": "-Dplugins.dir=/mnt/pinot/apache-pinot-0.11.0-bin/plugins -Dplugins.include=pinot-s3,pinot-parquet -Dlog4j2.configurationFile=/mnt/pinot/apache-pinot-0.11.0-bin/conf/pinot-ingestion-job-log4j2.xml",
    "spark.executor.extraJavaOptions": "-Dplugins.dir=/mnt/pinot/apache-pinot-0.11.0-bin/plugins -Dplugins.include=pinot-s3,pinot-parquet -Dlog4j2.configurationFile=/mnt/pinot/apache-pinot-0.11.0-bin/conf/pinot-ingestion-job-log4j2.xml",
    "spark.driver.extraClassPath": "/mnt/pinot/apache-pinot-0.11.0-bin/plugins-external/pinot-batch-ingestion/pinot-batch-ingestion-spark-2.4/pinot-batch-ingestion-spark-2.4-0.11.0-shaded.jar:/mnt/pinot/apache-pinot-0.11.0-bin/lib/pinot-all-0.11.0-jar-with-dependencies.jar:/mnt/pinot/apache-pinot-0.11.0-bin/plugins/pinot-file-system/pinot-s3/pinot-s3-0.11.0-shaded.jar:/mnt/pinot/apache-pinot-0.11.0-bin/plugins/pinot-input-format/pinot-parquet/pinot-parquet-0.11.0-shaded.jar",
    "spark.executor.extraClassPath": "/mnt/pinot/apache-pinot-0.11.0-bin/plugins-external/pinot-batch-ingestion/pinot-batch-ingestion-spark-2.4/pinot-batch-ingestion-spark-2.4-0.11.0-shaded.jar:/mnt/pinot/apache-pinot-0.11.0-bin/lib/pinot-all-0.11.0-jar-with-dependencies.jar:/mnt/pinot/apache-pinot-0.11.0-bin/plugins/pinot-file-system/pinot-s3/pinot-s3-0.11.0-shaded.jar:/mnt/pinot/apache-pinot-0.11.0-bin/plugins/pinot-input-format/pinot-parquet/pinot-parquet-0.11.0-shaded.jar",
}
All files exist in the appropriate paths as expected
Please let me know what other information I can provide.
My best guess is that something changed between the two versions that affects the org.yaml.snakeyaml.Yaml.loadAs method's ability to find the class on the classpath. That said, I find it very strange that it is unable to load a class that should live in the same module/project (going by the pom.xml paths) as the calling method.
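One quick way to pressure-test that guess (paths taken from spark.driver.extraClassPath above) is to confirm the exact class name SnakeYAML is asking for is still present, unrelocated, in those jars:
unzip -l /mnt/pinot/apache-pinot-0.11.0-bin/lib/pinot-all-0.11.0-jar-with-dependencies.jar \
  | grep org/apache/pinot/spi/ingestion/batch/spec/SegmentGenerationJobSpec.class
unzip -l /mnt/pinot/apache-pinot-0.11.0-bin/plugins-external/pinot-batch-ingestion/pinot-batch-ingestion-spark-2.4/pinot-batch-ingestion-spark-2.4-0.11.0-shaded.jar \
  | grep org/apache/pinot/spi/ingestion/batch/spec/SegmentGenerationJobSpec.class
# if the class only shows up under a relocated/shaded package, that would explain the "Class not found"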
m
Adding @Kartik Khare to the thread to follow up
🙏 1
In the meantime, I take it that this was not helpful? https://docs.pinot.apache.org/basics/data-import/batch-ingestion/spark
a
I think that is the guide we originally followed. I'm not noticing anything helpful in the doc for this issue.
k
Hi, can you send me the complete command? Most of the time this is due to the pinot-all jar not being loaded properly.
a
We have a service that handles the scheduling of the jobs, so while this isn't the spark-submit command directly, it is generated from this and has worked fine for other versions. Let me know if this is helpful or not. I can check whether we log the actual spark-submit command anywhere.
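For reference, the generated command should be roughly this shape; the master, deploy mode, and jobSpec path below are placeholders rather than our exact values:
spark-submit \
  --class org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand \
  --master yarn --deploy-mode cluster \
  --conf "spark.driver.extraJavaOptions=..." \
  --conf "spark.driver.extraClassPath=..." \
  --conf "spark.executor.extraJavaOptions=..." \
  --conf "spark.executor.extraClassPath=..." \
  --jars hdfs:///user/pinot/pinot-all-0.11.0-jar-with-dependencies.jar \
  hdfs:///user/pinot/pinot-all-0.11.0-jar-with-dependencies.jar \
  -jobSpecFile /path/to/ingestionJobSpec.yaml
# the --conf values are the PINOT_SPARK_ARGS shown earlier, elided here for brevity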
k
Hi, can you try replacing
'file': 'hdfs:///user/pinot/pinot-all-0.11.0-jar-with-dependencies.jar'
with
'file': 'local://pinot-all-0.11.0-jar-with-dependencies.jar'
The jar file is already copied from HDFS to local (since it is mentioned in --jars), so specifying the local path should work. For me, this has fixed the issue most of the time.
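In spark-submit terms, only the application jar reference changes; the --jars entry can stay on HDFS (illustrative):
spark-submit ... \
  --jars hdfs:///user/pinot/pinot-all-0.11.0-jar-with-dependencies.jar \
  local://pinot-all-0.11.0-jar-with-dependencies.jar \
  -jobSpecFile /path/to/ingestionJobSpec.yaml
# local:// tells Spark the jar is already present locally, so it is not fetched from HDFS again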