# Troubleshooting
x
i encountered this issue when trying the spark ingestion:
```
Exception in thread "main" java.lang.ExceptionInInitializerError
        at org.apache.spark.storage.BlockManagerMasterEndpoint.<init>(BlockManagerMasterEndpoint.scala:93)
        at org.apache.spark.SparkEnv$.$anonfun$create$9(SparkEnv.scala:370)
        at org.apache.spark.SparkEnv$.registerOrLookupEndpoint$1(SparkEnv.scala:311)
        at org.apache.spark.SparkEnv$.create(SparkEnv.scala:359)
        at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:189)
        at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:272)
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:448)
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:125)
        at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2611)
        at org.apache.spark.SparkContext.getOrCreate(SparkContext.scala)
        at org.apache.pinot.plugin.ingestion.batch.spark.SparkSegmentGenerationJobRunner.run(SparkSegmentGenerationJobRunner.java:198)
        at org.apache.pinot.spi.ingestion.batch.IngestionJobLauncher.kickoffIngestionJob(IngestionJobLauncher.java:142)
        at org.apache.pinot.spi.ingestion.batch.IngestionJobLauncher.runIngestionJob(IngestionJobLauncher.java:113)
        at org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand.execute(LaunchDataIngestionJobCommand.java:132)
        at org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand.main(LaunchDataIngestionJobCommand.java:67)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
        at java.base/java.lang.reflect.Method.invoke(Unknown Source)
        at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
        at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:928)
        at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.NullPointerException
        at org.apache.commons.lang3.SystemUtils.isJavaVersionAtLeast(SystemUtils.java:1626)
        at org.apache.spark.storage.StorageUtils$.<init>(StorageUtils.scala:207)
        at org.apache.spark.storage.StorageUtils$.<clinit>(StorageUtils.scala)
        ... 27 more
```
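This NPE in `SystemUtils.isJavaVersionAtLeast` is the usual symptom of an older commons-lang3 shadowing Spark's own copy: releases that predate the newer JDKs can't map the running Java version to their `JavaVersion` enum, and Spark's `StorageUtils` initializer trips over the resulting null. A quick way to see which commons-lang3 copies are in play; the paths are illustrative, following the spark-submit command later in the thread:
```
# Spark's own commons-lang3, bundled with the distribution:
ls ${SPARK_HOME}/jars/commons-lang3-*.jar

# Check whether the shaded Pinot jar bundles a second (possibly older) copy:
unzip -l /opt/pinot/lib/pinot-all-0.7.1-jar-with-dependencies.jar \
  | grep 'org/apache/commons/lang3/SystemUtils'
```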
m
What version of Java are you using?
x
```
spark 3.0.2
pinot 0.7.1

java -version
openjdk version "11.0.10" 2021-01-19
OpenJDK Runtime Environment 18.9 (build 11.0.10+9)
OpenJDK 64-Bit Server VM 18.9 (build 11.0.10+9, mixed mode, sharing)
```
i get the jars for spark-submit from here: https://downloads.apache.org/pinot/apache-pinot-incubating-0.7.1/
```
${SPARK_HOME}/bin/spark-submit \
  --class org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand \
  --deploy-mode cluster \
  --conf "spark.driver.extraJavaOptions=-Dplugins.dir=/opt/pinot/plugins -Dlog4j2.configurationFile=/opt/pinot/conf/pinot-ingestion-job-log4j2.xml" \
  --conf "spark.driver.extraClassPath=/opt/pinot/lib/pinot-all-0.7.1-jar-with-dependencies.jar:/opt/pinot/plugins/pinot-batch-ingestion/pinot-batch-ingestion-spark/pinot-batch-ingestion-spark-0.7.1-shaded.jar:/opt/pinot/lib/pinot-all-0.7.1-jar-with-dependencies.jar:/opt/pinot/plugins/pinot-file-system/pinot-s3/pinot-s3-0.7.1-shaded.jar:/opt/pinot/plugins/pinot-input-format/pinot-parquet/pinot-parquet-0.7.1-shaded.jar" \
  --jars local:///opt/pinot/plugins/pinot-batch-ingestion/pinot-batch-ingestion-spark/pinot-batch-ingestion-spark-0.7.1-shaded.jar,local:///opt/pinot/lib/pinot-all-0.7.1-jar-with-dependencies.jar,local:///opt/pinot/plugins/pinot-file-system/pinot-s3/pinot-s3-0.7.1-shaded.jar,local:///opt/pinot/plugins/pinot-input-format/pinot-parquet/pinot-parquet-0.7.1-shaded.jar \
  local:///opt/pinot/lib/pinot-all-0.7.1-jar-with-dependencies.jar -jobSpecFile jobSpec.yaml | tee output
```
m
We are seeing some issues with newer Spark versions, could you try Spark 2.3.x?
x
i can’t see that thread, i think it’s buried due to the 10k message limit
let me try some workarounds
b
Just upgrade Apache Commons Lang3 to the latest version in your deployment.
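One hedged way to wire that into spark-submit is to put a newer commons-lang3 ahead of the shaded Pinot jar, since `spark.driver.extraClassPath` entries are prepended to the driver classpath (the jar location and version below are illustrative, not from this thread):
```
# Illustrative: a standalone commons-lang3 3.12.0 placed first on the driver
# classpath wins over any older copy shaded into the Pinot jars.
${SPARK_HOME}/bin/spark-submit \
  --class org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand \
  --conf "spark.driver.extraClassPath=/opt/libs/commons-lang3-3.12.0.jar:/opt/pinot/lib/pinot-all-0.7.1-jar-with-dependencies.jar" \
  ...
```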
m
Thanks @Bruce Ritchie
x
new errors:
```
libraries used:
spark 2.4.7
apache-commons-lang3 3.12.0 and apache-pinot 0.8.0 (mvn build with jdk 11)

Exception in thread "main" java.lang.UnsupportedClassVersionError: org/apache/pinot/tools/admin/command/LaunchDataIngestionJobCommand has been compiled by a more recent version of the Java Runtime (class file version 55.0), this version of the Java Runtime only recognizes class file versions up to 52.0
```
class file version 55.0 is Java 11 and 52.0 is Java 8, so jdk 11 seems to be supported for spark 3 onwards only?
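For what it's worth, `javap` (shipped with the JDK) can confirm that reading directly from the jar; a minimal sketch, assuming the shaded jar keeps the naming pattern from earlier in the thread:
```
# Prints "major version: 55" for a Java 11 build, "major version: 52" for Java 8.
javap -verbose \
  -classpath pinot-all-0.8.0-jar-with-dependencies.jar \
  org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand \
  | grep 'major version'
```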
```
libraries used:
spark 3.0.2
apache-commons-lang3 3.12.0 and apache-pinot 0.8.0 (mvn build with jdk 11)

# Observe the following logs happening over and over

INFO BlockManagerMaster: Registered BlockManager BlockManagerId(1, 100.64.233.164, 21000, None)
INFO BlockManager: Reporting 0 blocks to the master.
INFO Executor: Told to re-register on heartbeat
INFO BlockManager: BlockManager BlockManagerId(1, 100.64.233.164, 21000, None) re-registering with master
...
```
shall try building 0.8.0 with jdk 8 and running with my spark 2.4.7 image :3
tried building from source for spark 2.4.7, was unable to because of a missing 2.4.7 artifact for https://repo.maven.apache.org/maven2/com/holdenkarau/spark-testing-base_2.11/
tried building from source for spark 2.4.5, ended up with this error when running with spark-submit cluster mode on k8s:
```
# build from source
mvn clean install -DskipTests -Pbin-dist -T 4 -Djdk.version=8 -Dhadoop.version=3.1.0 -Dspark.version=2.4.5

# run in a spark image with hadoop 3.1 and spark 2.4.5
${SPARK_HOME}/bin/spark-submit \
  --class org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand \
  --deploy-mode cluster \
  --conf "spark.driver.extraJavaOptions=-Dplugins.dir=..." \
  --conf "spark.driver.extraClassPath=..." \
  --jars ... \
  ... -jobSpecFile jobSpec.yaml
```
im out of ideas. will probably try standalone batch ingestion even if it's slower, since data freshness is not a concern for me
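For reference, the standalone path runs the same job spec through the Pinot admin tool with no Spark involved; a minimal sketch, assuming the jobSpec.yaml's executionFrameworkSpec is switched to standalone:
```
# Run the ingestion job directly on one machine via the Pinot admin script.
bin/pinot-admin.sh LaunchDataIngestionJob -jobSpecFile jobSpec.yaml
```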
i think it would help if there was a tested docker image with the spark 2.4.0 + hadoop 2.7.0 dependencies provided for me to run my ingestion job
@Xiang Fu was wondering which docker image i could use to run the spark batch ingestion job? i see this but it's no longer available: https://github.com/apache/pinot/pull/4975/files#diff-eb034a8230fa96f6fa24ff5c173626f42eba8d39ca1a1ee2cecd9274e2d8dee8R182
b
As far as JDK 11 with Spark goes, yes, I believe it's Spark 3+ only.
For me the only way to ingest was via kafka.
x
well that's discouraging to hear 😔
m
@Xiang Fu any pointers here ^^
b
I'm sure it can be made to work with spark 2.x; I'm just using spark 3/jdk 11 in our environment and can't backport.
x
I haven’t tried spark 3
might be worth trying to upgrade the spark lib and seeing what the changes are; if they're huge, we can have a new module for spark 3
I don’t think the client docker image works with the spark mode; you need to build the jars and put them into the spark cluster
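A hedged sketch of that, with illustrative paths: copy the shaded Pinot jars into the `jars/` directory of the Spark distribution (or bake them into the Spark docker image) so every node sees them:
```
# Illustrative: make the Pinot jars part of the Spark distribution itself
# instead of shipping them per-job.
cp /opt/pinot/lib/pinot-all-0.8.0-jar-with-dependencies.jar "${SPARK_HOME}/jars/"
cp /opt/pinot/plugins/pinot-batch-ingestion/pinot-batch-ingestion-spark/pinot-batch-ingestion-spark-0.8.0-shaded.jar "${SPARK_HOME}/jars/"
```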
x
Think I will give backporting spark 3 a shot today