spark submit cluster mode errors :thread:
# troubleshoot
w
Has anyone been able to successfully run Spark lineage in cluster mode? My app runs fine in client mode but fails in cluster mode with the error below:
ERROR DatahubSparkListener: java.lang.NullPointerException
	at datahub.spark.DatahubSparkListener$3.apply(DatahubSparkListener.java:258)
	at datahub.spark.DatahubSparkListener$3.apply(DatahubSparkListener.java:254)
@careful-pilot-86309 any suggestions?
For context, I'm trying this on EMR 5.21.
c
Hey, can you provide the full logs (from the application start event) with debug enabled? At first look, it seems the SQL execution start event is missing.
w
Yes, that's the issue. The same script works fine in client mode.
INFO DatahubSparkListener: Application ended : datahub_lineage application_1659476288260_2988
22/08/17 04:24:19 ERROR DatahubSparkListener: Application end event received, but start event missing for appId application_1659476288260_2988
c
I think EMR cluster mode is sending events in a different order. Please give me the steps you followed and I will try to reproduce it on my end. Meanwhile, can you create a GitHub issue asking for EMR cluster mode support?
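To illustrate the failure mode being discussed (an end event arriving for an application whose start event was never recorded), here is a toy Python model. It is not the `DatahubSparkListener` code and not the fix that was later merged; the class and method names are hypothetical. It only shows why a listener that keys its state on the start event has to guard against a missing entry instead of dereferencing it.

```python
class ToyLineageListener:
    """Toy model of a listener that tracks applications by ID (not DataHub code)."""

    def __init__(self):
        self.running = {}  # app_id -> application name, filled on start
        self.log = []      # human-readable record of what happened

    def on_application_start(self, app_id, name):
        self.running[app_id] = name
        self.log.append(f"started {app_id}")

    def on_application_end(self, app_id):
        name = self.running.pop(app_id, None)
        if name is None:
            # End arrived with no recorded start: report it rather than
            # using a missing entry (the analogue of the NullPointerException).
            self.log.append(f"end without start for {app_id}")
            return False
        self.log.append(f"ended {app_id}")
        return True
```

Calling `on_application_end` for an unknown ID returns `False` and logs the mismatch, which mirrors the "start event missing" message in the log above.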
w
Steps: 1. Create a sample script on EMR. 2. Update the Spark conf at /etc/spark/conf/spark-defaults.conf and add the DataHub properties. 3. Run the sample script in cluster mode.
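For anyone trying to reproduce this, the steps above can be sketched roughly as follows in Python. This is a sketch under assumptions: the package coordinate comes from this thread and the listener class name from the stack trace, while the `spark.datahub.rest.server` URL and the script name `sample_script.py` are placeholders. The helper names `render_defaults` and `build_spark_submit` are hypothetical.

```python
import shlex

# Assumed DataHub lineage properties; the server URL is a placeholder.
DATAHUB_CONF = {
    "spark.jars.packages": "io.acryl:datahub-spark-lineage:0.8.43",
    "spark.extraListeners": "datahub.spark.DatahubSparkListener",
    "spark.datahub.rest.server": "http://localhost:8080",
}


def render_defaults(conf):
    """Render properties as spark-defaults.conf lines ("key value")."""
    return "".join(f"{key} {value}\n" for key, value in sorted(conf.items()))


def build_spark_submit(script, deploy_mode="cluster", conf=None):
    """Build a spark-submit argv that passes the same properties via --conf."""
    conf = DATAHUB_CONF if conf is None else conf
    argv = ["spark-submit", "--master", "yarn", "--deploy-mode", deploy_mode]
    for key, value in sorted(conf.items()):
        argv += ["--conf", f"{key}={value}"]
    argv.append(script)
    return argv


if __name__ == "__main__":
    print(render_defaults(DATAHUB_CONF), end="")
    print(shlex.join(build_spark_submit("sample_script.py")))
```

Switching `deploy_mode` between `"client"` and `"cluster"` gives the two runs being compared in this thread.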
Just curious about cluster mode not working on EMR: I do see EMR-specific instructions here https://datahubproject.io/docs/metadata-integration/java/spark-lineage/, so I would assume somebody has successfully tested EMR. Am I missing something?
@careful-pilot-86309 do you want me to create a ticket, or do you think we can debug this?
c
Oh. Yes. Yes. I completely missed this.
Let me check and get back to you.
w
sure, thanks!
The DataHub version installed is 0.8.38, and spark.jars.packages is io.acryl:datahub-spark-lineage:0.8.43.
@careful-pilot-86309 any update on this? We are stuck on this for our deployment.
c
@white-hydrogen-24531 I have created a PR for the fix.
It's merged. Please try it and let me know.
w
Thanks. It's not released yet, right? How do we use this jar in our jobs?
Nvm, I was able to build a snapshot jar from master and test it. It works now. Thanks!!! @careful-pilot-86309
Appreciate your help on this.
c
Glad to hear that it worked for you.