# troubleshoot
d
Hello, apologies for asking late on a Friday (and the answer can wait), but I am getting this error on a Spark job when trying to use the DatahubSparkListener:
```
DatahubSparkListener: java.lang.NullPointerException: Cannot invoke "java.util.Map.put(Object, Object)" because the return value of "java.util.Map.get(Object)" is null
```
I was wondering if I could get some assistance? Stacktrace(s) in thread. Thanks for the help in advance.
```
DatahubSparkListener: java.lang.NullPointerException: Cannot invoke "java.util.Map.put(Object, Object)" because the return value of "java.util.Map.get(Object)" is null
	at datahub.spark.DatahubSparkListener$SqlStartTask.run(DatahubSparkListener.java:84)
	at datahub.spark.DatahubSparkListener.processExecution(DatahubSparkListener.java:323)
	at datahub.spark.DatahubSparkListener.onOtherEvent(DatahubSparkListener.java:237)
	at org.apache.spark.scheduler.SparkListenerBus.doPostEvent(SparkListenerBus.scala:100)
	at org.apache.spark.scheduler.SparkListenerBus.doPostEvent$(SparkListenerBus.scala:28)
	at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
	at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
	at org.apache.spark.util.ListenerBus.postToAll(ListenerBus.scala:117)
	at org.apache.spark.util.ListenerBus.postToAll$(ListenerBus.scala:101)
	at org.apache.spark.scheduler.AsyncEventQueue.super$postToAll(AsyncEventQueue.scala:105)
	at org.apache.spark.scheduler.AsyncEventQueue.$anonfun$dispatch$1(AsyncEventQueue.scala:105)
	at scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.java:23)
	at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
	at org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:100)
	at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.$anonfun$run$1(AsyncEventQueue.scala:96)
	at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1446)
	at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.run(AsyncEventQueue.scala:96)
```
```
ERROR DatahubSparkListener: java.lang.NullPointerException: Cannot invoke "java.util.Map.remove(Object)" because the return value of "java.util.Map.get(Object)" is null
	at datahub.spark.DatahubSparkListener$3.apply(DatahubSparkListener.java:258)
	at datahub.spark.DatahubSparkListener$3.apply(DatahubSparkListener.java:254)
	at scala.Option.foreach(Option.scala:407)
	at datahub.spark.DatahubSparkListener.processExecutionEnd(DatahubSparkListener.java:254)
	at datahub.spark.DatahubSparkListener.onOtherEvent(DatahubSparkListener.java:241)
	at org.apache.spark.scheduler.SparkListenerBus.doPostEvent(SparkListenerBus.scala:100)
	at org.apache.spark.scheduler.SparkListenerBus.doPostEvent$(SparkListenerBus.scala:28)
	at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
	at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
	at org.apache.spark.util.ListenerBus.postToAll(ListenerBus.scala:117)
	at org.apache.spark.util.ListenerBus.postToAll$(ListenerBus.scala:101)
	at org.apache.spark.scheduler.AsyncEventQueue.super$postToAll(AsyncEventQueue.scala:105)
	at org.apache.spark.scheduler.AsyncEventQueue.$anonfun$dispatch$1(AsyncEventQueue.scala:105)
	at scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.java:23)
	at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
	at org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:100)
	at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.$anonfun$run$1(AsyncEventQueue.scala:96)
	at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1446)
	at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.run(AsyncEventQueue.scala:96)
```
c
@delightful-barista-90363 Looks like the Spark application start event is not processed before the actual SQL jobs are handled. Do you see a pipeline created in DataHub, or logs showing an emit event before this failure? What is your exact Spark setup?
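To illustrate, the pattern that produces this kind of NPE looks roughly like the sketch below. This is not the actual DatahubSparkListener code; the names (`appSqlDetails`, `sqlStartUnsafe`, `sqlStartSafe`) are hypothetical. The idea is that per-application state is created when the application-start event is processed, and the SQL-start handler assumes that state already exists:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the failure pattern, not the real listener code.
public class ListenerStateSketch {
    // appId -> (executionId -> description); normally populated when the
    // application-start event is processed.
    static final Map<String, Map<Long, String>> appSqlDetails = new HashMap<>();

    // Reproduces the reported NPE when the app-start event never ran:
    // appSqlDetails.get(appId) returns null, then .put(...) throws.
    static void sqlStartUnsafe(String appId, long executionId) {
        appSqlDetails.get(appId).put(executionId, "query");
    }

    // Defensive variant: create the per-app map lazily if the start event
    // was missed or arrived out of order.
    static void sqlStartSafe(String appId, long executionId) {
        appSqlDetails.computeIfAbsent(appId, k -> new HashMap<>())
                     .put(executionId, "query");
    }

    public static void main(String[] args) {
        boolean threw = false;
        try {
            sqlStartUnsafe("app-1", 1L); // no app-start event was processed
        } catch (NullPointerException e) {
            threw = true;
        }
        System.out.println("unsafe threw NPE: " + threw);

        sqlStartSafe("app-1", 1L); // lazily initializes the per-app map
        System.out.println("safe stored: " + appSqlDetails.get("app-1").containsKey(1L));
    }
}
```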
d
The pipeline is not created in DataHub.
Creating a run now to gather logs.
We have Spark running on Kubernetes.
No logs show an emit event before the failure.
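For context, the job is submitted roughly like this. The jar path and GMS address below are made up, and the exact conf key names may differ by listener version; this follows my memory of the DataHub Spark lineage setup docs:

```
spark-submit \
  --jars /path/to/datahub-spark-lineage.jar \
  --conf "spark.extraListeners=datahub.spark.DatahubSparkListener" \
  --conf "spark.datahub.rest.server=http://datahub-gms:8080" \
  ...
```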
In terms of logs, I do see the `DatahubSparkListener: Application Started` line, but I do not see the `McpEmitter` log that follows.
I'm assuming `sqlStart.executionId()` is null, although I'm not too sure about Spark internals.
I tried setting the DataHub log4j settings to debug, but was having some trouble with that as well.
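For reference, what I was trying looked roughly like this log4j 1.x properties fragment (the logger name is guessed from the package in the stack trace, so it's an assumption):

```
# log4j.properties supplied to the driver, e.g. via
#   --driver-java-options "-Dlog4j.configuration=file:/path/to/log4j.properties"
log4j.logger.datahub.spark=DEBUG
```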
I noticed that `v0.8.35` gets further in the logs but still fails, so `sqlStart.executionId()` does get created with the earlier version.