# advice-metadata-modeling
Hi team, for DataHub Spark lineage versions 0.8.23 and 0.8.24 we are receiving a NullPointerException from the DatahubSparkListener class. We are working with Spark 2.4.0, Scala 2.11.12, and Python 2.7.5. Can you please help?
Can you please post the full stack trace in this thread?
Cc @careful-pilot-86309
Please refer to the image for the stack trace. Even a PySpark word count is throwing the same exception for me
@careful-pilot-86309 @loud-island-88694 refer above
@loud-musician-49912 Can you please share the complete setup (whether you are using spark-submit with Python scripts or a Jupyter notebook, sample code, etc.)? Also, are you using RDDs for the word count? Please note that RDD operations are not yet supported; they will come in a future release.
@careful-pilot-86309 I am using spark-submit with PySpark. Spark SQL is also not working for me. I will share the code soon
```shell
spark-submit --num-executors 5 --executor-cores 5 --executor-memory 4g --driver-memory 2g --jars /tmp/datahub-spark-lineage-0.8.24.jar emp.py
```
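For reference, the same listener settings can also be passed on the spark-submit command line instead of being hard-coded in the script, which makes it easier to toggle lineage per job. A hedged sketch (reusing the jar path and `<hostname>` placeholder from this thread; adjust for your cluster):

```shell
spark-submit \
  --num-executors 5 --executor-cores 5 \
  --executor-memory 4g --driver-memory 2g \
  --jars /tmp/datahub-spark-lineage-0.8.24.jar \
  --conf "spark.extraListeners=datahub.spark.DatahubSparkListener" \
  --conf "spark.datahub.rest.server=http://<hostname>:8080" \
  emp.py
```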
```python
import time
import sys
import subprocess

# Getting start time of the job
start_time = time.time()

# Importing SparkSession to run queries, with the DataHub lineage listener configured
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName('abc')
         .config("spark.jars.packages", "io.acryl:datahub-spark-lineage:0.8.24")
         .config("spark.extraListeners", "datahub.spark.DatahubSparkListener")
         .config("spark.datahub.rest.server", "http://<hostname>:8080")
         .enableHiveSupport()
         .getOrCreate())

query = spark.sql("select * from default.emp_test")
query.write.mode("overwrite").csv("hdfs://nameservice1/tmp/outputemp/")
```
@careful-pilot-86309 the above is the code we are calling
emp.py is invoked from emp.sh
Thanks a lot. I will check and get back in some time
I tried exactly the same setup on my side and could not reproduce the issue
Is it possible to have a call and check the issue on your setup?
I will get back to you on this tomorrow. Thanks
@careful-pilot-86309 we can set up a meeting today at 7 pm IST if that is fine with you, or else suggest any other timing comfortable for you. Can you send the meeting invite to aikansh.manchanda@airtel.com?
7pm is good for me. Will send out invite
I received the same. Will join
@loud-island-88694 we got on a call and were able to resolve the issue with spark-lineage. But we have an issue with the UI. Lineages are being sent to the DataHub server successfully and we can fetch them using curl, but they are not visible in the UI. DataJobs can be viewed but not pipelines. Can someone from the UI team take a look?
Thanks @careful-pilot-86309 for getting it resolved. @loud-island-88694 please assign someone to debug this from the UI end if possible
@loud-island-88694 Pipelines doesn't show Spark; however, under Platforms, in Spark, we see jobs executed successfully
Shouldn't Pipelines show Spark?
@loud-musician-49912 can you try the below to fix the UI issue? https://datahubproject.io/docs/how/restore-indices/
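In case the linked page is not handy: around this version the restore goes through the datahub-upgrade job. A hedged sketch for a docker-compose deployment (the script path is from the datahub repo; your deployment may differ, so follow the linked doc for the authoritative steps):

```shell
# Run the datahub-upgrade job with the RestoreIndices task.
# It re-reads aspects from the SQL store and rebuilds the search/graph
# indices that the UI queries, which is why entities can exist in the
# database (visible via curl) but not render in the UI.
./docker/datahub-upgrade/datahub-upgrade.sh -u RestoreIndices
```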
@careful-pilot-86309 will try and let you know
We have tried restoring indices but it fails. We have also tried removing the custom model by deleting its directory from the plugins directory, but we are still getting this error. How can we delete this aspect from the database?
Even after restoring the directory, the CLI is unable to delete the entity "airtel_dq:0.0.1".
```shell
$ datahub delete --registry-id "airtel_dq:0.0.1" --hard
This will permanently delete data from DataHub. Do you want to continue? [y/N]: y
No entities found. Payload used: {"registryId": "airtel_dq:0.0.1", "dryRun": false}
Took 32.076 seconds to hard delete 0 rows for 0 entities
```
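Since the registry-id filter matched zero entities, one thing worth trying is deleting by URN instead; `datahub delete --urn` is part of the same CLI. A hedged sketch (the URN below is a hypothetical example; substitute the actual entity carrying the custom aspect):

```shell
# Hard-delete a single entity by its URN rather than by registry id
# (the dataset URN here is a made-up illustration, not from this thread)
datahub delete --urn "urn:li:dataset:(urn:li:dataPlatform:hive,default.emp_test,PROD)" --hard
```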