# getting-started
s
Hey, can we run the Spark lineage DataHub integration on Databricks?
d
I think it should work. Do you see any issue with it?
s
It is not working. I set up the Spark lineage code, but data is not written to DataHub; it's written on Databricks instead of DataHub.
d
What do you mean by
written on Databricks instead of DataHub
s
I mean, how can we run the Spark lineage code on a Databricks cluster?
d
s
The Spark listener was added successfully.
But the thing is, data is not able to write to DataHub using Spark lineage?
d
I guess you have to make sure Databricks is able to access your DataHub instance at the network level.
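One way to check that from the cluster is a quick request to the GMS from a notebook cell. A minimal sketch, assuming GMS exposes its standard /config endpoint and the requests package is available on the cluster; the URL is a placeholder for your spark.datahub.rest.server value:

```python
import requests

# Placeholder: use the same value as spark.datahub.rest.server
gms_url = "http://<your-gms-host>:8080"

# If this call fails or times out, the Databricks cluster cannot reach the
# DataHub instance at the network level (security group / VPC / peering issue).
resp = requests.get(f"{gms_url}/config", timeout=10)
print(resp.status_code)
print(resp.text[:200])
```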
s
How can we do that? The Databricks team said this issue is at the DataHub level.
d
What issue?
s
Also, there is no proper doc on how we can leverage DataHub with Databricks. We set the config properties for the DataHub listener and all; after that the Spark listener is initiated properly, but when we run
```python
df = spark.read.format("csv").option("header", "true").load("dbfs:/FileStore/shared_uploads/nishchay.agarwal@meesho.com/services_classification.csv")

df.write.mode("overwrite").saveAsTable("test_tableq1")
```
this command does not have any impact on DataHub.
d
Do you see any error messages? What do you mean by
data is not able to write to DataHub using Spark lineage
In the logs you should see some error message.
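If nothing shows up, it can help to turn up logging for the DataHub listener so emitter activity appears in the driver logs. A minimal sketch, assuming the Databricks runtime uses log4j 1.x and the listener and emitter log under the datahub package:

```python
# Raise the log level for the DataHub Spark listener and its REST emitter so
# that emit attempts and failures show up in the driver log.
log4j = spark.sparkContext._jvm.org.apache.log4j
log4j.LogManager.getLogger("datahub").setLevel(log4j.Level.DEBUG)
```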
s
There is no such message.
Instead, it writes the data to the Databricks database.
d
Can you see that the lineage plugin initialized?
What do you mean it writes data to the Databricks database? The DataHub lineage emitter doesn't write anything to a database.
s
Yeah, but the emitter is not running.
On Spark context startup:
YY/MM/DD HHmmss INFO DatahubSparkListener: DatahubSparkListener initialised.
YY/MM/DD HHmmss INFO SparkContext: Registered listener datahub.spark.DatahubSparkListener
On application start:
YY/MM/DD HHmmss INFO DatahubSparkListener: Application started: SparkListenerApplicationStart(
YY/MM/DD HHmmss INFO McpEmitter: REST Emitter Configuration: GMS url <rest.server>
Only this is showing; after that, no emitter message appears.
spark.databricks.cluster.profile singleNode
spark.datahub.databricks.cluster datahub_databricks
spark.master local[*, 4]
spark.datahub.rest.token eyJhbGciOiJIUzI1NiJ9.eyJhY3RvclR5cGUiOiJVU0VSIiwiYWN0b3JJZCI6Im5pc2hjaGF5LmFnYXJ3YWwiLCJ0eXBlIjoiUEVSU09OQUwiLCJ2ZXJzaW9uIjoiMiIsImV4cCI6MTY2OTExMDE3MywianRpIjoiMjRiZWM0ZGEtYWQwMC00MTNlLTk5NjEtNmE0ZjMyOTE1OTFmIiwic3ViIjoibmlzaGNoYXkuYWdhcndhbCIsImlzcyI6ImRhdGFodWItbWV0YWRhdGEtc2VydmljZSJ9.7WW2ej0OPq-9SLrWGfiKEHeuf6xiXzFUleaaomet1zc
spark.sql.session.timeZone IST
spark.datahub.rest.server http://172.31.18.133:8080
spark.extraListeners datahub.spark.DatahubSparkListener
fs.s3a.credentialsType AssumeRole
We are using this Spark conf to build Spark lineage, but I can't see any lineage on DataHub. I don't know what happened.
Can you help with this?
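One more thing worth ruling out: verify that the GMS URL and token from the Spark conf actually work from inside Databricks. A minimal sketch using the DataHub Python client, assuming the acryl-datahub package is installed on the cluster (e.g. %pip install acryl-datahub); the token below is a placeholder:

```python
from datahub.emitter.rest_emitter import DatahubRestEmitter

# Use the same values as spark.datahub.rest.server and spark.datahub.rest.token.
emitter = DatahubRestEmitter(
    gms_server="http://172.31.18.133:8080",
    token="<personal-access-token>",
)

# Raises an exception if GMS is unreachable or the token is rejected.
emitter.test_connection()
```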