Below are the usage instructions for datahub-databrick...
# integrate-databricks-datahub
Thanks. What I am trying to implement is to display the Databricks pipeline in lineage like this:
The Databricks orchestrator is not displayed correctly, nor does it show the job name.
Have you done the integration already?
Databricks won't be populated as a data_job. Below is the design:
• A pipeline (DataFlow) is created per:
  – cluster_identifier: specified with spark.datahub.databricks.cluster
  – applicationID: on every restart of the cluster, a new Spark applicationID will be created.
• A task (DataJob) is created per unique Spark query execution.
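As a rough sketch of how that design would be wired up, the DataHub Spark listener is attached via Spark configuration, with the cluster identifier set through the spark.datahub.databricks.cluster property mentioned above. The package coordinates, listener class, server URL, cluster name, and script name here are assumptions for illustration; check the DataHub documentation for the exact values for your version:

```shell
# Hypothetical spark-submit invocation; package version, server URL,
# cluster name, and job script are placeholders, not verified values.
spark-submit \
  --packages io.acryl:acryl-spark-lineage:<version> \
  --conf spark.extraListeners=datahub.spark.DatahubSparkListener \
  --conf spark.datahub.rest.server=http://localhost:8080 \
  --conf spark.datahub.databricks.cluster=my-cluster \
  my_job.py
```

With this in place, each (cluster_identifier, applicationID) pair maps to one DataFlow, and each Spark query execution inside it becomes a DataJob, matching the design above.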
How is the Delta Lake integration coming? Any progress there?
@modern-belgium-81337 Have you had a chance to work on it further after my comment? Curious to know how it went for you.
It connected for me, and I was able to pull from Databricks. There was an error in the library around fetching table comments, but @mammoth-bear-12532 helped me find a workaround. So everything with Databricks is good now!
Thanks a lot for the update. Glad it worked out well for you.