Below are the usage instructions for datahub-databrick...
# integrate-databricks-datahub
Thanks. What I am trying to implement is to display the Databricks pipeline in lineage like this:
The Databricks orchestrator is not displayed correctly, nor does it show the job name.
Have you done the integration already?
Databricks won't be populated as a data_job. Below is the design:
• A pipeline (DataFlow) is created per:
  – cluster_identifier: specified with spark.datahub.databricks.cluster
  – applicationID: on every restart of the cluster, a new Spark applicationID will be created.
• A task (DataJob) is created per unique Spark query execution.
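As a rough sketch of how that design would be wired up, the DataHub Spark listener is attached via Spark configuration, with the cluster identifier set through the spark.datahub.databricks.cluster property mentioned above. The package coordinates, listener class, server URL, cluster name, and script name here are assumptions for illustration; check the DataHub documentation for the exact values for your version:

```shell
# Hypothetical spark-submit invocation; package version, server URL,
# cluster name, and job script are placeholders, not verified values.
spark-submit \
  --packages io.acryl:acryl-spark-lineage:<version> \
  --conf spark.extraListeners=datahub.spark.DatahubSparkListener \
  --conf spark.datahub.rest.server=http://localhost:8080 \
  --conf spark.datahub.databricks.cluster=my-cluster \
  my_job.py
```

With this in place, each (cluster_identifier, applicationID) pair maps to one DataFlow, and each Spark query execution inside it becomes a DataJob, matching the design above.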
How is the Delta Lake integration coming? Any progress there?
@modern-belgium-81337 Have you had a chance to work on it further after my comment? Curious to know how it went for you.
It connected for me, and I was able to pull from Databricks. There was an error in the library around fetching table comments, but @mammoth-bear-12532 helped me find a workaround. So everything with Databricks is good now!
Thanks a lot for the update. Glad it worked out well for you.