Hey team, as we are using DataHub Spark lineage via Databricks DataHub #ingestion


silly-finland-62382

09/09/2022, 9:14 AM

Hey team, we are using DataHub Spark lineage via Databricks to populate Spark lineage. The lineage is created successfully, but we get the following error when running this command:


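```python
# `spark` is the SparkSession that Databricks notebooks predefine
df = spark.read.format("csv").option("header", "true").load("dbfs:/FileStore/shared_uploads/nishchay.agarwal@meesho.com/services_classification.csv")
# No explicit path is given, so saveAsTable creates a managed table
df.write.mode("overwrite").saveAsTable("new_p")
```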

When I run this command on a Databricks cluster, the pipeline is created successfully with the name set in the cluster's Spark config (`spark.datahub.databricks.cluster shell_dbx`), but when I run the Delta table command, I get this error:
22/09/09 09:06:56 ERROR DatasetExtractor: class org.apache.spark.sql.catalyst.plans.logical.Project is not supported yet. Please contact datahub team for further support. 
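When reporting this kind of error, it can help to list every logical-plan class that the extractor rejected across a whole driver log rather than just one. The sketch below is a hypothetical helper, not part of DataHub: its regex is modelled only on the single log line quoted above, so other DataHub versions may word the message differently.

```python
import re

# Pattern modelled on the one DatasetExtractor error line quoted above (assumption:
# the message format is stable across DataHub versions).
UNSUPPORTED_PLAN = re.compile(r"ERROR DatasetExtractor: class (\S+) is not supported yet")


def unsupported_plan_classes(log_text: str) -> list[str]:
    """Return the distinct rejected plan class names, in first-seen order."""
    seen: dict[str, None] = {}
    for match in UNSUPPORTED_PLAN.finditer(log_text):
        seen.setdefault(match.group(1), None)
    return list(seen)


sample = (
    "22/09/09 09:06:56 ERROR DatasetExtractor: class "
    "org.apache.spark.sql.catalyst.plans.logical.Project is not supported yet. "
    "Please contact datahub team for further support.\n"
)
print(unsupported_plan_classes(sample))
```

Running it over a downloaded driver log gives a deduplicated list that can be pasted into a bug report.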

Also, I cannot see the schema of the dataset I built using spark-lineage, and the upstream and downstream tables show up as the same, as in the screenshot (this is not expected).
Also, can you help me enable Delta catalog support on Databricks? It is not working there.

dazzling-judge-80093

09/09/2022, 9:22 AM

Thanks for reporting. We will try to reproduce this on our side soon.

silly-finland-62382

09/12/2022, 4:21 AM

Is there any update on this, @dazzling-judge-80093?

limited-cricket-18852

10/05/2022, 1:30 PM

Hi @silly-finland-62382, I had the same issue, but found that it correctly retrieves the Hive table's name if it's an external table, while it gets the LOCATION if the table is managed. I still have no idea why.
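The behaviour described above can be summarised as a small rule. The sketch below is a toy model of what was observed in this thread, not DataHub's actual implementation; the `HiveTable` type, the `observed_lineage_name` function, and the example paths are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HiveTable:
    # Hypothetical stand-in for catalog metadata; not a Spark or DataHub type.
    name: str
    location: str
    is_external: bool


def observed_lineage_name(table: HiveTable) -> str:
    # Mirrors the reported behaviour only: external tables resolve to their
    # table name, while managed tables resolve to their LOCATION.
    return table.name if table.is_external else table.location


managed = HiveTable("new_p", "dbfs:/example/managed/new_p", is_external=False)
external = HiveTable("new_p_ext", "dbfs:/example/external/new_p_ext", is_external=True)
print(observed_lineage_name(managed))
print(observed_lineage_name(external))
```

If this observation holds, writing the target as an external table (for example by giving `saveAsTable` an explicit path) may be a workaround worth testing, though that is unconfirmed here.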

