# ingestion
g
In our org we use Spark to read from Kafka and write to Kafka/Hive/files. Can DataHub extract this lineage information from Spark Streaming jobs using DatahubSparkListener?
o
Hi! The initial release of the Spark listener is limited in terms of supported sources, but we do plan to expand it to cover more. The full capabilities of the Spark listener are documented here: https://datahubproject.io/docs/metadata-integration/java/spark-lineage/
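For readers landing here: the listener is attached through Spark configuration. A minimal sketch of the relevant settings, following the linked docs (the artifact version and GMS URL below are illustrative placeholders, not values from this thread):

```
# spark-defaults.conf style; version and server URL are examples only
spark.jars.packages        io.acryl:datahub-spark-lineage:0.8.25
spark.extraListeners       datahub.spark.DatahubSparkListener
spark.datahub.rest.server  http://localhost:8080
```

The same keys can be passed with `--conf` on `spark-submit` or via `SparkSession.builder.config(...)` in a notebook.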
g
Hi @orange-night-91387, I was not able to extract Spark lineage while running the Spark listener from a notebook, and I could not locate any logs to check the status of the Spark agent. We are using Spark 2.4 and Scala 2.11. What can I do to troubleshoot further?
o
Do you have logging set up for your Jupyter notebook? Something like: https://towardsdatascience.com/building-and-exporting-python-logs-in-jupyter-notebooks-87b6d7a86c4
g
Hmm... I went to look at the application master logs instead. The error was "General SSLEngine Problem". I think the problem is that we enabled HTTPS for the GMS service using our self-signed cert. Any idea how I should proceed from here?
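That error typically means the JVM's TLS verification rejected the self-signed certificate because it is not in the trust store. The Python sketch below illustrates the same principle with the `ssl` module: a default context requires certificate verification, and a self-signed cert only passes if you explicitly load it as a trusted CA (the cert path is hypothetical):

```python
import ssl

# A default client context verifies the server certificate chain
# and the hostname; a self-signed cert fails both unless trusted.
ctx = ssl.create_default_context()
print(ctx.verify_mode == ssl.CERT_REQUIRED)  # True
print(ctx.check_hostname)                    # True

# Hypothetical path: export the GMS cert and load it as a trusted CA.
# ctx.load_verify_locations(cafile="/path/to/gms-cert.pem")
```

On the JVM side the analogous fix, once the listener supports HTTPS, would be importing the cert into the Java truststore (e.g. with `keytool -importcert`), but see the reply below about current listener support.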
l
Hi @glamorous-microphone-33484! Regarding Spark Streaming specifically: we don't support it yet; would you mind opening a feature request? https://feature-requests.datahubproject.io/
@careful-pilot-86309 & @elegant-doctor-86344 - can you take a look at the open questions around Spark lineage?
c
@glamorous-microphone-33484 The current version of DatahubSparkListener (0.8.25) doesn't support an HTTPS-enabled GMS server. Support is a work in progress and will go out in the next release.