DataHub

image.png

Hi all, I am new to DataHub and am working on a use case to test Spark lineage with various data sources. I created a simple example that reads two files from HDFS, joins them, and saves the result back to HDFS. This works on my local DataHub setup, which shows the Spark lineage as in the image below.

When I point this to the DataHub server deployed on cloud, it doesn't show any upstream or downstream dataset lineage, as shown in the first image. Any idea where I am going wrong?

Hi, could you share the recipes for each deployment? Are the versions identical and the ingestion sources configured the same?

spark-datahub-hdfs.ipynb

<@U042KCFV9GX> thanks for replying. I haven't written any recipe to ingest the data source; I am using the simple Spark job attached below. The local setup is on version 0.9.3, while the cloud deployment uses Helm release version 0.10.2. I used the same script for testing locally and on cloud, and only updated the DataHub REST endpoint to point to the external IP of the GMS service.
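
For readers without access to the notebook, here is a minimal sketch of the kind of job described above, not the actual attached script. It assumes the DataHub Spark lineage agent is loaded through `spark.jars.packages` and pointed at GMS through `spark.datahub.rest.server`. The helper names, file paths, CSV format, join key, and agent version are all hypothetical placeholders.

```python
def datahub_lineage_conf(gms_url, agent_version, token=None):
    """Build the Spark settings that attach the DataHub lineage listener.

    gms_url and agent_version are supplied by the caller; the values used
    in the example call below are assumptions, not taken from the notebook.
    """
    conf = {
        # Pull the lineage agent jar from Maven at session start.
        "spark.jars.packages": f"io.acryl:datahub-spark-lineage:{agent_version}",
        # Register the listener that emits lineage events.
        "spark.extraListeners": "datahub.spark.DatahubSparkListener",
        # REST endpoint of the DataHub GMS service.
        "spark.datahub.rest.server": gms_url,
    }
    if token:
        # Only needed when the GMS endpoint requires an access token.
        conf["spark.datahub.rest.token"] = token
    return conf


def run_join_job(conf, left_path, right_path, out_path, key):
    """Read two CSV files, join them on `key`, and write the result back."""
    # Imported lazily so the config helper can be used without PySpark.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName("hdfs-join-lineage-test")
    for name, value in conf.items():
        builder = builder.config(name, value)
    spark = builder.getOrCreate()

    left = spark.read.csv(left_path, header=True)
    right = spark.read.csv(right_path, header=True)
    joined = left.join(right, on=key, how="inner")
    joined.write.mode("overwrite").csv(out_path, header=True)

    # Stop the session so the listener sees the application end.
    spark.stop()


# Hypothetical usage; host, paths, and column name are placeholders.
# conf = datahub_lineage_conf("http://<gms-external-ip>:8080", "0.10.2")
# run_join_job(conf,
#              "hdfs:///data/in1.csv", "hdfs:///data/in2.csv",
#              "hdfs:///data/out", key="id")
```

Keeping the settings in one helper makes it easy to print and compare exactly what the local and cloud runs pass to Spark, since only the endpoint is meant to differ between them.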