# ingestion
h
Hi friends! Sorry, I'm a real newbie to data catalogs and especially DataHub, but I really need your help: we have a project that downloads and collects files from FTP/SFTP servers, moves them to GCS after some transformations, and finally sends them to HDFS. The whole process is also recorded in a Postgres database. I managed to ingest metadata from Postgres into our DataHub instance on Kubernetes, but I don't think it's the right architecture. Here is the end result I need: to see in Lineage how data passed from the FTP/SFTP servers to GCS and later to HDFS. The problem is that I still don't understand how exactly lineage is created: whether it happens after ingestion, during it, or automatically. I have seen lineage code examples, but I still can't quite understand how/where to implement them in our project.
o
Generally, lineage for entities in DataHub is created by hooking into processing pipelines like Airflow: https://datahubproject.io/docs/lineage/airflow/ or Spark: https://datahubproject.io/docs/metadata-integration/java/spark-lineage If you're running a custom pipeline outside of one of our lineage integrations, you would need to emit MetadataChangeProposals containing the lineage details from the relevant points in your pipeline in order to see it in DataHub. If you use a tool other than the ones we support, consider putting up a feature request here: https://feature-requests.datahubproject.io/
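For example, a minimal Python sketch of such an emission might look like this (assuming the acryl-datahub client; the platform names, dataset paths, and GMS URL below are placeholders, not your project's actual values):

```python
# Minimal sketch: emit an UpstreamLineage aspect as a MetadataChangeProposal
# via DataHub's Python REST emitter. All names/URLs here are placeholders.
from datahub.emitter.mce_builder import make_dataset_urn
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.metadata.schema_classes import (
    DatasetLineageTypeClass,
    UpstreamClass,
    UpstreamLineageClass,
)

# URNs for the upstream (SFTP-sourced file) and downstream (GCS object) datasets.
sftp_urn = make_dataset_urn(platform="file", name="sftp/incoming/orders.csv", env="PROD")
gcs_urn = make_dataset_urn(platform="gcs", name="my-bucket/raw/orders", env="PROD")

# The UpstreamLineage aspect attached to the downstream dataset declares
# which datasets it was derived from.
lineage_aspect = UpstreamLineageClass(
    upstreams=[UpstreamClass(dataset=sftp_urn, type=DatasetLineageTypeClass.TRANSFORMED)]
)

mcp = MetadataChangeProposalWrapper(entityUrn=gcs_urn, aspect=lineage_aspect)

# Point the emitter at your DataHub GMS endpoint and send the proposal.
emitter = DatahubRestEmitter(gms_server="http://datahub-gms:8080")
emitter.emit(mcp)
```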
h
Cool, thanks for the response. If I get you right, it means we absolutely need either Airflow or Spark for lineage, right? Because I was confused by the lineage examples that didn't use those tools.
l
You can also use the Python and Java emitters to emit lineage events from your custom services (if you are not using Airflow or Spark).
The examples show how you can programmatically do that
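As a hedged sketch of what wiring up the whole FTP/SFTP -> GCS -> HDFS chain could look like with the make_lineage_mce convenience helper (the platform IDs, dataset names, and GMS URL are assumptions for illustration):

```python
# Sketch: one lineage edge per hop of the pipeline, emitted with the
# make_lineage_mce helper. Names and URLs are placeholders.
import datahub.emitter.mce_builder as builder
from datahub.emitter.rest_emitter import DatahubRestEmitter

emitter = DatahubRestEmitter(gms_server="http://datahub-gms:8080")  # placeholder URL

sftp_urn = builder.make_dataset_urn("file", "sftp/incoming/orders.csv")
gcs_urn = builder.make_dataset_urn("gcs", "my-bucket/raw/orders")
hdfs_urn = builder.make_dataset_urn("hdfs", "/data/warehouse/orders")

# GCS dataset depends on the SFTP file; HDFS dataset depends on the GCS dataset.
emitter.emit_mce(builder.make_lineage_mce([sftp_urn], gcs_urn))
emitter.emit_mce(builder.make_lineage_mce([gcs_urn], hdfs_urn))
```

You would call something like this from your own pipeline code at the point where each transfer/transformation completes, which is what the documented examples demonstrate.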
h
Thanks for the help