I'd like to use datahub to track downstream consumers of data | DataHub #getting-started

adamant-postman-92176

05/09/2023, 12:24 AM

I'd like to use datahub to track downstream consumers of data, as well as upstream producers. Say for example I have an airflow job that writes to an s3 bucket. Later, a cron job reads from that s3 bucket and takes some action (e.g. emails a customer, etc.) What's the best way to represent this cron job as a consumer of data? Should it be tracked as a "dataset", even though it doesn't really store data anywhere? Or, is it better to track it using metadata enrichment to write a set of tags to the data source for the s3 bucket saying how the data is used? Thanks for any help. I'm sure this is a common problem, but I think I lack the proper nouns to properly search for this; I haven't had much luck so far. -Eli

lively-cat-88289

05/09/2023, 12:25 AM

Hey there 👋 I'm The DataHub Community Support bot. I'm here to help make sure the community can best support you with your request. Let's double check a few things first: ✅ There's a lot of good information on our docs site: www.datahubproject.io/docs. Have you searched there for a solution? ✅ It's not uncommon that someone has run into your exact problem before in the community. Have you searched Slack for similar issues? Did you find a solution to your issue? ❌ Sorry you weren't able to find a solution. I'm sending you some tips on info you can provide to help the community troubleshoot. Whenever you feel your issue is solved, please react ✅ to your original message to let us know!

big-carpet-38439

05/09/2023, 7:32 PM

@adamant-postman-92176 I would strongly recommend modeling this as a "DataFlow" with a single child "DataJob" for your CRON job!

big-carpet-38439

05/09/2023, 7:32 PM

You can then track individual runs of the cron job using the "DataProcessInstance" entity!
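For anyone looking for a starting point, here is a rough, stdlib-only sketch of what that modeling could look like: one DataFlow, one child DataJob, and the s3 dataset listed as the job's input. The URN layouts follow DataHub's documented formats, but the orchestrator name `cron`, the flow and job ids, and the bucket path are all made up for illustration, and the dict at the end only mirrors the shape of the `dataJobInputOutput` aspect rather than emitting anything. In practice the DataHub Python SDK has helpers for building these URNs and an emitter for sending the aspect.

```python
import json


def dataset_urn(platform: str, name: str, env: str = "PROD") -> str:
    """Build a dataset URN, e.g. for a path in an s3 bucket."""
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


def data_flow_urn(orchestrator: str, flow_id: str, cluster: str = "prod") -> str:
    """Build a DataFlow URN (the 'pipeline' that owns one or more jobs)."""
    return f"urn:li:dataFlow:({orchestrator},{flow_id},{cluster})"


def data_job_urn(flow_urn: str, job_id: str) -> str:
    """Build a DataJob URN; a job is always nested under its parent flow."""
    return f"urn:li:dataJob:({flow_urn},{job_id})"


# Hypothetical names: replace with your own bucket path and job identifiers.
s3_input = dataset_urn("s3", "my-bucket/customer-exports")
flow = data_flow_urn("cron", "customer_notifications")
job = data_job_urn(flow, "email_customers")

# Shape of the lineage aspect: the cron job consumes the s3 dataset and
# writes no dataset of its own, so outputDatasets stays empty.
job_lineage = {
    "entityUrn": job,
    "aspectName": "dataJobInputOutput",
    "aspect": {"inputDatasets": [s3_input], "outputDatasets": []},
}

print(json.dumps(job_lineage, indent=2))
```

With this shape the cron job shows up downstream of the s3 dataset in lineage without having to pretend it is a dataset itself. The Airflow job that writes the bucket would be modeled the same way, with the dataset in its `outputDatasets` instead.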

adamant-postman-92176

05/10/2023, 5:17 AM

Thanks John, I will look into this. Do you know if there is any documentation on these topics? I found an API reference, but if there is something that covers the concepts or a sample to work from, that would be very helpful. I appreciate your help!

big-carpet-38439

05/11/2023, 3:28 PM

So typically this understanding lives inside of the connectors - but this overview should help a bit https://datahubproject.io/docs/metadata-modeling/metadata-model/
