# getting-started
d
Hi all! Just wanted to say hi 👋, introduce myself and ask a few questions! Matthias from Arabesque AI here. We are looking into metadata stores and landed on DataHub, and it's been very enjoyable so far 🎉. A few initial questions (please point me to documentation where possible; my initial searches may have been too superficial to find it, and I am happy to dig deeper myself before burdening more people 🙂):
• Is there a way to ingest metadata from Google Cloud Storage or ElasticSearch?
• I read somewhere about the possibility of adding ML models as well. Is it documented somewhere how to do so?
• Lastly, data lineage: I've found docs on how to do this using Airflow, but is there a way to add it manually (for now), and how?
We set it up internally on GCP, so I'd be happy to look into contributing docs/steps if that'd be useful!
🙌 1
m
Welcome @delightful-policeman-14573!
• GCS and ElasticSearch: integrations have to be built. Would love contributions here! (https://datahubproject.io/docs/metadata-ingestion)
• ML Models: the metadata models are already in (contributed by Expedia: @nutritious-bird-77396 and @orange-night-91387). Check them out and let us know if they represent the information you want to capture. We are working on AWS Sagemaker integration currently. Let me know if you use some Google Cloud specific framework.
• Data Lineage: you can emit metadata on your own as well! Check out the emitter here. Check with @gray-shoe-75895 for any questions.
👍 1
💯 1
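For anyone reading along, here is a rough sketch of what "emitting lineage on your own" amounts to: you name the upstream and downstream datasets by URN and describe the edge between them. This is only an illustration in plain Python. The helper functions are local stand-ins written for this note, not DataHub's library functions; the `gcs` platform name, the bucket, and the table name are placeholders; and the record layout is a simplified approximation of an upstream-lineage aspect rather than the exact wire format. Actually sending it to DataHub would go through the emitter mentioned above, which is not shown here.

```python
import json


def make_dataset_urn(platform: str, name: str, env: str = "PROD") -> str:
    """Local stand-in: build a DataHub-style dataset URN string."""
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


def make_lineage_record(upstream_urns, downstream_urn):
    """Hypothetical helper: describe which datasets feed the downstream one.

    The dict layout is a simplified assumption, not the exact payload
    that DataHub expects.
    """
    return {
        "entityUrn": downstream_urn,
        "aspectName": "upstreamLineage",
        "aspect": {
            "upstreams": [
                {"dataset": urn, "type": "TRANSFORMED"} for urn in upstream_urns
            ]
        },
    }


# Placeholder names: a raw file in a bucket feeding a warehouse table.
record = make_lineage_record(
    upstream_urns=[make_dataset_urn("gcs", "my-bucket/raw/events")],
    downstream_urn=make_dataset_urn("bigquery", "my_project.analytics.events"),
)

print(json.dumps(record, indent=2))
```

The point of the sketch is that manual lineage needs no scheduler integration: as long as you can name both ends of the edge, you can describe it yourself and hand it to the emitter.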
d
Thanks so much @mammoth-bear-12532 - very kind. I hope we get bandwidth to contribute! I'll be looking into all of the references you sent over!
b
Welcome!! Let us know if you want to hop on a call to discuss anything
🎉 1
d
Thanks John! I will collect my thoughts today and might reach out in the coming days / week!
b
@delightful-policeman-14573 I stumbled across your question and would like to know how far you’ve got with integrating GCS. Did you work/think on that already?
d
I haven't kicked things off, unfortunately: time was allocated, but then something else got higher priority (you know how things go 🧌). I hope to get back to this in 2 weeks though, and will ping you if I do!
b
Yeah, I know these situations where priorities get shifted 🙂 In the meantime I started working on an enhanced source for GCS. Maybe we can have a chat about it then 👍
🙌 1