# ingestion
r
hi, I am new to Datahub and plan on using it. But I had a few questions: 1. How is the metadata brought into Datahub? I see there are ingestion scripts, but is there a way for each of the data sources to push the metadata to Kafka topics (push-based architecture) instead of periodically calling the ingestion scripts? 2. Are owners added manually, or should there be an "owners" field in the json metadata? 3. How are table descriptions and column descriptions added? Are they manually created through the UI? Or should there be a "description" field in the json metadata for both the tables and the columns?
b
Hi Kalyan! Welcome to the community. 1. We do support "push" of metadata for particular systems (e.g. Airflow), but most folks prefer to manage periodic pull+push jobs to batch-ingest metadata. 2. I do not believe owners are auto-populated by the ingestion framework @gray-shoe-75895 to confirm. This means you'd either have to write a transformer to enrich the metadata to include the owner, or add it via the UI. 3. Table descriptions and column descriptions can technically be written using either the ingestion framework or the UI. Typically they are provided via the UI, unless the data platform from which we are ingesting is capable of providing that rich metadata.
m
@red-journalist-15118: DataHub excels at push-based integration. If you can push over HTTP or Kafka, you can send metadata from anywhere to DataHub.
We have convenience methods in Python listed here: https://datahubproject.io/docs/metadata-ingestion/#using-as-a-library
but you can always emit metadata from your favorite language to DataHub, as long as you can write Avro to Kafka.
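To make the push model concrete, here is a stdlib-only sketch of the kind of payload an emitter might send. The dataset name, description, and exact payload shape are simplified assumptions; the Python convenience library linked above (or an Avro producer for Kafka) handles the real serialization:

```python
import json

def make_dataset_urn(platform: str, name: str, env: str = "PROD") -> str:
    # DataHub identifies datasets by URNs of this shape.
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"

# Simplified stand-in for a metadata change proposal pushed over HTTP or Kafka.
proposal = {
    "entityType": "dataset",
    "entityUrn": make_dataset_urn("hive", "mydb.orders"),
    "aspectName": "datasetProperties",
    "aspect": {"description": "Orders fact table, refreshed daily."},
}
print(json.dumps(proposal, indent=2))
```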
@red-journalist-15118: for your question on ownership and schema / field description metadata: you can send it via the push API as well as edit it via the UI (we store pushed and UI-edited metadata separately so the two don't overwrite each other)
b
Correct. You can send it, but our default ingestion adapters do not populate it for you
r
@big-carpet-38439 if the default ingestion does not populate it, how can I populate it once I push/pull it from the source (e.g. Hive)?
b
@red-journalist-15118 Today, you have a few options: 1. Write a Transformer that is capable of resolving Ownership given a Dataset record 2. Write a custom Python flow to find the ownership for your datasets and push it to DataHub using the Emitter APIs 3. Add Ownership information inside of DataHub UI manually
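A hedged sketch of option 1 above, assuming ownership can be resolved from a team-maintained lookup. The mapping, key names, and lookup source are illustrative; this is the shape of the logic a transformer would run, not a real DataHub transformer class:

```python
# Illustrative owner lookup; in practice this could read Hive table
# properties, LDAP, or an internal service (all assumptions here).
OWNER_MAP = {"mydb.orders": ["alice", "bob"]}

def enrich_with_ownership(record: dict) -> dict:
    """Attach an ownership aspect to a dataset record if owners are known."""
    owners = OWNER_MAP.get(record.get("name", ""), [])
    if owners:
        record["ownership"] = {
            "owners": [
                {"owner": f"urn:li:corpuser:{o}", "type": "DATAOWNER"}
                for o in owners
            ]
        }
    return record
```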
Where is your ownership information located?
r
@big-carpet-38439 we store owner info and table description and column description as a field in the json metadata
b
Is it a hive table property?
r
yeah
m
@red-journalist-15118: is this a custom format that you have at your company, or a standard format that many deployments use?
trying to figure out if we just need to provide pluggability here.. or an out-of-the-box solution for this
b
+1^
r
I am actually not sure about this. I can ask my team this week and follow up! I really appreciate all your guys help! You guys are amazing!
b
Have no fear! We'll figure something out!
Once you have that talk we can schedule some time to try to figure out something that could work for you but also be made general 🙂
c
Hi @big-carpet-38439 i'm interested in your point number 3 above, as i'm having the same case as Kalyan. My question is: how (technically) can I ingest field descriptions via the ingestion framework? Is it possible to write them in the recipe file so we can just use the `datahub ingest -c` command?
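For reference, a Hive recipe passed to `datahub ingest -c <recipe>` looks roughly like this (host, database, and server values are placeholders; note this sketch does not itself add field descriptions, which is the open question here):

```yaml
source:
  type: hive
  config:
    host_port: localhost:10000
    database: mydb
sink:
  type: datahub-rest
  config:
    server: http://localhost:8080
```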
m
@chilly-spring-43918: field descriptions are automatically ingested AFAIK. Which source are you connecting it to?
c
i'm trying to ingest from the Hive source, let me give you the screenshot
Here is my table structure in Hive, but when i tried to export it to a file, the description was null.
m
Thanks @chilly-spring-43918, maybe the sqlalchemy driver (pyhive) that we're using does not pull these fields. We'll take a look.
@gray-shoe-75895: ^^
c
thank you @mammoth-bear-12532
Hi @gray-shoe-75895 @mammoth-bear-12532, apologies for following this up, but have you guys had a chance to look into this matter?
m
@chilly-spring-43918: we are looking into the `pyhive` implementation. Will get back to you in a day or two.
c
@mammoth-bear-12532 Thank you very much
p
have the same issue with ingesting descriptions from Hive. Looking forward to a possible solution. Thank you guys
m
@prehistoric-doctor-36763 thanks for letting us know. Watch this thread, will get back soon.
g
Hi @chilly-spring-43918 and @prehistoric-doctor-36763, this should be fixed now in acryl-datahub version 0.3.0 - let me know if you run into any issues with it!
m
@red-journalist-15118: FYI