# ingestion
hallowed-kilobyte-916:
Hi Everyone, I have successfully imported my metadata from some s3 paths into DataHub using `from datahub.ingestion.run.pipeline import Pipeline`.
Now I want to ingest the data dictionaries for the various datasets I ingested. I see the option to do this via the DataHub interface, but I can't find any documentation for doing it programmatically. Has anyone done this before? Any suggestions?
curved-planet-99787:
Hi Manrof, you can simply do:
```python
from datahub.ingestion.run.pipeline import Pipeline

pipeline = Pipeline.create(recipe)
pipeline.run()
```
where `recipe` is just a dictionary containing the recipe configuration, as described in the documentation 🙂
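For example, a minimal sketch of such a recipe for an s3 source; the bucket path, region, and server URL below are placeholders, not from this thread:

```python
from datahub.ingestion.run.pipeline import Pipeline

# A sketch of a recipe dict, assuming the s3 source and a local datahub-rest
# sink; the bucket path, region, and server URL are all placeholders.
recipe = {
    "source": {
        "type": "s3",
        "config": {
            "path_specs": [{"include": "s3://my-bucket/data/*.parquet"}],
            "aws_config": {"aws_region": "us-east-1"},
        },
    },
    "sink": {
        "type": "datahub-rest",
        "config": {"server": "http://localhost:8080"},
    },
}

pipeline = Pipeline.create(recipe)
pipeline.run()
pipeline.raise_from_status()  # raise if the run reported failures
```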
hallowed-kilobyte-916:
Thanks for the reply @curved-planet-99787. In the documentation, I see the data dictionary being uploaded for Hive. Is this going to be the same for all other sources? https://datahubproject.io/docs/api/tutorials/datasets/
better-orange-49102:
That example creates a Hive table. In your case, since the schema is already inside DataHub, you just need to programmatically add descriptions, tags, and terms to the existing table, as in https://datahubproject.io/docs/api/tutorials/descriptions
Be it a Hive or MySQL table, the code is the same; only the URN specified differs.
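For example, here is a minimal sketch of adding a column description (one data dictionary entry) to a dataset already in DataHub, assuming it was ingested under the s3 platform; the dataset name, column name, description text, and server URL are placeholders:

```python
from datahub.emitter.mce_builder import make_dataset_urn
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.metadata.schema_classes import (
    EditableSchemaFieldInfoClass,
    EditableSchemaMetadataClass,
)

# Placeholder URN: point this at the dataset you already ingested from s3.
dataset_urn = make_dataset_urn(platform="s3", name="my-bucket/data", env="PROD")

# Attach a human-written description to one column.
field_docs = EditableSchemaMetadataClass(
    editableSchemaFieldInfo=[
        EditableSchemaFieldInfoClass(
            fieldPath="customer_id",  # placeholder column name
            description="Unique identifier of the customer.",  # placeholder text
        )
    ]
)

emitter = DatahubRestEmitter(gms_server="http://localhost:8080")
emitter.emit(MetadataChangeProposalWrapper(entityUrn=dataset_urn, aspect=field_docs))
```

Using the "editable" schema aspect keeps manual documentation separate from the ingested schema, so it shouldn't be overwritten on the next ingestion run.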
curved-planet-99787:
Sorry, @hallowed-kilobyte-916, I didn't get your question, so my answer is probably not of help.
hallowed-kilobyte-916:
@curved-planet-99787 It's fine...you led me down the path to the solution
@better-orange-49102 thank you. That worked.