# ingestion
c
Hi guys, quick question: we successfully imported Hive (Kerberized) metadata. Now we want to update the datasets inside DataHub with lineage information. How do we do that? Currently we extract the upstream and downstream information from the Hive SQL history.
g
Hi Anung! Curious to hear how you're extracting the upstream/downstream information from the Hive SQL history. In any case, you can use metadata-ingestion's emitters to publish that lineage information to DataHub once you extract it: https://datahubproject.io/docs/metadata-ingestion/#using-as-a-library
l
@gray-shoe-75895 is it correct to say they should specifically emit UpstreamLineage and DownstreamLineage events using the emitter?
g
Yes that's exactly correct - you can emit a DatasetSnapshot with an UpstreamLineage aspect, and DataHub will match the identifiers/URNs appropriately
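A minimal sketch of what that emit can look like with the Python REST emitter (the GMS address and dataset names below are placeholders); `make_lineage_mce` builds a MetadataChangeEvent whose DatasetSnapshot carries the UpstreamLineage aspect:

```python
import datahub.emitter.mce_builder as builder
from datahub.emitter.rest_emitter import DatahubRestEmitter

# Build an MCE whose DatasetSnapshot carries an UpstreamLineage aspect.
lineage_mce = builder.make_lineage_mce(
    [
        builder.make_dataset_urn("hive", "raw.sales"),       # upstream (placeholder name)
        builder.make_dataset_urn("hive", "raw.customers"),   # upstream (placeholder name)
    ],
    builder.make_dataset_urn("hive", "mart.daily_sales"),    # downstream (placeholder name)
)

# Point the emitter at your DataHub GMS endpoint (assumed address) and publish.
emitter = DatahubRestEmitter("http://localhost:8080")
emitter.emit_mce(lineage_mce)
```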
c
Hi @gray-shoe-75895, we are using the Cloudera distribution, which has an audit log feature (the output is a bunch of Hive query history). We use the Python sqllineage package (sqllineage · PyPI) to parse those queries and get the source and target tables.
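For reference, a small sketch of pulling source and target tables out of one audit-log query with sqllineage (the query text and table names here are made up):

```python
from sqllineage.runner import LineageRunner

# A hypothetical query as it might appear in the Cloudera audit log.
sql = """
INSERT OVERWRITE TABLE mart.daily_sales
SELECT s.order_date, SUM(s.amount)
FROM raw.sales s
JOIN raw.customers c ON s.customer_id = c.id
GROUP BY s.order_date
"""

runner = LineageRunner(sql)
print(runner.source_tables())  # upstream tables, e.g. [raw.customers, raw.sales]
print(runner.target_tables())  # downstream tables, e.g. [mart.daily_sales]
```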
g
Got it - that's pretty nifty! Using the metadata emitters to emit an update with an UpstreamLineage aspect is the way to go here - happy to give guidance as you build it
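Putting the two together, a rough end-to-end sketch (the table-to-URN mapping and the GMS address are assumptions) that turns each parsed audit-log query into a lineage emit:

```python
import datahub.emitter.mce_builder as builder
from datahub.emitter.rest_emitter import DatahubRestEmitter
from sqllineage.runner import LineageRunner

emitter = DatahubRestEmitter("http://localhost:8080")  # assumed GMS address

def emit_lineage_for_query(sql: str) -> None:
    """Parse one audit-log query and emit its lineage to DataHub."""
    runner = LineageRunner(sql)
    # Assumes sqllineage's "schema.table" string maps directly to the Hive dataset name in DataHub.
    upstream_urns = [
        builder.make_dataset_urn("hive", str(table)) for table in runner.source_tables()
    ]
    for target in runner.target_tables():
        downstream_urn = builder.make_dataset_urn("hive", str(target))
        emitter.emit_mce(builder.make_lineage_mce(upstream_urns, downstream_urn))
```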
c
thank you Harshal