# ingestion
p
Hi everybody. I'm using a custom Python emitter to add lineage to BigQuery objects that live in different projects. The UPSERT option seems to overwrite the upstream lineage when switching between projects. E.g.: table `project1.Dataset1.table1` has `project1.Dataset2.table2` as an upstream, but it also has `project2.Dataset2.table2` as another upstream. When using the custom emitter (with the UPSERT option), the second project seems to overwrite the first one. Is this a bug, or do I need to query all projects and add the upstream lineage afterwards?
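```python
import datahub.emitter.mce_builder as builder
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.metadata.schema_classes import ChangeTypeClass
# Assumes platform, fq_table_name, env and upstream_lineage (an UpstreamLineageClass) are defined earlier.
lineage_mcp = MetadataChangeProposalWrapper(
    entityType="dataset",
    changeType=ChangeTypeClass.UPSERT,
    entityUrn=builder.make_dataset_urn(platform, fq_table_name, env),
    aspectName="upstreamLineage",
    aspect=upstream_lineage,
)
```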
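If UPSERT replaces the entire `upstreamLineage` aspect rather than appending to it (which is what the answers in this thread imply), then looping over projects and emitting once per project leaves only the last project's upstreams. The sketch below models that behaviour with a hypothetical in-memory store; `upsert` and the store are illustrative stand-ins, not DataHub code, and plain table names replace real dataset URNs.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for the metadata store: one value per (entity, aspect name).
Store = Dict[Tuple[str, str], List[str]]


def upsert(store: Store, entity: str, aspect_name: str, aspect: List[str]) -> None:
    """Model an UPSERT as a full replacement of the stored aspect, not an append."""
    store[(entity, aspect_name)] = list(aspect)


store: Store = {}
downstream = "project1.Dataset1.table1"

# Emit one upstreamLineage aspect per project, as in the loop described above.
for upstreams in (["project1.Dataset2.table2"], ["project2.Dataset2.table2"]):
    upsert(store, downstream, "upstreamLineage", upstreams)

print(store[(downstream, "upstreamLineage")])
# ['project2.Dataset2.table2']
```

Under this model the first project's upstream is gone after the second write, which matches the symptom in the question.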
l
You need to query it first, afaik.
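One possible reading of "query it" is a read-merge-write: fetch the upstreams already stored for the dataset, merge in the new ones, and emit the combined list. This is a minimal sketch of that idea against the same kind of hypothetical in-memory store; `upsert`, `merge_upstreams` and `emit_with_merge` are illustrative names, not part of DataHub.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory store keyed by (entity, aspect name); it stands in for the
# metadata server, and upstreams are plain table names rather than dataset URNs.
Store = Dict[Tuple[str, str], List[str]]


def upsert(store: Store, entity: str, aspect_name: str, aspect: List[str]) -> None:
    """Model an UPSERT as a full replacement of the stored aspect value."""
    store[(entity, aspect_name)] = list(aspect)


def merge_upstreams(existing: List[str], new: List[str]) -> List[str]:
    """Union of two upstream lists, keeping first-seen order and dropping duplicates."""
    merged = list(existing)
    for upstream in new:
        if upstream not in merged:
            merged.append(upstream)
    return merged


def emit_with_merge(store: Store, entity: str, new_upstreams: List[str]) -> None:
    # Read what is already stored (empty if nothing yet), merge, then write the full list.
    existing = store.get((entity, "upstreamLineage"), [])
    upsert(store, entity, "upstreamLineage", merge_upstreams(existing, new_upstreams))


store: Store = {}
downstream = "project1.Dataset1.table1"
for upstreams in (["project1.Dataset2.table2"], ["project2.Dataset2.table2"]):
    emit_with_merge(store, downstream, upstreams)

print(store[(downstream, "upstreamLineage")])
# ['project1.Dataset2.table2', 'project2.Dataset2.table2']
```

The read-merge-write sequence is not atomic, so two writers updating the same dataset at the same time can still lose an upstream. In a real pipeline the read step would go through DataHub's API, and the exact call depends on the client version, so it is left out here.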
p
So, the way I'm handling it at the moment is by looping over projects. Do I need to compute the upstreams for all projects at once?
l
For each entity, yeah 😞
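Computing everything up front per entity could look roughly like this: scan every project, group the discovered upstreams by downstream table, and only then emit one complete aspect per table. The input dict and the `emit` callback are hypothetical; in the emitter from the question, `emit` would presumably build the `UpstreamLineageClass` and the `MetadataChangeProposalWrapper` for that dataset.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical input: (downstream, upstream) edges found while looping over each project.
# Plain table names stand in for dataset URNs.
edges_by_project: Dict[str, List[Tuple[str, str]]] = {
    "project1": [("project1.Dataset1.table1", "project1.Dataset2.table2")],
    "project2": [("project1.Dataset1.table1", "project2.Dataset2.table2")],
}


def collect_upstreams(edges: Dict[str, List[Tuple[str, str]]]) -> Dict[str, List[str]]:
    """Group upstreams by downstream entity across all projects, without duplicates."""
    upstreams: Dict[str, List[str]] = {}
    for project_edges in edges.values():
        for downstream, upstream in project_edges:
            entity_upstreams = upstreams.setdefault(downstream, [])
            if upstream not in entity_upstreams:
                entity_upstreams.append(upstream)
    return upstreams


def emit_all(upstreams: Dict[str, List[str]], emit: Callable[[str, List[str]], None]) -> None:
    # One call per entity, carrying its complete upstream list, after every project is scanned.
    for downstream, entity_upstreams in upstreams.items():
        emit(downstream, entity_upstreams)


emitted: List[Tuple[str, List[str]]] = []
emit_all(collect_upstreams(edges_by_project), lambda d, u: emitted.append((d, u)))
print(emitted)
```

Because each table is emitted exactly once with all of its upstreams, a replacing UPSERT no longer drops lineage from earlier projects; the trade-off is holding the whole lineage map in memory until the scan finishes.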
Not sure if this helps, but this is a recent PR I'm working on that tries to do what you're doing here: https://github.com/linkedin/datahub/pull/4116/files#diff-a9dbd774706cbf4567f9ce3309128c176bf8dca9349d567293030a1febdf7efaR103