Hi all ! We are using the lineage_emitter_dataset_...
# troubleshoot
t
Hi all! We are using lineage_emitter_dataset_finegrained.py to visualize the lineage but encountered the error "The field at path '/dataset/upstream/relationships[0]/entity' was declared as a non null type, but the code involved in retrieving data has wrongly returned a null value. The graphql specification requires that the parent field be set to null, or if that is non nullable that it bubble up null to its parent and so on. The non-nullable type is 'Entity' within parent type 'LineageRelationship'"
l
@green-football-43791 ^
g
can you share your lineage emitter data @thousands-intern-95970?
t
@green-football-43791
import datahub.emitter.mce_builder as builder
import json
import os
import pandas as pd
import numpy as np
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.metadata.com.linkedin.pegasus2avro.dataset import (
    DatasetLineageType,
    FineGrainedLineage,
    FineGrainedLineageDownstreamType,
    FineGrainedLineageUpstreamType,
    Upstream,
    UpstreamLineage,
)
from datahub.metadata.schema_classes import ChangeTypeClass, DataJobInputOutputClass
data = pd.read_excel('/file.xlsx')
source_t = data['SOURCE_TABLE'].tolist()
target_t = data['TARGET_TABLE'].tolist()
source_c = data['SOURCE_COLUMN'].tolist()
target_c = data['TARGET_COLUMN'].tolist()
def datasetUrn(tbl):
    return builder.make_dataset_urn("postgres", tbl, "DEV")
def fldUrn(tbl, fld):
    return builder.make_schema_field_urn(datasetUrn(tbl), fld)
# Lineage of fields in a dataset
# c1      <-- unknownFunc(bar2.c1, bar4.c1)
# c2      <-- myfunc(bar3.c2)
# {c3,c4} <-- unknownFunc(bar2.c2, bar2.c3, bar3.c1)
# c5      <-- unknownFunc(bar3)
# {c6,c7} <-- unknownFunc(bar4)
# note that the semantic of the "transformOperation" value is contextual.
# In above example, it is regarded as some kind of UDF; but it could also be an expression etc.
# bar1, bar2, ... are datasets --> they refer to the source and target tables
# c1, c2, ... are fields --> they refer to the source and target columns of those tables
for i in range(10):
    fineGrainedLineages=[
        FineGrainedLineage(
            upstreamType=FineGrainedLineageUpstreamType.FIELD_SET,
            upstreams=[fldUrn(source_t[i], source_c[i]), fldUrn(source_t[i], source_c[i])],
            downstreamType=FineGrainedLineageDownstreamType.FIELD,
            downstreams=[fldUrn(target_t[i], target_c[i])],
            confidenceScore = 1-(i*0.1), transformOperation="myfunc")
    ]
#print(fineGrainedLineages)
# this is just to check if any conflicts with existing Upstream, particularly the DownstreamOf relationship
upstream = Upstream(dataset=datasetUrn("JPR_D_SCHADEN"), type=DatasetLineageType.TRANSFORMED)
FineGrainedLineage0 = FineGrainedLineage(
    upstreamType=FineGrainedLineageUpstreamType.FIELD_SET,
    upstreams=[fldUrn(None, None), fldUrn(None,None)],
    downstreamType=FineGrainedLineageDownstreamType.FIELD,
    downstreams=[fldUrn("SR_SCH_COP", "BEARB_DAT")],
    confidenceScore = 1-(i*0.1), transformOperation="myfunc")
fineGrainedLineages.insert(0,FineGrainedLineage0)
fieldLineages = UpstreamLineage(upstreams=[upstream], fineGrainedLineages=fineGrainedLineages)
lineageMcp = MetadataChangeProposalWrapper(
    entityType="dataset",
    changeType=ChangeTypeClass.UPSERT,
    entityUrn=datasetUrn("JPR_D_SCHADEN"),
    aspectName="upstreamLineage",
    aspect=fieldLineages
)
print(lineageMcp)
# Create an emitter to the GMS REST API.
emitter = DatahubRestEmitter(#url)
#print(upstream)
# Emit metadata!
emitter.emit_mcp(lineageMcp)
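A possible pre-check before building the URNs, sketched on the assumption that empty spreadsheet cells arrive as None, NaN, or blank strings. The helper names `is_missing` and `complete_rows` are hypothetical and not part of DataHub. The sketch only keeps incomplete rows out of the URN builders; it is not claimed to resolve the GraphQL error above.

```python
import math


def is_missing(value):
    """Return True for None, NaN, or blank strings (hypothetical helper)."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    return isinstance(value, str) and not value.strip()


def complete_rows(source_t, source_c, target_t, target_c):
    """Yield (source_table, source_column, target_table, target_column)
    tuples for rows in which all four values are present."""
    for row in zip(source_t, source_c, target_t, target_c):
        if not any(is_missing(v) for v in row):
            yield row


# Made-up sample values: the second row lacks a source table,
# the third row lacks a source column.
rows = list(complete_rows(
    ["bar2", None, "bar3"],
    ["c1", "c2", float("nan")],
    ["bar1", "bar1", "bar1"],
    ["c1", "c2", "c3"],
))
print(rows)
```

Iterating over `complete_rows(source_t, source_c, target_t, target_c)` instead of `range(10)` would also avoid an IndexError when the sheet has fewer than ten rows.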
b
I can confirm this error in a fresh installation of v0.8.29. To reproduce:
• just execute the example script lineage_emitter_dataset_finegrained.py provided by the DataHub Git repo (in metadata-ingestion/examples/library)
• then visit one of the new objects in the GUI, e.g. dataset/urn:li:dataset:(urn:li:dataPlatform:postgres,bar,PROD)/Schema?is_lineage_mode=false
• the GUI will show the reported error "The field at path '/dataset/upstream/relationships[0]/entity' was declared as a non null type [...]"
g
Hey Folks — I have a PR up to address this!