crooked-match-16163
05/30/2023, 12:10 PM
else:
    log.info("Documentation already exists and is identical, omitting write")
Tried to debug it but got stuck myself. It seems that if a column already has a description, the script can't change any other column. Or, if it does, it deletes the descriptions of all the other columns. Help? 🙂
better-orange-49102
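The symptom described above, where documenting one column wipes the descriptions of the others, is what you would expect if the editableSchemaMetadata aspect is stored as a whole rather than merged field by field. The script later in this thread works around this by reading the current aspect first. The following is a pure-Python simulation under that assumption, not DataHub code. `FieldInfo`, `EditableSchema`, `write_aspect`, `read_aspect` and the `shipment_info.origin` column are all hypothetical stand-ins.

```python
import copy
from dataclasses import dataclass, field
from typing import Dict, List, Optional


# Hypothetical stand-ins for EditableSchemaFieldInfoClass and
# EditableSchemaMetadataClass; only the attributes used here are modelled.
@dataclass
class FieldInfo:
    fieldPath: str
    description: Optional[str] = None


@dataclass
class EditableSchema:
    editableSchemaFieldInfo: List[FieldInfo] = field(default_factory=list)


# Simulated server-side storage: one aspect value per dataset URN.
# Assumption: a write replaces the stored aspect as a whole (no list merge).
store: Dict[str, EditableSchema] = {}


def write_aspect(urn: str, aspect: EditableSchema) -> None:
    store[urn] = copy.deepcopy(aspect)


def read_aspect(urn: str) -> Optional[EditableSchema]:
    # Return a copy, as a client reading over the network would get.
    return copy.deepcopy(store.get(urn))


urn = "urn:li:dataset:(urn:li:dataPlatform:hdfs,SampleHdfsDataset,PROD)"
origin = FieldInfo("shipment_info.origin", "ui origin")  # hypothetical column
destination = FieldInfo("shipment_info.destination", "ui destination")

# Naive write: a fresh aspect containing only the column being documented.
write_aspect(urn, EditableSchema([origin]))
write_aspect(urn, EditableSchema([destination]))
naive_paths = [f.fieldPath for f in store[urn].editableSchemaFieldInfo]

# Read-modify-write: start from what is stored and append to it.
write_aspect(urn, EditableSchema([origin]))
current = read_aspect(urn)
current.editableSchemaFieldInfo.append(destination)
write_aspect(urn, current)
merged_paths = [f.fieldPath for f in store[urn].editableSchemaFieldInfo]

print(naive_paths)   # ['shipment_info.destination']
print(merged_paths)  # ['shipment_info.origin', 'shipment_info.destination']
```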
05/31/2023, 1:33 AM
crooked-match-16163
05/31/2023, 7:04 AM
The else condition starting at line 40 (between “you are here 3” and “4”).
Also, when I call get_simple_field_path_from_v2_field_path(fieldInfo.fieldPath), I only get the columns that already have descriptions instead of all the columns. It would be easier to see every column, even those whose description is null.
better-orange-49102
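Only columns that someone has edited appear in the editableSchemaMetadata aspect, so looping over `editableSchemaFieldInfo` cannot list untouched columns. The full column list lives in the dataset's schemaMetadata aspect. In the Python SDK it can presumably be read with something like `graph.get_aspect(entity_urn=dataset_urn, aspect_type=SchemaMetadataClass)`, taking `fieldPath` from each entry of `.fields`. The sketch below shows the join under the assumption that both inputs are already fetched into plain Python structures. `all_column_descriptions` and the sample paths are hypothetical; the path helper is copied from the script in this thread.

```python
from typing import Dict, List, Optional


def get_simple_field_path_from_v2_field_path(field_path: str) -> str:
    """Copied from the script in this thread: strip v2 type tokens."""
    if not field_path.startswith("[version=2.0]"):
        return field_path
    tokens = [
        t for t in field_path.split(".") if not (t.startswith("[") or t.endswith("]"))
    ]
    return ".".join(tokens)


def all_column_descriptions(
    schema_field_paths: List[str],
    editable_descriptions: Dict[str, Optional[str]],
) -> Dict[str, Optional[str]]:
    """Map every column to its edited description, or None if never edited.

    schema_field_paths: every column's fieldPath (assumed to come from the
        dataset's schemaMetadata aspect).
    editable_descriptions: fieldPath -> description from editableSchemaMetadata.
    """
    # Normalise the edited side to simple paths so both sides compare equal.
    edited = {
        get_simple_field_path_from_v2_field_path(path): desc
        for path, desc in editable_descriptions.items()
    }
    result: Dict[str, Optional[str]] = {}
    for path in schema_field_paths:
        simple = get_simple_field_path_from_v2_field_path(path)
        result[simple] = edited.get(simple)  # None when the column was never edited
    return result


schema_paths = [
    "[version=2.0].[type=struct].shipment_info.[type=string].origin",
    "[version=2.0].[type=struct].shipment_info.[type=string].destination",
]
editable = {"shipment_info.destination": "ui destination"}
print(all_column_descriptions(schema_paths, editable))
# {'shipment_info.origin': None, 'shipment_info.destination': 'ui destination'}
```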
05/31/2023, 7:23 AM
better-orange-49102
05/31/2023, 11:16 AM
crooked-match-16163
05/31/2023, 12:02 PM
better-orange-49102
05/31/2023, 1:21 PM
better-orange-49102
05/31/2023, 1:27 PM
import logging
import time

from datahub.emitter.mce_builder import make_dataset_urn
from datahub.emitter.mcp import MetadataChangeProposalWrapper

# read-modify-write requires access to the DataHubGraph (RestEmitter is not enough)
from datahub.ingestion.graph.client import DatahubClientConfig, DataHubGraph

# Imports for metadata model classes
from datahub.metadata.schema_classes import (
    AuditStampClass,
    EditableSchemaFieldInfoClass,
    EditableSchemaMetadataClass,
)

log = logging.getLogger(__name__)
logging.basicConfig(level=logging.INFO)


def get_simple_field_path_from_v2_field_path(field_path: str) -> str:
    """A helper function to extract simple . path notation from the v2 field path"""
    if not field_path.startswith("[version=2.0]"):
        # not a v2, we assume this is a simple path
        return field_path
    # this is a v2 field path
    tokens = [
        t for t in field_path.split(".") if not (t.startswith("[") or t.endswith("]"))
    ]
    return ".".join(tokens)


# Inputs -> documentation, dataset, column
documentation_to_add = "ui destination"
dataset_urn = make_dataset_urn(platform="hdfs", name="SampleHdfsDataset", env="PROD")
column = "shipment_info.destination"
field_info_to_set = EditableSchemaFieldInfoClass(
    fieldPath=column, description=documentation_to_add
)

# Some helpful variables to fill out objects later
now = int(time.time() * 1000)  # milliseconds since epoch
current_timestamp = AuditStampClass(time=now, actor="urn:li:corpuser:ingestion")

# First we get the current editable schema metadata
gms_endpoint = "http://localhost:8080"
graph = DataHubGraph(config=DatahubClientConfig(server=gms_endpoint))
current_editable_schema_metadata = graph.get_aspect(
    entity_urn=dataset_urn,
    aspect_type=EditableSchemaMetadataClass,
)
# print(current_editable_schema_metadata)

need_write = False
field_match = False
if current_editable_schema_metadata:
    for fieldInfo in current_editable_schema_metadata.editableSchemaFieldInfo:
        if get_simple_field_path_from_v2_field_path(fieldInfo.fieldPath) == column:
            # we have some editable schema metadata for this field
            field_match = True
            if documentation_to_add != fieldInfo.description:
                fieldInfo.description = documentation_to_add
                need_write = True
    # the field is not in the editableSchemaMetadata aspect yet: append it
    # (after the loop) so every other field's entry is kept in the write
    if not field_match:
        current_editable_schema_metadata.editableSchemaFieldInfo.append(field_info_to_set)
        need_write = True
else:
    # create a brand new editable schema metadata aspect
    current_editable_schema_metadata = EditableSchemaMetadataClass(
        editableSchemaFieldInfo=[field_info_to_set],
        created=current_timestamp,
    )
    need_write = True

print(f"need_write is {need_write}")
if need_write:
    event: MetadataChangeProposalWrapper = MetadataChangeProposalWrapper(
        entityUrn=dataset_urn,
        aspect=current_editable_schema_metadata,
    )
    graph.emit(event)
    log.info(f"Documentation added to dataset {dataset_urn}")
else:
    log.info("Documentation already exists and is identical, omitting write")
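The core decision in the script above can be pulled out into a pure function and exercised without a DataHub server. This is a sketch, not the DataHub API. `FieldInfo`, `EditableSchema`, `simple_path` and `upsert_field_description` are hypothetical names, and the stand-in classes carry only the attributes the script touches (the real aspect also has `created`). Returning the aspect together with `need_write` makes the three cases easy to check: no aspect yet, column already present, and column missing.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


# Hypothetical stand-ins for EditableSchemaFieldInfoClass and
# EditableSchemaMetadataClass, reduced to the attributes the script touches.
@dataclass
class FieldInfo:
    fieldPath: str
    description: Optional[str] = None


@dataclass
class EditableSchema:
    editableSchemaFieldInfo: List[FieldInfo] = field(default_factory=list)


def simple_path(field_path: str) -> str:
    # Same logic as get_simple_field_path_from_v2_field_path in the script.
    if not field_path.startswith("[version=2.0]"):
        return field_path
    return ".".join(
        t for t in field_path.split(".") if not (t.startswith("[") or t.endswith("]"))
    )


def upsert_field_description(
    current: Optional[EditableSchema], column: str, description: str
) -> Tuple[EditableSchema, bool]:
    """Return (aspect_to_write, need_write) after setting one column's description."""
    if current is None:
        # No aspect yet: create one holding only this column.
        return EditableSchema([FieldInfo(column, description)]), True
    need_write = False
    field_match = False
    for info in current.editableSchemaFieldInfo:
        if simple_path(info.fieldPath) == column:
            field_match = True
            if info.description != description:
                info.description = description  # update the existing entry in place
                need_write = True
    if not field_match:
        # Column never edited before: append, leaving the other entries untouched.
        current.editableSchemaFieldInfo.append(FieldInfo(column, description))
        need_write = True
    return current, need_write


aspect = EditableSchema([FieldInfo("shipment_info.origin", "ui origin")])
aspect, changed = upsert_field_description(aspect, "shipment_info.destination", "ui destination")
print(changed, [f.fieldPath for f in aspect.editableSchemaFieldInfo])
# True ['shipment_info.origin', 'shipment_info.destination']
aspect, changed = upsert_field_description(aspect, "shipment_info.destination", "ui destination")
print(changed)  # False
```

Keeping the `if not field_match` check after the loop, rather than inside it, is what prevents a duplicate entry from being appended once per non-matching field.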
crooked-match-16163
05/31/2023, 1:30 PM
better-orange-49102
05/31/2023, 1:32 PM
crooked-match-16163
05/31/2023, 2:03 PM
glamorous-librarian-17665
10/24/2023, 9:46 AM