hi, trying to add a multiple descriptions to a tab...
# troubleshoot
c
hi, trying to add multiple descriptions to a table that already has some columns with descriptions. I’m using this Python code here https://datahubproject.io/docs/api/tutorials/descriptions#add-description-on-column. Basically, I’m now trying to run a loop where each time I change “documentation_to_add” and “column”. But it seems that the if/else block overwrites the description of the first column that already has a description, and after that it gets stuck on the else clause:
else:
    log.info("Documentation already exists and is identical, omitting write")
tried to debug, but got stuck myself. It seems that if a column with a description already exists, it can’t change any other columns. Or if it does, it deletes all other columns’ descriptions. Help? 🙂
b
It's a little hard to visualize how you write your code... Possible to share here?
c
@better-orange-49102 hi, I’m using the code in the documentation. But even without looping or anything elaborate, it just overwrites the text for the selected column and deletes everything else. The only place I played around with is the else condition starting at line 40 (between “you are here 3” and “4”). Also, when I query get_simple_field_path_from_v2_field_path(fieldInfo.fieldPath), it gets me only the columns that already have descriptions, instead of all the columns. It would have been easier to see all the columns, even if their description is null.
b
Hmm, I'm not in a position to study the code, being away from the computer, but editableSchemaMetadata only contains fields whose descriptions were entered via the UI. Fields that were never edited don't appear in it.
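If you want to see every column, including the ones with no description yet, one option is to take the full field list from the dataset's schemaMetadata aspect and overlay whatever editableSchemaMetadata holds. A rough sketch of that overlay in plain Python; the function name `all_columns_with_descriptions` and the sample field paths are made up for illustration, and both inputs are assumed to have been read from those two aspects already:

```python
from typing import Dict, Iterable, List, Optional, Tuple


def all_columns_with_descriptions(
    schema_field_paths: Iterable[str],
    editable_descriptions: Dict[str, str],
) -> List[Tuple[str, Optional[str]]]:
    """Pair every schema field path with its edited description, or None."""
    return [(path, editable_descriptions.get(path)) for path in schema_field_paths]


# Hypothetical sample data: three columns, only one of them edited so far.
schema_field_paths = ["shipment_info.date", "shipment_info.destination", "cost"]
editable_descriptions = {"shipment_info.destination": "ui destination"}

for path, description in all_columns_with_descriptions(
    schema_field_paths, editable_descriptions
):
    print(path, "->", description)
```

Columns that were never edited come back with None, so they stay visible in the listing.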
ok i am now looking at the code, and i just want to clarify your point about "but even without looping or anything elaborate, it just overwrites the text for the selected column and deletes everything else." it will replace the existing description for the specified field with the new description, yes? were you expecting it to append to the existing desc?
c
first of all, thanks for the help! and no, I have many columns in my table, and if I run the code for one column, it deletes the descriptions from all other columns
b
ah, i see a problem with the code now (@astonishing-answer-96712 for awareness): if dataset X has an editableSchemaMetadata aspect (say, with field A) but field B is not in that aspect, then this code will logically never attempt to add a description for B.
this is my code, but i probably wont put up a PR for it :P
import logging
import time

from datahub.emitter.mce_builder import make_dataset_urn
from datahub.emitter.mcp import MetadataChangeProposalWrapper

# read-modify-write requires access to the DataHubGraph (RestEmitter is not enough)
from datahub.ingestion.graph.client import DatahubClientConfig, DataHubGraph

# Imports for metadata model classes
from datahub.metadata.schema_classes import (
    AuditStampClass,
    EditableSchemaFieldInfoClass,
    EditableSchemaMetadataClass,
)

log = logging.getLogger(__name__)
logging.basicConfig(level=logging.INFO)


def get_simple_field_path_from_v2_field_path(field_path: str) -> str:
    """A helper function to extract simple . path notation from the v2 field path"""
    if not field_path.startswith("[version=2.0]"):
        # not a v2, we assume this is a simple path
        return field_path
    # this is a v2 field path
    tokens = [
        t for t in field_path.split(".") if not (t.startswith("[") or t.endswith("]"))
    ]

    return ".".join(tokens)


# Inputs -> documentation_to_add, dataset_urn, column
documentation_to_add = (
    "ui destination"
)
dataset_urn = "urn:li:dataset:(urn:li:dataPlatform:hdfs,SampleHdfsDataset,PROD)"
column = "shipment_info.destination"
field_info_to_set = EditableSchemaFieldInfoClass(
    fieldPath=column, description=documentation_to_add
)


# Some helpful variables to fill out objects later
now = int(time.time() * 1000)  # milliseconds since epoch
current_timestamp = AuditStampClass(time=now, actor="urn:li:corpuser:ingestion")


# First we get the current editable schema metadata
gms_endpoint = "http://localhost:8080"
graph = DataHubGraph(config=DatahubClientConfig(server=gms_endpoint))

current_editable_schema_metadata = graph.get_aspect(
    entity_urn=dataset_urn,
    aspect_type=EditableSchemaMetadataClass,
)
# print(current_editable_schema_metadata)

need_write = False
field_match = False
if current_editable_schema_metadata:
    for fieldInfo in current_editable_schema_metadata.editableSchemaFieldInfo:
        if get_simple_field_path_from_v2_field_path(fieldInfo.fieldPath) == column:
            # we have some editable schema metadata for this field
            field_match = True
            if documentation_to_add != fieldInfo.description:
                fieldInfo.description = documentation_to_add
                need_write = True
    # this part is added to address the condition if field does not exist in editableschemametadata aspect
    if not field_match:
        curr_editableSchemaFieldInfo = current_editable_schema_metadata.editableSchemaFieldInfo
        curr_editableSchemaFieldInfo.append(field_info_to_set)
        current_editable_schema_metadata.editableSchemaFieldInfo = curr_editableSchemaFieldInfo
        need_write = True
else:
    # create a brand new editable schema metadata aspect
    current_editable_schema_metadata = EditableSchemaMetadataClass(
        editableSchemaFieldInfo=[field_info_to_set],
        created=current_timestamp,
    )
    need_write = True
print(f"need_write is {need_write}")
if need_write:
    event: MetadataChangeProposalWrapper = MetadataChangeProposalWrapper(
        entityUrn=dataset_urn,
        aspect=current_editable_schema_metadata,
    )
    graph.emit(event)
    log.info(f"Documentation added to dataset {dataset_urn}")

else:
    log.info("Documentation already exists and is identical, omitting write")


need_write = False
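For the looping case, it is probably simpler to apply all the column updates to the field list in memory and write once at the end, instead of doing one read and one write per column. A minimal sketch of that merge step, not tested against a live instance: `upsert_descriptions`, `simple_path` and the `make_field_info` callback are names I made up here, and `SimpleNamespace` objects stand in for the real field info objects so the snippet runs without DataHub installed.

```python
from types import SimpleNamespace
from typing import Any, Callable, Dict, List


def simple_path(field_path: str) -> str:
    """Same idea as get_simple_field_path_from_v2_field_path in the script."""
    if not field_path.startswith("[version=2.0]"):
        return field_path
    tokens = [
        t for t in field_path.split(".") if not (t.startswith("[") or t.endswith("]"))
    ]
    return ".".join(tokens)


def upsert_descriptions(
    field_infos: List[Any],
    updates: Dict[str, str],
    make_field_info: Callable[[str, str], Any],
) -> bool:
    """Apply column -> description updates in place; True if anything changed."""
    need_write = False
    matched = set()
    for info in field_infos:
        key = simple_path(info.fieldPath)
        if key in updates:
            matched.add(key)
            if info.description != updates[key]:
                info.description = updates[key]
                need_write = True
    # Columns with no existing entry get a new one appended.
    for column, description in updates.items():
        if column not in matched:
            field_infos.append(make_field_info(column, description))
            need_write = True
    return need_write


# Stand-in objects (hypothetical data) instead of real field info classes.
existing = [
    SimpleNamespace(fieldPath="shipment_info.destination", description="ui destination")
]
changed = upsert_descriptions(
    existing,
    {"shipment_info.destination": "ui destination", "shipment_info.date": "ship date"},
    lambda column, description: SimpleNamespace(fieldPath=column, description=description),
)
print(changed, [(f.fieldPath, f.description) for f in existing])
```

In the script above, `field_infos` would be `current_editable_schema_metadata.editableSchemaFieldInfo`, the callback would build an `EditableSchemaFieldInfoClass(fieldPath=..., description=...)`, and a single `graph.emit(event)` would follow only when the function returns True.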
c
😄 checking!
b
but, i dont understand why it will cause your code to override existing description of other fields though, unless it created a brand new EditableSchemaMetadataClass aspect
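To illustrate that last case with a toy model: assuming an aspect write replaces the stored aspect as a whole, emitting a freshly built aspect that holds a single field drops every other field's description, while read-modify-write keeps them. The dict-based `store` and the `emit` helper below are stand-ins for illustration only, not DataHub APIs.

```python
from typing import Dict, List, Tuple

Aspect = List[Tuple[str, str]]  # (fieldPath, description) pairs
store: Dict[str, Aspect] = {}


def emit(urn: str, aspect: Aspect) -> None:
    """Toy write: the new aspect replaces the stored one wholesale."""
    store[urn] = list(aspect)


urn = "urn:li:dataset:(urn:li:dataPlatform:hdfs,SampleHdfsDataset,PROD)"
original = [("a", "desc a"), ("b", "desc b")]

# Case 1: build a brand new aspect containing only the field being edited.
emit(urn, original)
emit(urn, [("a", "new a")])
lost = store[urn]  # the entry for "b" is no longer present

# Case 2: read the current aspect, modify one entry, write everything back.
emit(urn, original)
updated = [(path, "new a" if path == "a" else desc) for path, desc in store[urn]]
emit(urn, updated)
kept = store[urn]  # the entry for "b" survives

print("fresh aspect:", lost)
print("read-modify-write:", kept)
```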
c
it seems to work perfectly now! thank you 🙂🙂🙂
g
Thanks for this. I also got it to work today 🙂