# ingestion
b
Hey, I'm looking for a way to change the description of a schema with the Python emitter. Is there an object that I can use for this, like DatasetPropertiesClass? Or can I use that object to also change the schema of a dataset?
b
The aspects listed at https://demo.datahubproject.io/dataset/urn:li:dataset:(urn:li:dataPlatform:datahub,Dataset,PROD)/Schema?is_lineage_mode=false are the components that make up a dataset. To change the schema (for instance, to add or remove a field), you should ingest a newer version of schemaMetadata. Personally, I don't like the way the objects are shown as part of a dataset... but schema_classes.py is no longer committed to the repo.
b
Hey @better-orange-49102, thanks for the response. I don't understand from your comment how I should approach editing the metadata with Python, or maybe my question was wrong. https://datahubproject.io/docs/metadata-ingestion/as-a-library/ gives an example of changing properties on a dataset. I want to programmatically edit the description of a field. For example, in the picture I want to change the description of urn. How would I approach this problem?
b
something like
import datahub.emitter.mce_builder as builder
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.metadata.schema_classes import ChangeTypeClass, EditableSchemaMetadataClass, EditableSchemaFieldInfoClass

from datahub.emitter.rest_emitter import DatahubRestEmitter

# Create an emitter to DataHub over REST
emitter = DatahubRestEmitter(gms_server="http://localhost:8080", extra_headers={})

# Test the connection
emitter.test_connection()

# Construct an editable schema metadata object holding the field descriptions
dataset_schema = EditableSchemaMetadataClass(
    editableSchemaFieldInfo=[
        EditableSchemaFieldInfoClass(fieldPath="colA", description="this is the desc for A"),
        EditableSchemaFieldInfoClass(fieldPath="colB", description="this is the desc for B"),
    ],
)

# Construct a MetadataChangeProposalWrapper object.
metadata_event = MetadataChangeProposalWrapper(
    entityType="dataset",
    changeType=ChangeTypeClass.UPSERT,
    entityUrn=builder.make_dataset_urn("bigquery", "my-project.my-dataset.user-table"),
    aspectName="editableSchemaMetadata",
    aspect=dataset_schema,
)

# Emit metadata! This is a blocking call
emitter.emit(metadata_event)
However, this will overwrite any UI edits to fields that you didn't specify, since the upsert replaces the whole editableSchemaMetadata aspect.
b
Massive thanks! Really appreciate the help!
b
To illustrate the overwrite: if there are 3 fields in the dataset and all 3 of them already have descriptions entered via the UI, then after you emit this (which only specifies colA and colB), the 3rd field's description will disappear.
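If you do want to keep the existing UI edits, one option is to merge your new descriptions into the current field list before emitting. The following is only a rough sketch of that merge step, not something from the DataHub library: `merge_field_descriptions` and its `make_field_info` parameter are hypothetical names, and fetching the current editableSchemaMetadata aspect is left out because how you do that depends on your client version. It only assumes each entry has `fieldPath` and `description` attributes, as EditableSchemaFieldInfoClass does in the snippet above.

```python
from types import SimpleNamespace
from typing import Any, Callable, Dict, Iterable, List


def merge_field_descriptions(
    existing_fields: Iterable[Any],
    new_descriptions: Dict[str, str],
    make_field_info: Callable[..., Any] = SimpleNamespace,
) -> List[Any]:
    """Apply new descriptions on top of the existing field entries.

    Hypothetical helper: entries are any objects with `fieldPath` and
    `description` attributes. Existing entries are updated in place;
    fields that are not mentioned in `new_descriptions` are kept as-is.
    """
    merged = list(existing_fields)
    by_path = {field.fieldPath: field for field in merged}
    for path, description in new_descriptions.items():
        if path in by_path:
            # Field already has an entry: only replace its description.
            by_path[path].description = description
        else:
            # Field has no entry yet: append a new one.
            merged.append(make_field_info(fieldPath=path, description=description))
    return merged


# Stand-ins for entries read back from DataHub (assumed shape).
current = [
    SimpleNamespace(fieldPath="colA", description="old desc for A"),
    SimpleNamespace(fieldPath="colC", description="desc for C entered via UI"),
]
merged = merge_field_descriptions(
    current,
    {"colA": "this is the desc for A", "colB": "this is the desc for B"},
)
for field in merged:
    print(field.fieldPath, "->", field.description)
```

With real DataHub objects you would presumably pass `make_field_info=EditableSchemaFieldInfoClass` and put the merged list into `EditableSchemaMetadataClass(editableSchemaFieldInfo=...)` before emitting, so colC keeps its description.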
b
Good to know! Thanks. In our current setup we want to document everything in YAML files that each correspond to a dataset. So that would be perfect: then we have one single source of truth which we can track in detail.
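For the YAML-as-source-of-truth setup, a minimal sketch of the step between the file and the emitter could look like this. The document layout (`dataset`, `fields`, `path`, `description`) and the function name `field_descriptions_from_doc` are assumptions made up for illustration, not a DataHub format. The input is the already-parsed document, for example what `yaml.safe_load` from PyYAML would return. Because the emit replaces every description, the sketch fails loudly on a field without one instead of silently blanking it.

```python
from typing import Any, Dict, Tuple


def field_descriptions_from_doc(doc: Dict[str, Any]) -> Tuple[str, Dict[str, str]]:
    """Extract (dataset name, {fieldPath: description}) from one parsed document.

    Hypothetical layout:
        dataset: <name>
        fields:
          - path: <fieldPath>
            description: <text>
    """
    name = doc["dataset"]
    descriptions: Dict[str, str] = {}
    for field in doc.get("fields", []):
        text = (field.get("description") or "").strip()
        if not text:
            # An empty entry would wipe the description on emit, so reject it.
            raise ValueError(f"{name}: field {field['path']!r} has no description")
        descriptions[field["path"]] = text
    return name, descriptions


# Parsed form of one per-dataset file (assumed layout).
doc = {
    "dataset": "my-project.my-dataset.user-table",
    "fields": [
        {"path": "colA", "description": "this is the desc for A"},
        {"path": "colB", "description": "this is the desc for B"},
    ],
}
name, descriptions = field_descriptions_from_doc(doc)
print(name, descriptions)
```

Each `(path, description)` pair can then be turned into an EditableSchemaFieldInfoClass entry as in the snippet earlier in the thread.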