# ingestion
c
Hello, I hope you all are well! What does the field auditHeader mean in an mce.json file? I have to create custom JSON files because of deeply nested data, and I'm trying to align my files with the ones I get when I sink, for example, from Hive to a JSON file.
Is there documentation somewhere that explains all the fields?
g
You can check the generated Python classes, which have docstrings for each class and type: https://github.com/linkedin/datahub/blob/master/metadata-ingestion/src/datahub/metadata/schema_classes.py. If your use case supports it, I’d recommend using the Python emitter interfaces to generate MCE events instead of writing them by hand.
c
@gray-shoe-75895 thanks for the hint, much appreciated. What do you mean by python emitter interfaces exactly? My custom solution seems to work, but happy to follow better practice.
I'm using them to ingest, but for now I have to create the JSON file myself because of a deeply nested table with struct and array type columns.
g
Yep so the JSON file is one way to write MCEs, but if you’re in a pure python environment, it’s often easier to construct the MCE using the generated classes. Here’s an example of creating + emitting an MCE purely in python
from datahub.emitter.rest_emitter import DatahubRestEmitter
import datahub.metadata.schema_classes as models  # the generated classes linked above

# Construct a user object.
user = models.MetadataChangeEventClass(
    proposedSnapshot=models.CorpUserSnapshotClass(
        urn="urn:li:corpuser:harshal",
        aspects=[
            models.CorpUserInfoClass(
                active=True,
                email="harshal@acryl.io",
                displayName="Harshal Sheth",
                title="Engineer @ Acryl.io",
                firstName="Harshal",
                lastName="Sheth",
                fullName="Harshal Sheth",
            ),
            models.CorpUserEditableInfoClass(
                teams=[],
                skills=["metadata ingestion", "python"],
                pictureLink="https://github.com/hsheth2.png",
                aboutMe="https://harshal.sheth.io",
            ),
        ],
    )
)

# Create an emitter to the GMS REST API.
emitter = DatahubRestEmitter("http://localhost:8080")

# Emit metadata!
emitter.emit_mce(user)
Some more docs about this here: https://datahubproject.io/docs/metadata-ingestion/#using-as-a-library
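For the hand-written JSON route asked about at the top of the thread, here is a rough sketch of what a minimal MCE file could look like, built with only the Python standard library. As far as the generated classes indicate, auditHeader is an optional field on the MCE, so it can be left as null. The fully qualified type names used as keys and the `build_mce` helper are assumptions for illustration; compare them against a file produced by your own sink before relying on them.

```python
import json

# Union members are keyed by their fully qualified type name.
# ASSUMPTION: these names follow DataHub's sample MCE files; verify them
# against a file written by your own file sink.
SNAPSHOT_TYPE = "com.linkedin.pegasus2avro.metadata.snapshot.CorpUserSnapshot"
ASPECT_TYPE = "com.linkedin.pegasus2avro.identity.CorpUserInfo"


def build_mce(urn, aspect):
    """Return one MCE as a plain dict (hypothetical helper, not a DataHub API)."""
    return {
        "auditHeader": None,  # optional field, serialized as JSON null
        "proposedSnapshot": {
            SNAPSHOT_TYPE: {
                "urn": urn,
                "aspects": [{ASPECT_TYPE: aspect}],
            }
        },
        "proposedDelta": None,
    }


mce = build_mce(
    "urn:li:corpuser:harshal",
    {"active": True, "email": "harshal@acryl.io", "displayName": "Harshal Sheth"},
)

# An MCE file is a JSON list of such events.
mce_file_contents = json.dumps([mce], indent=2)
print(mce_file_contents)
```

Building the events with the generated classes, as in the example above, avoids having to track these type names by hand, which is the main reason to prefer it when the environment is pure Python.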