# ingestion
c
Hi all, just wondering whether I can use an MCE JSON file to ingest dashboard/chart metadata using the file-to-REST method. mce.json ->
{
  "auditHeader": null,
  "proposedSnapshot": {
    "com.linkedin.pegasus2avro.metadata.snapshot.DashboardSnapshot": {
      "urn": "urn:li:dashboard:sample",
      "aspects": [
        {
          "com.linkedin.pegasus2avro.dataset.DashboardInfo": {
            "title": "Sample Dashboard",
            "description": "This is a sample dashboard to test mce events"
          }
        },
        {
          "com.linkedin.pegasus2avro.common.Ownership": {
            "owners": [
              {
                "owner": "urn:li:corpuser:bi-analyst",
                "type": "DEVELOPER"
              }
            ]
          }
        }
      ]
    }
  },
  "proposedDelta": null
}
metadata.yml
source:
  type: "file"
  config:
    filename: ./mce.json

sink:
  type: "datahub-rest"
  config:
    server: "http://localhost:8080"
I get an error saying `__root__, MetadataFileSourceConfig expected dict not str (type=type_error)`. So I am wondering whether this method doesn't support dashboard/chart metadata, or whether I am missing a property or a value in my MCE JSON 🙂
g
It seems to be failing to validate the yml recipe config
In the logs, it should reprint the config that it parsed, prior to validation - can you paste that here?
c
Hey @gray-shoe-75895, this is the output that I get
[2021-04-21 11:06:59,968] INFO     {datahub.entrypoints:68} - Using config: {'source': {'type': 'file', 'config': 'filename:/home/ec2-user/metadata/dashboards_mce.json'}, 'sink': {'type': 'datahub-rest', 'config': {'server': 'http://localhost:8080'}}}
1 validation error for MetadataFileSourceConfig
__root__
  MetadataFileSourceConfig expected dict not str (type=type_error)
g
Yep, it's definitely parsing the YAML incorrectly: the source `config` came through as a single string instead of a mapping. Can you try adding a space after `filename:`?
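The parsed config printed in the log already shows the problem: without a space after the colon, YAML reads `filename:/home/...` as one plain string, so `config` is a `str` rather than a mapping. A minimal sketch of a pre-flight check on an already-parsed recipe, assuming the recipe has been loaded into a Python dict; `find_non_mapping_configs` is a hypothetical helper, not part of the DataHub CLI:

```python
from typing import Any, Dict, List


def find_non_mapping_configs(recipe: Dict[str, Any]) -> List[str]:
    """Return the recipe sections whose `config` is present but is not a mapping."""
    problems = []
    for section in ("source", "sink"):
        config = recipe.get(section, {}).get("config")
        if config is not None and not isinstance(config, dict):
            problems.append(section)
    return problems


# Parsed recipe as printed in the log (no space after `filename:`).
broken = {
    "source": {"type": "file", "config": "filename:/home/ec2-user/metadata/dashboards_mce.json"},
    "sink": {"type": "datahub-rest", "config": {"server": "http://localhost:8080"}},
}

# The same recipe once `filename: <path>` has a space after the colon.
fixed = {
    "source": {"type": "file", "config": {"filename": "/home/ec2-user/metadata/dashboards_mce.json"}},
    "sink": broken["sink"],
}

print(find_non_mapping_configs(broken))
print(find_non_mapping_configs(fixed))
```

The check flags only the `source` section for the broken recipe, which matches the `expected dict not str` validation error.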
c
Traceback (most recent call last):
  File "/usr/local/bin/datahub", line 8, in <module>
    sys.exit(datahub())
  File "/usr/local/lib/python3.6/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.6/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.6/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/datahub/entrypoints.py", line 74, in ingest
    pipeline.run()
  File "/usr/local/lib/python3.6/site-packages/datahub/ingestion/run/pipeline.py", line 108, in run
    for wu in self.source.get_workunits():
  File "/usr/local/lib/python3.6/site-packages/datahub/ingestion/source/mce_file.py", line 37, in get_workunits
    for i, mce in enumerate(iterate_mce_file(self.config.filename)):
  File "/usr/local/lib/python3.6/site-packages/datahub/ingestion/source/mce_file.py", line 18, in iterate_mce_file
    mce: MetadataChangeEvent = MetadataChangeEvent.from_obj(obj)
  File "/usr/local/lib/python3.6/site-packages/avrogen/dict_wrapper.py", line 38, in from_obj
    return conv.from_json_object(obj, cls.RECORD_SCHEMA)
  File "/usr/local/lib/python3.6/site-packages/avrogen/avrojson.py", line 98, in from_json_object
    return self._generic_from_json(json_obj, writers_schema, readers_schema)
  File "/usr/local/lib/python3.6/site-packages/avrogen/avrojson.py", line 245, in _generic_from_json
    result = self._record_from_json(json_obj, writers_schema, readers_schema)
  File "/usr/local/lib/python3.6/site-packages/avrogen/avrojson.py", line 328, in _record_from_json
    field_value = self._generic_from_json(json_obj[field.name], writers_field.type, field.type)
  File "/usr/local/lib/python3.6/site-packages/avrogen/avrojson.py", line 243, in _generic_from_json
    result = self._union_from_json(json_obj, writers_schema, readers_schema)
  File "/usr/local/lib/python3.6/site-packages/avrogen/avrojson.py", line 294, in _union_from_json
    return self._generic_from_json(value, s, readers_schema)
  File "/usr/local/lib/python3.6/site-packages/avrogen/avrojson.py", line 226, in _generic_from_json
    return self._generic_from_json(json_obj, writers_schema, s)
  File "/usr/local/lib/python3.6/site-packages/avrogen/avrojson.py", line 245, in _generic_from_json
    result = self._record_from_json(json_obj, writers_schema, readers_schema)
  File "/usr/local/lib/python3.6/site-packages/avrogen/avrojson.py", line 328, in _record_from_json
    field_value = self._generic_from_json(json_obj[field.name], writers_field.type, field.type)
  File "/usr/local/lib/python3.6/site-packages/avrogen/avrojson.py", line 239, in _generic_from_json
    result = self._array_from_json(json_obj, writers_schema, readers_schema)
  File "/usr/local/lib/python3.6/site-packages/avrogen/avrojson.py", line 268, in _array_from_json
    for x in json_obj]
  File "/usr/local/lib/python3.6/site-packages/avrogen/avrojson.py", line 268, in <listcomp>
    for x in json_obj]
  File "/usr/local/lib/python3.6/site-packages/avrogen/avrojson.py", line 243, in _generic_from_json
    result = self._union_from_json(json_obj, writers_schema, readers_schema)
  File "/usr/local/lib/python3.6/site-packages/avrogen/avrojson.py", line 299, in _union_from_json
    raise schema.AvroException('Datum union type not in schema: %s', value_type)
avro.schema.AvroException: ('Datum union type not in schema: %s', 'com.linkedin.pegasus2avro.dataset.ChartInfo')
Thanks @gray-shoe-75895. The issue was the missing space after `filename:` 🤦 Anyway, I am trying to ingest an MCE file with a few sample values for a dashboard and a chart, but I run into an AvroException, which I assume is caused by missing required values or properties in my MCE file for dashboards/charts. Is there any code or a document I can refer to for the MCE format, or for the properties that should be included for entities like charts and dashboards? My MCE file is as below:
[
    {
        "auditHeader": null,
        "proposedSnapshot": {
            "com.linkedin.pegasus2avro.metadata.snapshot.ChartSnapshot": {
                "urn": "urn:li:chart:sample_chart",
                "aspects": [
                    {
                        "com.linkedin.pegasus2avro.dataset.ChartInfo": {
                            "title": "Chart One - Sample Dashboard",
                            "description": "This is a test chart",
                            "chartUrl": "http://google.com"
                        }
                    },
                    {
                        "com.linkedin.pegasus2avro.common.Ownership": {
                            "owners": [
                                {
                                    "owner":"urn:li:corpuser:bi-analyst",
                                    "type": "DEVELOPER"
                                }
                            ]
                        }
                    }
                ]
            }
        },
        "proposedDelta": null
    },
    {
        "auditHeader": null,
        "proposedSnapshot": {
            "com.linkedin.pegasus2avro.metadata.snapshot.DashboardSnapshot": {
                "urn": "urn:li:dashboard:sample",
                "aspects": [
                    {
                        "com.linkedin.pegasus2avro.dataset.DashboardInfo": {
                            "title": "Sample Dashboard",
                            "description": "This is a sample dashboard to test mce events",
                            "dashboardUrl": "http://google.com",
                            "charts": [
                                "urn:li:chart:sample_chart"
                            ]
                        }
                    },
                    {
                        "com.linkedin.pegasus2avro.common.Ownership": {
                            "owners": [
                                {
                                    "owner":"urn:li:corpuser:bi-analyst",
                                    "type": "DEVELOPER"
                                }
                            ]
                        }
                    }
                ]
            }
        },
        "proposedDelta": null
    }
]
Error - see the traceback at the start of this message.
g
@calm-addition-66352 did you write this JSON file by hand? It should be `com.linkedin.pegasus2avro.chart.ChartInfo` and `com.linkedin.pegasus2avro.dashboard.DashboardInfo` instead of `com.linkedin.pegasus2avro.dataset.ChartInfo` and `com.linkedin.pegasus2avro.dataset.DashboardInfo` - the namespaces are chart/dashboard instead of dataset
the error messages aren't particularly good
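A minimal sketch of applying that rename to a handwritten file, assuming the MCE JSON has been loaded into Python objects. `ASPECT_RENAMES` holds only the two corrections named in this thread, and `fix_aspect_names` is a hypothetical helper, not part of DataHub:

```python
import json
from typing import Any

# Wrong (dataset) namespace -> correct (chart/dashboard) namespace.
ASPECT_RENAMES = {
    "com.linkedin.pegasus2avro.dataset.ChartInfo": "com.linkedin.pegasus2avro.chart.ChartInfo",
    "com.linkedin.pegasus2avro.dataset.DashboardInfo": "com.linkedin.pegasus2avro.dashboard.DashboardInfo",
}


def fix_aspect_names(node: Any) -> Any:
    """Recursively copy a JSON structure, renaming any dict key found in ASPECT_RENAMES."""
    if isinstance(node, dict):
        return {ASPECT_RENAMES.get(key, key): fix_aspect_names(value) for key, value in node.items()}
    if isinstance(node, list):
        return [fix_aspect_names(item) for item in node]
    return node


# Trimmed-down version of the chart snapshot from the thread.
sample = json.loads("""
{
  "proposedSnapshot": {
    "com.linkedin.pegasus2avro.metadata.snapshot.ChartSnapshot": {
      "urn": "urn:li:chart:sample_chart",
      "aspects": [
        {"com.linkedin.pegasus2avro.dataset.ChartInfo": {"title": "Chart One - Sample Dashboard"}}
      ]
    }
  }
}
""")

fixed = fix_aspect_names(sample)
print(json.dumps(fixed, indent=2))
```

The function returns a new structure and leaves the input untouched, so the result can be compared against the original before it is written back to disk.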
c
Yes, I wrote it by hand. I am trying to integrate with QuickSight, so before that I am trying to generate a few sample MCE files 🙂
Thanks for the quick response @gray-shoe-75895. Let me try the changes and see how it goes
g
got it - totally makes sense! for future reference, you can also validate handwritten MCE files using `datahub check mce-file <filename>` prior to running ingestion
c
Thanks @gray-shoe-75895. Let me try
I used the attached MCE file to ingest a simple chart and a dashboard. It gets ingested successfully, but I guess I am either missing a property or using the `dataplatform` property incorrectly in the URN, because in the UI I get an error when I try to open the individual dashboard or chart page.
• By `data platform` do we mean a data source, or is it the reporting platform in this scenario, e.g. Superset, QuickSight, Looker?
• By any chance, do you have a valid dashboard or chart MCE file available that I can refer to?
g
Data platform should be the reporting platform, like superset or looker. There's a good sample chart and dashboard in the bootstrap mce file here https://github.com/linkedin/datahub/blob/master/metadata-ingestion/examples/mce_files/bootstrap_mce.json#L1297-L1420
I think some of the formatting in your MCE is just slightly off
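As a sketch of what using the reporting platform in the URN could look like: this assumes chart and dashboard URNs take the form `urn:li:chart:(<tool>,<id>)` and `urn:li:dashboard:(<tool>,<id>)`, which should be checked against the entries in the linked bootstrap file. `make_chart_urn` and `make_dashboard_urn` are hypothetical helpers, and `quicksight` is used only because it is the tool mentioned in this thread:

```python
def make_chart_urn(tool: str, chart_id: str) -> str:
    """Build a chart URN, assuming the (tool,id) tuple form."""
    return f"urn:li:chart:({tool},{chart_id})"


def make_dashboard_urn(tool: str, dashboard_id: str) -> str:
    """Build a dashboard URN, assuming the (tool,id) tuple form."""
    return f"urn:li:dashboard:({tool},{dashboard_id})"


# The ids are the sample ids from the handwritten MCE file in the thread.
chart_urn = make_chart_urn("quicksight", "sample_chart")
dashboard_urn = make_dashboard_urn("quicksight", "sample")

print(chart_urn)
print(dashboard_urn)
```

If the assumption holds, the dashboard's `charts` list would need to reference the chart by this same URN string.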
c
Thanks @gray-shoe-75895, this bootstrap file is really helpful 👍