# ingestion
w
Hi. I’m doing a simple ingestion of a couple of `kafka` topics as datasets, plus a `dataProcess` in between that consumes one and produces the other. While there are no errors during the ingestion, the UI fails as shown in the second screenshot. Is that some sort of bug, or is there something wrong in my MCE JSON file (see thread)? Thanks!
```json
[
  {
    "proposedSnapshot": {
      "com.linkedin.pegasus2avro.metadata.snapshot.DatasetSnapshot": {
        "urn": "urn:li:dataset:(urn:li:dataPlatform:kafka,FakeTopic1,DEV)",
        "aspects": [
          {
            "com.linkedin.pegasus2avro.dataset.DatasetProperties": {
              "description": "Fake topic to test lineage"
            }
          }
        ]
      }
    }
  },
  {
    "proposedSnapshot": {
      "com.linkedin.pegasus2avro.metadata.snapshot.DatasetSnapshot": {
        "urn": "urn:li:dataset:(urn:li:dataPlatform:kafka,FakeTopic2,DEV)",
        "aspects": [
          {
            "com.linkedin.pegasus2avro.dataset.DatasetProperties": {
              "description": "Fake topic to test lineage"
            }
          }
        ]
      }
    }
  },
  {
    "proposedSnapshot": {
      "com.linkedin.pegasus2avro.metadata.snapshot.DataProcessSnapshot": {
        "urn": "urn:li:dataProcess:(mountpoints,DEMO,DEV)",
        "aspects": [
          {
            "com.linkedin.pegasus2avro.dataprocess.DataProcessInfo": {
              "inputs": [
                "urn:li:dataset:(urn:li:dataPlatform:kafka,FakeTopic1,DEV)"
              ],
              "outputs": [
                "urn:li:dataset:(urn:li:dataPlatform:kafka,FakeTopic2,DEV)"
              ]
            }
          }
        ]
      }
    }
  }
]
```
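To rule out typos in hand-written URNs, the MCE file above can also be generated from plain Python dicts and checked before ingestion. This is a minimal sketch that assumes the MCE JSON layout shown in the file; `dataset_urn`, `dataset_mce`, `data_process_mce` and `undefined_lineage_urns` are hypothetical helpers, not functions from the DataHub codebase. The final check confirms that every URN the `dataProcess` references is also defined as a dataset in the same file, which points away from a simple typo in the file itself.

```python
import json

# Hypothetical helpers mirroring the MCE file above; not part of the DataHub API.
PREFIX = "com.linkedin.pegasus2avro"


def dataset_urn(platform: str, name: str, env: str) -> str:
    """Build a dataset URN in the same format as the file above."""
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


def dataset_mce(urn: str, description: str) -> dict:
    """Wrap a DatasetProperties aspect in a DatasetSnapshot MCE."""
    return {
        "proposedSnapshot": {
            f"{PREFIX}.metadata.snapshot.DatasetSnapshot": {
                "urn": urn,
                "aspects": [
                    {f"{PREFIX}.dataset.DatasetProperties": {"description": description}}
                ],
            }
        }
    }


def data_process_mce(urn: str, inputs: list, outputs: list) -> dict:
    """Wrap a DataProcessInfo aspect in a DataProcessSnapshot MCE."""
    return {
        "proposedSnapshot": {
            f"{PREFIX}.metadata.snapshot.DataProcessSnapshot": {
                "urn": urn,
                "aspects": [
                    {f"{PREFIX}.dataprocess.DataProcessInfo": {"inputs": inputs, "outputs": outputs}}
                ],
            }
        }
    }


def undefined_lineage_urns(mces: list) -> list:
    """Return dataProcess inputs/outputs that no DatasetSnapshot in the list defines."""
    defined = set()
    referenced = []
    for mce in mces:
        for snapshot_type, snapshot in mce["proposedSnapshot"].items():
            if snapshot_type.endswith("DatasetSnapshot"):
                defined.add(snapshot["urn"])
            for aspect in snapshot["aspects"]:
                info = aspect.get(f"{PREFIX}.dataprocess.DataProcessInfo")
                if info:
                    referenced += info.get("inputs", []) + info.get("outputs", [])
    return [urn for urn in referenced if urn not in defined]


topic1 = dataset_urn("kafka", "FakeTopic1", "DEV")
topic2 = dataset_urn("kafka", "FakeTopic2", "DEV")
mces = [
    dataset_mce(topic1, "Fake topic to test lineage"),
    dataset_mce(topic2, "Fake topic to test lineage"),
    data_process_mce("urn:li:dataProcess:(mountpoints,DEMO,DEV)", [topic1], [topic2]),
]

print(json.dumps(mces, indent=2))
print(undefined_lineage_urns(mces))  # an empty list means no dangling lineage URNs
```

Writing the file with `json.dumps` also guarantees it is syntactically valid JSON, which hand-edited files sometimes are not.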
This is the response from GraphQL, where `/dataset/downstreamLineage/entities[0]/entity` is null for some reason:
```json
{
  "errors": [
    {
      "message": "Exception while fetching data (/dataset/downstreamLineage/entities[0]/entity) : null",
      "locations": [
        {
          "line": 351,
          "column": 5
        }
      ],
      "path": [
        "dataset",
        "downstreamLineage",
        "entities",
        0,
        "entity"
      ],
      "extensions": {
        "classification": "DataFetchingException"
      }
    }
  ],
  "data": {
    "dataset": {
      "urn": "urn:li:dataset:(urn:li:dataPlatform:kafka,FakeTopic1,DEV)",
      "name": "FakeTopic1",
      "type": "DATASET",
      "origin": "DEV",
      "description": "Fake topic to test lineage",
      "uri": null,
      "platform": {
        "name": "kafka",
        "info": null,
        "__typename": "DataPlatform"
      },
      "platformNativeType": null,
      "tags": [],
      "properties": [],
      "editableProperties": null,
      "ownership": null,
      "institutionalMemory": null,
      "schemaMetadata": null,
      "previousSchemaMetadata": null,
      "editableSchemaMetadata": null,
      "deprecation": null,
      "globalTags": null,
      "glossaryTerms": null,
      "__typename": "Dataset",
      "downstreamLineage": {
        "entities": [
          {
            "entity": null,
            "__typename": "EntityRelationshipLegacy"
          }
        ],
        "__typename": "DownstreamEntityRelationships"
      },
      "upstreamLineage": {
        "entities": [],
        "__typename": "UpstreamEntityRelationships"
      },
      "usageStats": {
        "buckets": [],
        "aggregations": {
          "uniqueUserCount": null,
          "totalSqlQueries": null,
          "users": null,
          "fields": null,
          "__typename": "UsageQueryResultAggregations"
        },
        "__typename": "UsageQueryResult"
      },
      "datasetProfiles": []
    }
  }
}
```
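When debugging a response like this, it can help to list programmatically which lineage relationships failed to resolve. A minimal sketch, assuming a response with the shape shown above; `null_lineage_entities` and `error_paths` are hypothetical helpers, and the embedded JSON is a trimmed copy of that response. It shows that only the single downstream relationship of `FakeTopic1` (presumably the one pointing at the `dataProcess`) comes back null, matching the `path` reported in `errors`.

```python
import json


def null_lineage_entities(response: dict) -> list:
    """Return (direction, index) pairs whose "entity" came back null."""
    dataset = (response.get("data") or {}).get("dataset") or {}
    nulls = []
    for direction in ("downstreamLineage", "upstreamLineage"):
        lineage = dataset.get(direction) or {}
        for index, relationship in enumerate(lineage.get("entities", [])):
            if relationship.get("entity") is None:
                nulls.append((direction, index))
    return nulls


def error_paths(response: dict) -> list:
    """Join each GraphQL error "path" into a slash-separated string."""
    return [
        "/" + "/".join(str(part) for part in error.get("path", []))
        for error in response.get("errors", [])
    ]


# Trimmed copy of the GraphQL response above.
response = json.loads("""
{
  "errors": [{"message": "Exception while fetching data (/dataset/downstreamLineage/entities[0]/entity) : null",
              "path": ["dataset", "downstreamLineage", "entities", 0, "entity"]}],
  "data": {"dataset": {
    "urn": "urn:li:dataset:(urn:li:dataPlatform:kafka,FakeTopic1,DEV)",
    "downstreamLineage": {"entities": [{"entity": null}]},
    "upstreamLineage": {"entities": []}
  }}
}
""")

print(error_paths(response))
print(null_lineage_entities(response))
```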
g
Hey Sergio, we do not recommend using the data process entity at the moment!
I would try a DataTask instead; that is the recommended way to model data transformations.
That is likely what is causing the issues for you.
w
Haven’t found any reference to `DataTask` in the project: https://github.com/linkedin/datahub/search?q=datatask&type=code