acoustic-printer-83045
02/25/2021, 9:45 PM
{'upstreams': [
    {'auditStamp': {'time': 0, 'actor': '', 'impersonator': None},
     'dataset': 'urn:li:dataset:(urn:li:dataPlatform:redshift,events.analytics_dev_garylucas.carr_quarterly,PROD)',
     'type': 'TRANSFORMED'}
]}
I don’t see an error from that, but when I go to load lineage I get the following error in the back end (plus a UI error on the front end):
datahub-frontend | 21:36:25 [application-akka.actor.default-dispatcher-313] ERROR application - Fetch Dataset upstreams error
datahub-frontend | com.linkedin.data.template.TemplateOutputCastException: Invalid URN syntax: Urn doesn't start with 'urn:'. Urn: at index 0:
datahub-frontend | at com.linkedin.common.urn.UrnCoercer.coerceOutput(UrnCoercer.java:25)
datahub-frontend | at com.linkedin.common.urn.UrnCoercer.coerceOutput(UrnCoercer.java:11)
datahub-frontend | at com.linkedin.data.template.DataTemplateUtil.coerceOutput(DataTemplateUtil.java:954)
datahub-frontend | at com.linkedin.data.template.RecordTemplate.obtainCustomType(RecordTemplate.java:365)
datahub-frontend | at com.linkedin.common.AuditStamp.getActor(AuditStamp.java:159)
datahub-frontend | at com.linkedin.datahub.util.DatasetUtil.toLineageView(DatasetUtil.java:97)
datahub-frontend | at com.linkedin.datahub.dao.table.LineageDao.lambda$getUpstreamLineage$1(LineageDao.java:39)
datahub-frontend | at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
datahub-frontend | at java.util.Iterator.forEachRemaining(Iterator.java:116)
datahub-frontend | at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
datahub-frontend | at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
datahub-frontend | at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
datahub-frontend | at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
datahub-frontend | at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
datahub-frontend | at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
datahub-frontend | at com.linkedin.datahub.dao.table.LineageDao.getUpstreamLineage(LineageDao.java:40)
datahub-frontend | at controllers.api.v2.Dataset.getDatasetUpstreams(Dataset.java:250)
datahub-frontend | at router.Routes$$anonfun$routes$1$$anonfun$applyOrElse$28$$anonfun$apply$28.apply(Routes.scala:910)
datahub-frontend | at router.Routes$$anonfun$routes$1$$anonfun$applyOrElse$28$$anonfun$apply$28.apply(Routes.scala:910)
datahub-frontend | at play.core.routing.HandlerInvokerFactory$$anon$3.resultCall(HandlerInvoker.scala:134)
datahub-frontend | at play.core.routing.HandlerInvokerFactory$$anon$3.resultCall(HandlerInvoker.scala:133)
datahub-frontend | at play.core.routing.HandlerInvokerFactory$JavaActionInvokerFactory$$anon$8$$anon$2$$anon$1.invocation(HandlerInvoker.scala:108)
I’m pretty sure that I’ve misconfigured my upstream lineage object; however, it passes validation on the way in. Any suggestions on how to troubleshoot this further?
Thanks in advance, and I appreciate any insight.
mammoth-bear-12532
big-carpet-38439
02/25/2021, 9:59 PM
actor: 'urn:li:principal:system'
big-carpet-38439
02/25/2021, 10:00 PM
actor: 'urn:li:corpUser:glucas'
big-carpet-38439
02/25/2021, 10:02 PM
acoustic-printer-83045
02/25/2021, 10:52 PM
datahub/metadata-ingestion folder.
acoustic-printer-83045
02/25/2021, 10:52 PM
acoustic-printer-83045
02/25/2021, 11:09 PM
> the “auditStamp” field must contain 2 fields: time (long) and actor (urn). To fix the issue, try creating a “system” actor URN:
Ok, that makes sense. My reading of the Python-generated MCE classes made it look optional, but I see that the tooling generated an empty audit for me, which should have been a clue.
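[Editor's note: a minimal sketch of what a corrected payload could look like when built with the generated classes in datahub/metadata-ingestion, assuming the schema_classes module exposes AuditStampClass, UpstreamClass, UpstreamLineageClass, and DatasetLineageTypeClass as keyword-constructable records; the actor URN is the one suggested above and should be adjusted to your setup.]

from datahub.metadata.schema_classes import (
    AuditStampClass,
    DatasetLineageTypeClass,
    UpstreamClass,
    UpstreamLineageClass,
)

# Populate the auditStamp explicitly instead of leaving actor as an empty string.
audit = AuditStampClass(time=0, actor="urn:li:corpUser:glucas")

upstream = UpstreamClass(
    dataset=(
        "urn:li:dataset:(urn:li:dataPlatform:redshift,"
        "events.analytics_dev_garylucas.carr_quarterly,PROD)"
    ),
    type=DatasetLineageTypeClass.TRANSFORMED,
    auditStamp=audit,
)

lineage = UpstreamLineageClass(upstreams=[upstream])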
acoustic-printer-83045
02/25/2021, 11:10 PM
big-carpet-38439
02/25/2021, 11:12 PM
acoustic-printer-83045
02/25/2021, 11:13 PM
big-carpet-38439
02/25/2021, 11:41 PM
big-carpet-38439
02/25/2021, 11:41 PM
acoustic-printer-83045
02/25/2021, 11:54 PM
acoustic-printer-83045
02/25/2021, 11:54 PM
loud-island-88694
acoustic-printer-83045
02/26/2021, 12:08 AM
select
    *
from
    {{ ref('some_other_model') }}
When dbt executes that DAG, it drops a manifest file: https://docs.getdbt.com/reference/artifacts/manifest-json
And in that manifest you’ll see all upstream dependencies for each model.
I’m consuming that file; for each node in the executed SQL templates I’m pulling out enough details to construct a URN, and also constructing URNs for the dependencies.
I put that in a dictionary keyed by URN: dictionary[urn] = [deps].
And then in the sql_common component I’m using that map to enrich the load step with dependencies: https://docs.getdbt.com/reference/artifacts/manifest-json
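[Editor's note: a rough sketch of the manifest-parsing approach described above, assuming node entries shaped like the example in the next message; make_dataset_urn and the platform/env constants are illustrative placeholders, not actual DataHub or dbt APIs.]

import json

# Illustrative only: build a map of dataset URN -> upstream dataset URNs
# from dbt's target/manifest.json.
PLATFORM = "redshift"
ENV = "PROD"

def make_dataset_urn(database, schema, name):
    return (
        f"urn:li:dataset:(urn:li:dataPlatform:{PLATFORM},"
        f"{database}.{schema}.{name},{ENV})"
    )

with open("target/manifest.json") as f:
    manifest = json.load(f)

nodes = manifest["nodes"]
lineage_map = {}  # dataset URN -> list of upstream dataset URNs

for node_id, node in nodes.items():
    if node.get("resource_type") != "model":
        continue
    urn = make_dataset_urn(node["database"], node["schema"], node["alias"] or node["name"])
    deps = []
    for dep_id in node["depends_on"]["nodes"]:
        dep = nodes.get(dep_id)
        if dep is not None:
            deps.append(make_dataset_urn(dep["database"], dep["schema"], dep["alias"] or dep["name"]))
    lineage_map[urn] = deps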
acoustic-printer-83045
02/26/2021, 12:19 AM
"model.invision.account__snapshot": {
"raw_sql": "A bunch of sql",
"database": "events",
"schema": "analytics_dev_garylucas",
"fqn": [
"invision",
"cft",
"main",
"transform",
"account__snapshot"
],
"unique_id": "model.invision.account__snapshot",
"package_name": "invision",
"root_path": "/dbt",
"path": "cft/main/transform/account__snapshot.sql",
"original_file_path": "models/cft/main/transform/account__snapshot.sql",
"name": "account__snapshot",
"resource_type": "model",
"alias": "account__snapshot",
"checksum": {
"name": "sha256",
"checksum": "1a622db018ec430b0af35132ebf85420ea4b8c0d154e50f35cf7f5679aeb1786"
},
"config": {
"enabled": true,
"materialized": "table",
"persist_docs": {},
"post-hook": [],
"pre-hook": [],
"vars": {},
"quoting": {},
"column_types": {},
"alias": null,
"schema": null,
"database": null,
"tags": [
"daily"
],
"full_refresh": null,
"dist": "auto"
},
"tags": [
"daily"
],
"refs": [
[
"account_summary__by_day"
],
[
"account_summary__by_day"
],
[
"account__snapshot_temp"
]
],
"sources": [],
"depends_on": {
"macros": [],
"nodes": [
"model.invision.account_summary__by_day",
"model.invision.account_summary__by_day",
"model.invision.account__snapshot_temp"
]
},
"description": "",
"columns": {
"as_of": {
"name": "as_of",
"description": "",
"meta": {},
"data_type": null,
"quote": null,
"tags": []
},
"subdomain": {
"name": "subdomain",
"description": "",
"meta": {},
"data_type": null,
"quote": null,
"tags": []
}
},
"meta": {},
"docs": {
"show": true
},
"patch_path": "models/cft/main/schema.yml",
"build_path": null,
"deferred": false,
"unrendered_config": {
"materialized": "table",
"dist": "auto",
"tags": "daily"
}
}
^ is an example of the kind of structure.
In an ideal world we’d pull out the SQL comments and determine some other fields based on the tags / comments, etc.
loud-island-88694
acoustic-printer-83045
02/26/2021, 12:24 AM
loud-island-88694
acoustic-printer-83045
02/26/2021, 12:24 AM
acoustic-printer-83045
02/26/2021, 12:43 AM
acoustic-printer-83045
02/26/2021, 12:49 AM
datahub-frontend | 00:44:31 [application-akka.actor.default-dispatcher-541] ERROR application - Fetch Dataset downstreams error
datahub-frontend | com.linkedin.restli.client.RestLiResponseException: com.linkedin.restli.client.RestLiResponseException: Response status 500, serviceErrorMessage: java.lang.RuntimeException: There is no relation or more than 1 relation between the datasets!
acoustic-printer-83045
02/26/2021, 12:57 AM
acoustic-printer-83045
02/26/2021, 12:57 AM
big-carpet-38439
02/26/2021, 4:55 PM
big-carpet-38439
02/26/2021, 4:55 PM
acoustic-printer-83045
02/26/2021, 4:58 PM
> There is no relation or more than 1 relation between the datasets!
I did a very janky load system; at this point I’d probably just load the DBT metadata graph, which would list all instances.
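[Editor's note: one hedged guess at that "more than 1 relation" error, based on the manifest example above: depends_on can list the same upstream model more than once (account_summary__by_day appears twice), so emitting the list verbatim would create duplicate relations. Deduplicating the map from the earlier sketch before emitting may be worth trying; lineage_map refers to that hypothetical sketch, not a DataHub API.]

# Drop duplicate upstream URNs while preserving order; a guess at the cause, not a confirmed fix.
for urn, deps in lineage_map.items():
    lineage_map[urn] = list(dict.fromkeys(deps))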
acoustic-printer-83045
02/26/2021, 4:58 PM