many-guitar-67205
03/11/2022, 12:16 PM
an unknown error has occurred (error 500)
• Only one data platform is shown
• selecting a dataset that's part of my experimentation shows a red band on top, with the message "An unknown error occurred. An unknown error occurred." (sic)
• the Schema, Documentation, and Properties tabs are empty
how do I figure out what caused the 500s?
many-guitar-67205
03/11/2022, 1:41 PM
docker logs
many-guitar-67205
03/11/2022, 1:42 PM
13:41:59.773 [ForkJoinPool.commonPool-worker-7] ERROR c.l.datahub.graphql.GmsGraphQLEngine:1222 - Failed to load Entities of type: DataJob, keys: [urn:li:dataJob:(urn:li:dataFlow:(flink,prod-lz-dsh.b2cbilling.flinkcluster,prod),cdr-ingest), urn:li:dataJob:(urn:li:dataFlow:(flink,prod-lz-dsh.b2cbilling.flinkcluster,prod),cdr-processor)] Failed to batch load Data Jobs
13:41:59.774 [ForkJoinPool.commonPool-worker-7] ERROR c.l.d.g.e.DataHubDataFetcherExceptionHandler:21 - Failed to execute DataFetcher
java.util.concurrent.CompletionException: java.lang.RuntimeException: Failed to retrieve entities of type DataJob
at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:273)
at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:280)
at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1606)
at java.util.concurrent.CompletableFuture$AsyncSupply.exec(CompletableFuture.java:1596)
at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175)
Caused by: java.lang.RuntimeException: Failed to retrieve entities of type DataJob
at com.linkedin.datahub.graphql.GmsGraphQLEngine.lambda$null$133(GmsGraphQLEngine.java:1223)
at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
... 5 common frames omitted
Caused by: java.lang.RuntimeException: Failed to batch load Data Jobs
at com.linkedin.datahub.graphql.types.datajob.DataJobType.batchLoad(DataJobType.java:118)
at com.linkedin.datahub.graphql.GmsGraphQLEngine.lambda$null$133(GmsGraphQLEngine.java:1220)
... 6 common frames omitted
Caused by: com.linkedin.data.template.TemplateOutputCastException: Invalid URN syntax: Invalid number of keys.: urn:li:dataset:(urn:li:dataPlatform:dsh,prod-lz-dsh.internal.month-aggr-cdr-data,PROD)urn:li:dataset:(urn:li:dataPlatform:dsh,prod-lz-dsh.internal.hour-aggr-cdr-voice,PROD)
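The last "Caused by" line is the actual problem: the string that fails URN parsing is two complete dataset URNs glued together. A quick sketch (the URNs are copied from the log above) shows why GMS complains about an invalid number of keys:

# The failing value from the TemplateOutputCastException: two valid dataset
# URNs concatenated back to back, so the string carries two "urn:li:dataset:"
# prefixes where a well-formed URN has exactly one.
bad_urn = (
    "urn:li:dataset:(urn:li:dataPlatform:dsh,prod-lz-dsh.internal.month-aggr-cdr-data,PROD)"
    + "urn:li:dataset:(urn:li:dataPlatform:dsh,prod-lz-dsh.internal.hour-aggr-cdr-voice,PROD)"
)
print(bad_urn.count("urn:li:dataset:"))  # 2 -> malformed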
many-guitar-67205
03/11/2022, 1:44 PM
DataJobInputOutput.outputDatasets
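To confirm where the bad values are stored, one option (a sketch, assuming DataHub's documented GMS Rest.li endpoint for fetching aspects and a local GMS on port 8080) is to pull the dataJobInputOutput aspect for one of the failing jobs and inspect outputDatasets:

import urllib.parse

import requests

# Fetch the stored dataJobInputOutput aspect for one of the jobs named in the
# error and look for glued-together URNs in outputDatasets.
job_urn = "urn:li:dataJob:(urn:li:dataFlow:(flink,prod-lz-dsh.b2cbilling.flinkcluster,prod),cdr-ingest)"
resp = requests.get(
    "http://localhost:8080/aspects/" + urllib.parse.quote(job_urn, safe=""),
    params={"aspect": "dataJobInputOutput", "version": 0},
)
print(resp.json())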
many-guitar-67205
03/11/2022, 1:45 PM
@mammoth-bear-12532
many-guitar-67205
03/14/2022, 4:07 PM
from typing import List

import datahub.emitter.mce_builder as builder
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.metadata.com.linkedin.pegasus2avro.datajob import DataJobInputOutputClass
from datahub.metadata.schema_classes import ChangeTypeClass

# Construct the DataJobInputOutput aspect.
input_datasets: List[str] = [
    'urn:li:dataset:(urn:li:dataPlatform:mysql,librarydb.member,PROD)'  # <-- missing comma (deliberate): Python concatenates these two literals into one malformed URN
    'urn:li:dataset:(urn:li:dataPlatform:mysql,librarydb.checkout,PROD)'
]

output_datasets: List[str] = [
    builder.make_dataset_urn(
        platform="kafka", name="debezium.topics.librarydb.member_checkout", env="PROD"
    )
]

input_data_jobs: List[str] = [
    builder.make_data_job_urn(
        orchestrator="airflow", flow_id="flow1", job_id="job0", cluster="PROD"
    )
]

datajob_input_output = DataJobInputOutputClass(
    inputDatasets=input_datasets,
    outputDatasets=output_datasets,
    inputDatajobs=input_data_jobs,
)

# Construct a MetadataChangeProposalWrapper object.
# NOTE: This will overwrite all of the existing lineage information associated with this job.
datajob_input_output_mcp = MetadataChangeProposalWrapper(
    entityType="dataJob",
    changeType=ChangeTypeClass.UPSERT,
    entityUrn=builder.make_data_job_urn(
        orchestrator="airflow", flow_id="flow1", job_id="job1", cluster="PROD"
    ),
    aspectName="dataJobInputOutput",
    aspect=datajob_input_output,
)

# Create an emitter to the GMS REST API.
emitter = DatahubRestEmitter("http://localhost:8080")

# Emit metadata!
emitter.emit_mcp(datajob_input_output_mcp)
Note the missing comma in input_datasets: Python implicitly concatenates the two adjacent string literals, producing exactly the kind of malformed double URN seen in the stack trace above.
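A safer pattern (a sketch reusing the names from the example above) is to build the input URNs with the same make_dataset_urn helper already used for output_datasets, which makes the missing-comma concatenation impossible:

import datahub.emitter.mce_builder as builder

# Each helper call yields exactly one well-formed URN, so the list always has
# as many entries as there are calls.
input_datasets = [
    builder.make_dataset_urn(platform="mysql", name="librarydb.member", env="PROD"),
    builder.make_dataset_urn(platform="mysql", name="librarydb.checkout", env="PROD"),
]
assert len(input_datasets) == 2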
When browsing the UI, go to http://localhost:9002/search?filter_platform=urn:li:dataPlatform:airflow
At the top, it says an error has occurred, and no Airflow entries are shown (there should be 3 from the sample dataset).
(In my case, the home page was not complete either, but I can't seem to reproduce that with this simple example.)
The code that checks whether a URN is valid does not seem to care about extra characters appended after a valid URN, so during ingestion the URNs are considered to be OK.
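A stricter ingestion-side check could reject such values up front. The following is a hypothetical illustration only (not DataHub's actual validator), keyed off the observation that a glued pair contains the urn:li:dataset: prefix twice:

def is_well_formed_dataset_urn(s: str) -> bool:
    # Hypothetical check: exactly one dataset-URN prefix, wrapped in parentheses.
    return (
        s.startswith("urn:li:dataset:(")
        and s.endswith(")")
        and s.count("urn:li:dataset:") == 1
    )

good = "urn:li:dataset:(urn:li:dataPlatform:mysql,librarydb.member,PROD)"
bad = good + "urn:li:dataset:(urn:li:dataPlatform:mysql,librarydb.checkout,PROD)"
assert is_well_formed_dataset_urn(good)
assert not is_well_formed_dataset_urn(bad)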
big-carpet-38439
03/16/2022, 11:04 PM