# troubleshoot
m
I'm experimenting with ingestion in a 'docker quickstart' setup. I've somehow managed to get the data into an unusable state, from a UI point of view:
• when refreshing the homepage, there are two toaster popups saying "an unknown error has occurred (error 500)"
• only one data platform is shown
• selecting a dataset that's part of my experimentation shows a red band on top, with the message "An unknown error occurred. An unknown error occurred." (sic)
• the Schema, Documentation, and Properties tabs are empty
How do I figure out what caused the 500s?
(using the obvious `docker logs`)
```
13:41:59.773 [ForkJoinPool.commonPool-worker-7] ERROR c.l.datahub.graphql.GmsGraphQLEngine:1222 - Failed to load Entities of type: DataJob, keys: [urn:li:dataJob:(urn:li:dataFlow:(flink,prod-lz-dsh.b2cbilling.flinkcluster,prod),cdr-ingest), urn:li:dataJob:(urn:li:dataFlow:(flink,prod-lz-dsh.b2cbilling.flinkcluster,prod),cdr-processor)] Failed to batch load Data Jobs
13:41:59.774 [ForkJoinPool.commonPool-worker-7] ERROR c.l.d.g.e.DataHubDataFetcherExceptionHandler:21 - Failed to execute DataFetcher
java.util.concurrent.CompletionException: java.lang.RuntimeException: Failed to retrieve entities of type DataJob
	at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:273)
	at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:280)
	at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1606)
	at java.util.concurrent.CompletableFuture$AsyncSupply.exec(CompletableFuture.java:1596)
	at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
	at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
	at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
	at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175)
Caused by: java.lang.RuntimeException: Failed to retrieve entities of type DataJob
	at com.linkedin.datahub.graphql.GmsGraphQLEngine.lambda$null$133(GmsGraphQLEngine.java:1223)
	at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
	... 5 common frames omitted
Caused by: java.lang.RuntimeException: Failed to batch load Data Jobs
	at com.linkedin.datahub.graphql.types.datajob.DataJobType.batchLoad(DataJobType.java:118)
	at com.linkedin.datahub.graphql.GmsGraphQLEngine.lambda$null$133(GmsGraphQLEngine.java:1220)
	... 6 common frames omitted
Caused by: com.linkedin.data.template.TemplateOutputCastException: Invalid URN syntax: Invalid number of keys.: urn:li:dataset:(urn:li:dataPlatform:dsh,prod-lz-dsh.internal.month-aggr-cdr-data,PROD)urn:li:dataset:(urn:li:dataPlatform:dsh,prod-lz-dsh.internal.hour-aggr-cdr-voice,PROD)
```
Figured it out... a missing comma in a URN list in `DataJobInputOutput.outputDatasets`.
That being said: the UI should not break over a misconfigured entity.
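For reference, a minimal sketch (plain Python, outside of DataHub) of how the missing comma produces the malformed URN seen in the stack trace above: Python implicitly concatenates adjacent string literals, so the list ends up holding one fused string instead of two URNs.

```python
# The two dataset URNs from the TemplateOutputCastException above,
# with the comma between them missing, as in the broken aspect.
urns = [
    "urn:li:dataset:(urn:li:dataPlatform:dsh,prod-lz-dsh.internal.month-aggr-cdr-data,PROD)"
    "urn:li:dataset:(urn:li:dataPlatform:dsh,prod-lz-dsh.internal.hour-aggr-cdr-voice,PROD)"
]

print(len(urns))  # 1, not 2: the literals were silently concatenated
print(urns[0])    # both URNs fused together, exactly the string GMS rejects
```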
m
@many-guitar-67205 : thanks for reporting. Agree that the UI should be more robust to bad metadata (which should have been prevented from getting in in the first place). Can you share the fragment of the metadata event that caused this?
m
Here's an example:
• `datahub docker quickstart`
• `datahub docker ingest-sample-data`
• update of example script https://github.com/linkedin/datahub/blob/master/metadata-ingestion/examples/library/lineage_dataset_job_dataset.py :
```python
from typing import List

import datahub.emitter.mce_builder as builder
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.metadata.com.linkedin.pegasus2avro.datajob import DataJobInputOutputClass
from datahub.metadata.schema_classes import ChangeTypeClass


# Construct the DataJobInputOutput aspect.
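# NOTE: the comma between the two URN strings below is intentionally missing.
# Python concatenates adjacent string literals, so input_datasets ends up
# containing a single malformed URN -- this is what triggers the 500 errors.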
input_datasets: List[str] = [
    'urn:li:dataset:(urn:li:dataPlatform:mysql,librarydb.member,PROD)'
    'urn:li:dataset:(urn:li:dataPlatform:mysql,librarydb.checkout,PROD)'
]

output_datasets: List[str] = [
    builder.make_dataset_urn(
        platform="kafka", name="debezium.topics.librarydb.member_checkout", env="PROD"
    )
]

input_data_jobs: List[str] = [
    builder.make_data_job_urn(
        orchestrator="airflow", flow_id="flow1", job_id="job0", cluster="PROD"
    )
]

datajob_input_output = DataJobInputOutputClass(
    inputDatasets=input_datasets,
    outputDatasets=output_datasets,
    inputDatajobs=input_data_jobs,
)

# Construct a MetadataChangeProposalWrapper object.
# NOTE: This will overwrite all of the existing lineage information associated with this job.
datajob_input_output_mcp = MetadataChangeProposalWrapper(
    entityType="dataJob",
    changeType=ChangeTypeClass.UPSERT,
    entityUrn=builder.make_data_job_urn(
        orchestrator="airflow", flow_id="flow1", job_id="job1", cluster="PROD"
    ),
    aspectName="dataJobInputOutput",
    aspect=datajob_input_output,
)

# Create an emitter to the GMS REST API.
emitter = DatahubRestEmitter("http://localhost:8080")

# Emit metadata!
emitter.emit_mcp(datajob_input_output_mcp)
```
Note the missing comma in `input_datasets`.
When browsing the UI, go to http://localhost:9002/search?filter_platform=urn:li:dataPlatform:airflow
At the top, it mentions that an error has occurred. No Airflow entries are shown (there should be 3 from the sample dataset). (In my case, the home page was not complete either, but I can't seem to reproduce that with this simple example.)
The code that checks whether a URN is valid does not seem to care about any extra characters appended after a valid URN, so during ingestion the URNs are considered to be OK.
m
Thanks for the info @many-guitar-67205, it's a somewhat tricky validation scenario; we'll get back to you on how we want to handle this going forward. /cc @orange-night-91387 @big-carpet-38439
b
Yes, but overall I agree this needs fixing; it's just a matter of figuring out the ‘how’ from here.