most-solstice-19338
01/17/2022, 9:14 AM
square-activity-64562
01/17/2022, 9:19 AM
square-activity-64562
01/17/2022, 9:21 AM
square-activity-64562
01/17/2022, 9:22 AM
square-activity-64562
01/17/2022, 9:22 AM
most-solstice-19338
01/17/2022, 9:25 AM
square-activity-64562
01/17/2022, 9:34 AM
most-solstice-19338
01/17/2022, 9:35 AM
most-solstice-19338
01/17/2022, 9:36 AM
square-activity-64562
01/17/2022, 9:36 AM
most-solstice-19338
01/17/2022, 9:37 AM
most-solstice-19338
01/17/2022, 9:38 AM
01/17/2022, 9:39 AM
chart_info_mcp = MetadataChangeProposalWrapper(
    entityType="dataJob",
    changeType=ChangeTypeClass.UPSERT,
    entityUrn=builder.make_data_job_urn(
        orchestrator="ADF", flow_id="flow2", job_id="job1", cluster="PROD"
    ),
    aspectName="dataJobInfo",
    aspect=datajob_info,
)
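As a rough illustration of the URN the `make_data_job_urn` call above is expected to yield, the sketch below rebuilds the strings by hand. The helpers `data_flow_urn`, `data_job_urn` and `data_platform_urn` are hypothetical stand-ins written for this note, not the library functions; the string layout is an assumption taken from the URNs quoted elsewhere in this thread.

```python
# Hypothetical helpers that mimic the URN layout seen in this thread.
# They are NOT the DataHub library functions, only an illustration.

def data_platform_urn(name: str) -> str:
    """Full platform URN for a bare platform name such as 'ADF'."""
    return f"urn:li:dataPlatform:{name}"


def data_flow_urn(orchestrator: str, flow_id: str, cluster: str) -> str:
    """Flow URN: the orchestrator is embedded exactly as it was passed in."""
    return f"urn:li:dataFlow:({orchestrator},{flow_id},{cluster})"


def data_job_urn(orchestrator: str, flow_id: str, job_id: str, cluster: str) -> str:
    """Job URN: wraps the flow URN and appends the job id."""
    return f"urn:li:dataJob:({data_flow_urn(orchestrator, flow_id, cluster)},{job_id})"


# Bare name as orchestrator, as in the snippet above.
bare = data_job_urn("ADF", "flow2", "job1", "PROD")

# Full platform URN as orchestrator: the prefix ends up nested inside the flow URN.
nested = data_job_urn(data_platform_urn("ADF"), "flow2", "job1", "PROD")

print(bare)
print(nested)
```

Comparing the two printed strings makes it easy to spot whether a full platform URN was passed where a bare name was intended.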
most-solstice-19338
01/17/2022, 9:39 AM
square-activity-64562
01/17/2022, 9:39 AM
square-activity-64562
01/17/2022, 9:39 AM
urn:li:dataPlatform:ADL
Notice the L at the end
most-solstice-19338
01/17/2022, 9:39 AM
most-solstice-19338
01/17/2022, 9:39 AM
square-activity-64562
01/17/2022, 9:40 AM
ADL, not ADF, which you are sending in as the orchestrator.
square-activity-64562
01/17/2022, 9:40 AM
most-solstice-19338
01/17/2022, 9:42 AM
most-solstice-19338
01/17/2022, 9:42 AM
most-solstice-19338
01/17/2022, 9:42 AM
square-activity-64562
01/17/2022, 9:44 AM
dataJob but the example at https://github.com/linkedin/datahub/blob/master/metadata-ingestion/examples/library/lineage_dataset_job_dataset.py#L36 has datajob. Please check your spelling.
https://datahubspace.slack.com/archives/C02R2NBJXD1/p1642412357011400?thread_ts=1642410884.008900&cid=C02R2NBJXD1
square-activity-64562
01/17/2022, 9:45 AM
most-solstice-19338
01/17/2022, 9:50 AM
square-activity-64562
01/17/2022, 9:56 AM
dataJob. Can you please try with datajob? It seems there are some contradictory examples. I will cross-check the examples.
most-solstice-19338
01/17/2022, 9:58 AM
most-solstice-19338
01/17/2022, 10:01 AM
most-solstice-19338
01/17/2022, 10:02 AM
square-activity-64562
01/17/2022, 10:02 AM
most-solstice-19338
01/17/2022, 10:16 AM
square-activity-64562
01/17/2022, 10:17 AM
most-solstice-19338
01/17/2022, 10:17 AM
square-activity-64562
01/17/2022, 10:18 AM
most-solstice-19338
01/17/2022, 10:19 AM
10:18:27.897 [qtp544724190-14] INFO c.l.m.r.entity.AspectResource:125 - INGEST PROPOSAL proposal: {aspectName=dataJobInfo, entityUrn=urn:li:dataJob:(urn:li:dataFlow:(urn:li:dataPlatform:ADF,flow14,PROD),job14), entityType=datajob, aspect={contentType=application/json, value=ByteString(length=153,bytes=7b226375...6429227d)}, changeType=UPSERT}
10:18:27.915 [pool-8-thread-1] INFO c.l.m.filter.RestliLoggingFilter:56 - POST /aspects?action=ingestProposal - ingestProposal - 200 - 18ms
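When reading a log line like the one above, it can help to pull the dataJob URN apart and look at what actually landed in the orchestrator slot. The following is a minimal sketch, assuming the URN layout shown in that log line and assuming none of the fields contain commas or parentheses; `parse_data_job_urn` is a hypothetical helper, and `LOG_LINE` is an abbreviated excerpt of the log entry.

```python
import re

# Abbreviated excerpt of the GMS log line quoted above (aspect payload omitted).
LOG_LINE = (
    "INGEST PROPOSAL proposal: {aspectName=dataJobInfo, "
    "entityUrn=urn:li:dataJob:(urn:li:dataFlow:(urn:li:dataPlatform:ADF,flow14,PROD),job14), "
    "entityType=datajob, changeType=UPSERT}"
)

# Assumed layout: urn:li:dataJob:(urn:li:dataFlow:(<orchestrator>,<flow>,<cluster>),<job>)
DATA_JOB_URN = re.compile(
    r"urn:li:dataJob:\(urn:li:dataFlow:\("
    r"(?P<orchestrator>[^,()]+),(?P<flow_id>[^,()]+),(?P<cluster>[^,()]+)\),"
    r"(?P<job_id>[^,()]+)\)"
)


def parse_data_job_urn(text: str):
    """Return the URN fields found in text as a dict, or None if no URN matches."""
    match = DATA_JOB_URN.search(text)
    return match.groupdict() if match else None


parts = parse_data_job_urn(LOG_LINE)
print(parts)

# True when a full platform URN, not a bare name, was used as the orchestrator.
has_platform_prefix = parts["orchestrator"].startswith("urn:li:dataPlatform:")
print("orchestrator carries a full platform URN:", has_platform_prefix)
```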
most-solstice-19338
01/17/2022, 10:22 AM
square-activity-64562
01/17/2022, 10:22 AM
square-activity-64562
01/17/2022, 10:23 AM
square-activity-64562
01/17/2022, 10:23 AM
most-solstice-19338
01/17/2022, 10:25 AM
square-activity-64562
01/17/2022, 10:27 AM
root as user and datahub as pass?
most-solstice-19338
01/17/2022, 10:35 AM
square-activity-64562
01/17/2022, 10:36 AM
most-solstice-19338
01/17/2022, 10:37 AM
square-activity-64562
01/17/2022, 10:38 AM
square-activity-64562
01/17/2022, 10:39 AM
most-solstice-19338
01/17/2022, 10:39 AM
square-activity-64562
01/17/2022, 10:41 AM
square-activity-64562
01/17/2022, 10:41 AM
most-solstice-19338
01/17/2022, 10:44 AM
https://orangeman.dk/wp-content/uploads/2019/06/DataFactory.jpg
most-solstice-19338
01/17/2022, 10:44 AM
most-solstice-19338
01/17/2022, 10:44 AM
most-solstice-19338
01/17/2022, 10:45 AM
orange-night-91387
01/20/2022, 9:07 PM
orange-night-91387
01/20/2022, 9:45 PM
curl http://localhost:8080/entities/urn%3Ali%3AdataPlatform%3AADF
Do you get anything back? (pointed at wherever your GMS is located, for this example assuming it's localhost)
most-solstice-19338
01/21/2022, 8:08 AM
{
  "value": {
    "com.linkedin.metadata.snapshot.DataPlatformSnapshot": {
      "urn": "urn:li:dataPlatform:ADF",
      "aspects": [
        {
          "com.linkedin.metadata.key.DataPlatformKey": {
            "platformName": "ADF"
          }
        },
        {
          "com.linkedin.dataplatform.DataPlatformInfo": {
            "name": "Azure Data Factory",
            "datasetNameDelimiter": "/",
            "type": "OTHERS",
            "displayName": "ADF",
            "logoUrl": "https://orangeman.dk/wp-content/uploads/2019/06/DataFactory.jpg"
          }
        }
      ]
    }
  }
}
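The same check can be scripted. The sketch below is only an illustration: it builds the percent-encoded entity URL used in the curl command and pulls the DataPlatformInfo aspect out of a response shaped like the one above. `entity_url` and `platform_info` are hypothetical helper names, the base URL assumes GMS on localhost as in the curl example, and no request is actually sent; `SAMPLE_RESPONSE` is a trimmed copy of the JSON posted in this thread.

```python
import json
from urllib.parse import quote


def entity_url(gms_base: str, urn: str) -> str:
    """Build the /entities/<urn> URL with the URN fully percent-encoded."""
    return f"{gms_base.rstrip('/')}/entities/{quote(urn, safe='')}"


def platform_info(response_text: str):
    """Return the DataPlatformInfo aspect from a snapshot response, or None."""
    snapshot = json.loads(response_text)["value"][
        "com.linkedin.metadata.snapshot.DataPlatformSnapshot"
    ]
    for aspect in snapshot["aspects"]:
        info = aspect.get("com.linkedin.dataplatform.DataPlatformInfo")
        if info is not None:
            return info
    return None


# Trimmed copy of the response shown above.
SAMPLE_RESPONSE = """
{"value": {"com.linkedin.metadata.snapshot.DataPlatformSnapshot": {
  "urn": "urn:li:dataPlatform:ADF",
  "aspects": [
    {"com.linkedin.metadata.key.DataPlatformKey": {"platformName": "ADF"}},
    {"com.linkedin.dataplatform.DataPlatformInfo": {
      "name": "Azure Data Factory",
      "displayName": "ADF",
      "logoUrl": "https://orangeman.dk/wp-content/uploads/2019/06/DataFactory.jpg"}}
  ]}}}
"""

url = entity_url("http://localhost:8080", "urn:li:dataPlatform:ADF")
info = platform_info(SAMPLE_RESPONSE)
print(url)
print(info["displayName"], info["logoUrl"])
```

If `platform_info` returns None for a real response, the platform entity exists without a DataPlatformInfo aspect, which would be worth ruling out before looking at the frontend.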
orange-night-91387
01/21/2022, 3:27 PM
orange-night-91387
01/21/2022, 7:35 PM
orchestrator=DataPlatformUrn(ADF) rather than just orchestrator=ADF. Are the ADL ones that are working set up the same way? I see some results above that have ADL with the same setup, where it looks like: urn:li:dataJob:(urn:li:dataFlow:(urn:li:dataPlatform:ADL,flow11,PROD),job11)
Can you confirm that the logo shows up on that URN? The issue is that we have logic to map the name -> platformUrn, but you have the fully formed platform URN as the orchestrator, so it tries to find a platform with the name "`urn:li:dataPlatform:ADF`" rather than the name "`ADF`". Since you don't have a platform URN set as "`urn:li:dataPlatform:urn:li:dataPlatform:ADF`", it does not find the result.
See logic here: https://github.com/linkedin/datahub/blob/master/datahub-web-react/src/app/entity/dataFlow/DataFlowEntity.tsx#L99-L110
most-solstice-19338
01/24/2022, 9:31 AM
orange-night-91387
01/24/2022, 4:44 PM
It is case-sensitive. It has to be exactly same as in your urn. What is your urn and what did you add for your data platform exactly?
This meant that it has to be "ADF" rather than "adf" or other variations, not that it is supposed to be exactly the full URN. Sorry for any misunderstanding there. I looked at the implementation of the getLogoFromPlatform method, and it looks like the frontend uses a static list of supported platforms rather than executing a query:
https://github.com/linkedin/datahub/blob/master/datahub-web-react/src/app/shared/getLogoFromPlatform.tsx
Since your platform is not in this list, it doesn't work. Syncing with the team for a solution.
most-solstice-19338
01/25/2022, 11:28 AM
orange-night-91387
01/25/2022, 7:21 PM
urn:li:dataJob:(urn:li:dataFlow:(ADF,flow11,PROD),job11)
assuming you have a platform that matches urn:li:dataPlatform:ADF with a proper logo URL.
most-solstice-19338
01/26/2022, 1:00 PM
orange-night-91387
01/26/2022, 3:37 PM
orange-night-91387
01/26/2022, 4:04 PM
most-solstice-19338
01/27/2022, 12:00 PM
flow_id and job_id
How are they to be used?
My guess would be that flow_id identifies the "workflow" and job_id is an actual run of that workflow. I am not sure that is how they should be used. If so, I get new entities for each job_id.
orange-night-91387
01/27/2022, 5:45 PM
most-solstice-19338
01/28/2022, 3:07 PM