# troubleshoot
s
Hi Folks, just found that when manually constructing Airflow dataFlow objects via
`mce_builder.make_data_flow_urn('airflow', dag_id)`
the resulting URN does not contain a `dataPlatform` section, like this:
```
urn:li:dataFlow:(airflow,aws_transforms_sportsbook,prod)
```
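For illustration, the URN shape above can be reproduced with a small local sketch (a hypothetical re-implementation for clarity, not the actual `mce_builder` source, which ships with the acryl-datahub package):

```python
# Hypothetical local sketch of what mce_builder.make_data_flow_urn produces;
# the real implementation lives in the acryl-datahub package.
def make_data_flow_urn(orchestrator: str, flow_id: str, cluster: str = "prod") -> str:
    # Note: the orchestrator ("airflow") is embedded directly -- there is
    # no urn:li:dataPlatform: prefix, unlike dataset URNs.
    return f"urn:li:dataFlow:({orchestrator},{flow_id},{cluster})"

print(make_data_flow_urn("airflow", "aws_transforms_sportsbook"))
# urn:li:dataFlow:(airflow,aws_transforms_sportsbook,prod)
```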
Is this an issue or a feature? It leaves me unable to operate on these objects via the CLI: `--platform airflow` finds nothing, while on the other hand the UI shows the object under the correct platform.
Do I need to construct the `orchestrator` attribute as a URN containing `dataPlatform`, or should the fix be made in `make_data_flow_urn`?
Additionally, `--entity_type dataFlow` also finds nothing:
```
dmytro.kulyk@MB-DAT-564087 airflow % datahub delete --env prod --entity_type "dataFlow" -n
[2022-03-19 20:39:49,482] INFO     {datahub.cli.delete_cli:206} - datahub configured with <http://localhost:8080>
[2022-03-19 20:39:49,667] INFO     {datahub.cli.delete_cli:219} - Filter matched 0 entities. Sample: []
```
```
dmytro.kulyk@MB-DAT-564087 airflow % datahub delete --env prod --entity_type "dataJob" -n
[2022-03-19 20:40:36,609] INFO     {datahub.cli.delete_cli:206} - datahub configured with <http://localhost:8080>
[2022-03-19 20:40:36,787] INFO     {datahub.cli.delete_cli:219} - Filter matched 0 entities. Sample: []
```
o
Hi! The resulting platform is expected; we do some coercion to a platform under the hood for the orchestrator field.
Looks like there is a bit of faulty logic in the Python script that expects the entity type to match specific strings:
```python
if (
    platform is not None
    and entity_type == "dataset"
    or entity_type == "dataflow"
    or entity_type == "datajob"
):
```
If you try it with `dataflow` or `datajob` (all lowercase) it should work, i.e.:
```
datahub delete --entity_type dataflow --platform airflow
```
We'll get this fixed so it supports case-insensitive matching: https://github.com/datahub-project/datahub/issues/4461
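For what it's worth, both the case sensitivity and the `and`/`or` precedence in the quoted check could be addressed along these lines (an illustrative sketch only, not the actual patch tracked in the issue above):

```python
# Illustrative sketch only, not the actual DataHub patch.
def platform_filter_applies(platform, entity_type):
    # Lowercasing makes "dataFlow" and "dataflow" equivalent, and a single
    # membership test avoids the original and/or precedence trap, where
    # entity_type == "dataflow" matched even when no platform was set.
    return platform is not None and entity_type.lower() in {"dataset", "dataflow", "datajob"}

print(platform_filter_applies("airflow", "dataFlow"))  # True
print(platform_filter_applies(None, "dataflow"))       # False
```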
s
@orange-night-91387 so, as far as I understood, is there an issue with how the URN for `dataFlow` is generated? And `dataPlatform` needs to be added under the hood, not explicitly when passing to `mce_builder.make_data_flow_urn`?
o
There is no issue in the URN creation. That output is how it is expected to look.
The reason your platform-based search in the delete CLI failed is that there is an error in the Python script that expects only the exact matches "dataflow" and "datajob".
s
How should a URN for dataFlow look? Currently I see the following:
• dataFlow:
```
urn:li:dataFlow:(airflow,aws_transforms_ams,prod)
```
• dataJob:
```
urn:li:dataJob:(urn:li:dataFlow:(airflow,aws_transforms_sportsbook,prod),sportsbook.selection_key)
```
Should these URNs contain a `dataPlatform` section, as a dataset URN does?
```
urn:li:dataset:(urn:li:dataPlatform:athena,raw.ams_schema_updated,PROD)
```
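To contrast the three URN shapes above, here is a small local sketch of how they nest (hypothetical helpers mirroring the `mce_builder` ones, written out for illustration only):

```python
# Hypothetical local helpers mirroring mce_builder, for illustration only.
def make_data_flow_urn(orchestrator, flow_id, cluster="prod"):
    # No dataPlatform wrapper -- the orchestrator name stands alone.
    return f"urn:li:dataFlow:({orchestrator},{flow_id},{cluster})"

def make_data_job_urn(orchestrator, flow_id, job_id, cluster="prod"):
    # A dataJob URN nests the whole dataFlow URN as its first field.
    return f"urn:li:dataJob:({make_data_flow_urn(orchestrator, flow_id, cluster)},{job_id})"

def make_dataset_urn(platform, name, env="PROD"):
    # Only dataset URNs carry an explicit urn:li:dataPlatform: section.
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"

print(make_data_job_urn("airflow", "aws_transforms_sportsbook", "sportsbook.selection_key"))
# urn:li:dataJob:(urn:li:dataFlow:(airflow,aws_transforms_sportsbook,prod),sportsbook.selection_key)
```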
o
Nope, each Urn can have different fields, not all of them have a platform urn. Those ones look right to me 🙂
s
+
h
Hey @shy-parrot-64120, did you ever figure out a workaround to specify a dataPlatform? I have an orchestrator that I want to be its own platform, just like Airflow. When I create the dataFlow entities, the dataPlatform is not auto-created. How should I create this platform and make sure that the dataFlow points to it?
s
Like this. Hope this helps.
Please note that the given method works by parsing our own DAG configurations (defined in YAML for dynamic DAG generation in our orchestration framework); it does not work with Airflow itself directly.
h
^ Still doesn’t make sense to me. You’re not explicitly assigning a dataPlatform to the flow. Will the flow be associated with Airflow as a platform?