# ingestion
f
Hi, I found that all of my ingested Glue jobs have only a dataFlow URN and no dataJob URN, so I can't see lineage, even for jobs with auto-generated scripts. Does anyone know how to ingest Glue jobs as DataJobs rather than DataFlows? Below is the YAML recipe I used to ingest the jobs. Thanks.
source:
  type: glue
  config:
    aws_region: "ap-northeast-2"
    extract_transforms: True
    catalog_id: "catalog_id"

sink:
  type: "datahub-rest"
  config:
    server: "gms server address"
g
The Glue source currently parses your autogenerated scripts, and each script maps to a DataHub dataFlow with multiple dataJobs nested inside.
Could you provide some more detail on how your setup looks and what you're looking to see in DataHub?
f
@gray-shoe-75895 Currently, all of my Glue jobs run ETL from Redshift tables into a Redshift table using user-defined queries. DataHub therefore can't detect the individual jobs in the scripts, so I was trying to find a way to ingest a whole job as a DataJob instead of a DataFlow. There seems to be no way to do that, so I manually created a job with the same name as the flow and ingested the lineage myself. If there's a better approach, please advise. Thanks for your response.
g
To make sure I understand, each of your Glue jobs has a single task, and that task does some ETL work. Is that right? I think in that case you've got the best workaround already.
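
For readers following the thread, here is a rough sketch of what the workaround amounts to: one dataJob, named after its dataFlow, carrying the input and output datasets. It uses only the Python standard library and hand-builds the URN strings to show how a dataJob URN nests its dataFlow URN. The job name, table names, and the helper functions are all hypothetical, and the `PROD` environment is an assumption. In a real setup the DataHub Python package's own builders and emitter would likely be a better fit than string formatting.

```python
# Sketch only: helper names, job name, and table names are hypothetical.
# The URN layout follows DataHub's conventions; "PROD" is an assumed environment.

def dataset_urn(platform: str, name: str, env: str = "PROD") -> str:
    """Build a dataset URN for a table on the given platform."""
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


def data_flow_urn(orchestrator: str, flow_id: str, cluster: str = "PROD") -> str:
    """Build a dataFlow URN (this is what the Glue job is ingested as)."""
    return f"urn:li:dataFlow:({orchestrator},{flow_id},{cluster})"


def data_job_urn(orchestrator: str, flow_id: str, job_id: str, cluster: str = "PROD") -> str:
    """Build a dataJob URN; note that it embeds the parent dataFlow URN."""
    return f"urn:li:dataJob:({data_flow_urn(orchestrator, flow_id, cluster)},{job_id})"


def job_lineage(job_name: str, inputs: list[str], output: str) -> dict:
    """Describe lineage for a single-task Glue job.

    The dataJob reuses the flow's name, as in the workaround from the thread.
    """
    return {
        "entityUrn": data_job_urn("glue", job_name, job_name),
        "aspectName": "dataJobInputOutput",
        "aspect": {
            "inputDatasets": [dataset_urn("redshift", t) for t in inputs],
            "outputDatasets": [dataset_urn("redshift", output)],
        },
    }


if __name__ == "__main__":
    # Hypothetical Glue job reading two Redshift tables and writing one.
    lineage = job_lineage(
        "daily_orders_etl",
        inputs=["dev.public.orders", "dev.public.customers"],
        output="dev.public.orders_enriched",
    )
    print(lineage["entityUrn"])
```

The resulting dictionary is only a plain description of the aspect. Sending it to the GMS server would still require the DataHub emitter or REST API, which this sketch does not attempt.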