# getting-started
h
Hi All! I am ingesting data from S3 using the DataHub pipeline (`datahub.ingestion.run.pipeline`). Then, to update properties, tags, owners, etc.:
1. I search for the ingested dataset URN using the REST API `/entities` endpoint, because I do not know the URN beforehand.
2. I update the URN.
The problem with this approach is that URN creation takes some time depending on the dataset, and the URN is usually not ready yet when I search for it. Is there a way to know when the URN has been created after the pipeline runs? Or is there a better way to do this?
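For context, the lookup step above looks roughly like this (a minimal sketch; the GMS server URL and the dataset URN are placeholders, and the `/entities` endpoint expects the URN URL-encoded in the path):

```python
import json
import urllib.parse
import urllib.request

# Placeholder GMS server -- adjust for your deployment.
GMS = "http://localhost:8080"

def entities_url(base_url: str, urn: str) -> str:
    """Build the REST /entities lookup URL for a urn (urn is URL-encoded)."""
    return f"{base_url}/entities/{urllib.parse.quote(urn, safe='')}"

def fetch_entity(base_url: str, urn: str) -> dict:
    """GET the entity snapshot; raises urllib.error.HTTPError if the urn
    has not been indexed yet -- which is exactly the race described above."""
    with urllib.request.urlopen(entities_url(base_url, urn)) as resp:
        return json.loads(resp.read())

url = entities_url(GMS, "urn:li:dataset:(urn:li:dataPlatform:s3,my-bucket/my-table,PROD)")
print(url)
```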
d
Hey Saad, you can use the GraphQL `entityExists` query to check whether a given URN exists. cf: https://datahubproject.io/docs/graphql/queries#entityexists https://datahubspace.slack.com/archives/CV2KB471C/p1683516173417029
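A polling loop along those lines might look like this (a sketch, assuming the GraphQL endpoint is served at `/api/graphql` with no auth token; adjust both for your deployment):

```python
import json
import time
import urllib.request

def entity_exists_payload(urn: str) -> dict:
    """Build the GraphQL request body for the entityExists query."""
    return {
        "query": "query ($urn: String!) { entityExists(urn: $urn) }",
        "variables": {"urn": urn},
    }

def wait_for_entity(graphql_url: str, urn: str, timeout_s: float = 60.0) -> bool:
    """Poll entityExists until the urn shows up or the timeout elapses."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        req = urllib.request.Request(
            graphql_url,
            data=json.dumps(entity_exists_payload(urn)).encode(),
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            body = json.loads(resp.read())
        if body.get("data", {}).get("entityExists"):
            return True
        time.sleep(2)  # back off between polls
    return False
```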
h
Hey Hyejin, thanks for your response! I can see that search via the Python SDK should be available soon. But is there a way to get the URN upon ingestion? That would remove the need to search.
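On getting the URN without searching: dataset URNs are deterministic, so they can be built from the platform, name, and environment up front (DataHub ships `make_dataset_urn` in `datahub.emitter.mce_builder` for this; the pure-Python helper below just mirrors that URN format, and the bucket/path name is a placeholder):

```python
def make_dataset_urn(platform: str, name: str, env: str = "PROD") -> str:
    """Mirror DataHub's dataset urn format:
    urn:li:dataset:(urn:li:dataPlatform:<platform>,<name>,<env>)"""
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"

# For an s3 source, the name is typically the bucket/key path of the table.
urn = make_dataset_urn("s3", "my-bucket/path/to/table")
print(urn)  # urn:li:dataset:(urn:li:dataPlatform:s3,my-bucket/path/to/table,PROD)
```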
m
It probably takes a bit of time for the dataset to be indexed in Elasticsearch. You should use a transformer during ingestion instead of doing a search and then an update.
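A transformer attaches the tags and owners during ingestion itself, so no post-hoc search is needed. A minimal recipe sketch (the bucket path, tag URN, and owner URN are placeholders; `simple_add_dataset_tags` and `simple_add_dataset_ownership` are built-in DataHub transformers):

```python
# Recipe as a Python dict -- the same structure a YAML recipe would have.
# You would pass it to datahub.ingestion.run.pipeline.Pipeline.create(recipe).run().
recipe = {
    "source": {
        "type": "s3",
        "config": {"path_specs": [{"include": "s3://my-bucket/path/*.parquet"}]},
    },
    "transformers": [
        {
            # Adds the given tags to every dataset emitted by the source.
            "type": "simple_add_dataset_tags",
            "config": {"tag_urns": ["urn:li:tag:ingested-from-s3"]},
        },
        {
            # Adds the given owners to every dataset emitted by the source.
            "type": "simple_add_dataset_ownership",
            "config": {"owner_urns": ["urn:li:corpuser:saad"]},
        },
    ],
    "sink": {"type": "datahub-rest", "config": {"server": "http://localhost:8080"}},
}
```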