# ingestion
a
Hello. In the Python SDK I see that making a container URN expects a guid in the make_container_urn function. Is there a particular reason for that? Can we use another string instead of a UUID given the fact that our custom containers always have a unique name?
m
Are you thinking of generating your own container URN, or do you want to fork make_container_urn?
make_container_urn simply accepts a string. The ContainerKeyClass, however, generates the guid that is passed into make_container_urn.
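To illustrate the point above, here is a minimal sketch of the two options, using only the standard library. It assumes a container URN has the form urn:li:container:<id>. The helpers container_urn and stable_guid, the key fields, and the source name legacy_source_a are hypothetical stand-ins for illustration, not the SDK's own functions.

```python
import hashlib
import json

CONTAINER_URN_PREFIX = "urn:li:container:"


def container_urn(container_id: str) -> str:
    """Hypothetical stand-in: format any string id as a container URN."""
    return f"{CONTAINER_URN_PREFIX}{container_id}"


def stable_guid(key: dict) -> str:
    """Hypothetical helper: derive a deterministic hex id from key fields.

    Assumption: the id is an MD5 digest of the key serialized as sorted JSON,
    so the same key always yields the same id without a lookup.
    """
    payload = json.dumps(key, sort_keys=True, separators=(",", ":"))
    return hashlib.md5(payload.encode("utf-8")).hexdigest()


# Option 1: use the unique source name directly as the container id.
name_urn = container_urn("legacy_source_a")

# Option 2: derive the id from key fields that include the source name.
guid_urn = container_urn(
    stable_guid({"platform": "legacy", "name": "legacy_source_a"})
)

print(name_urn)
print(guid_urn)
```

Either way, the legacy system can recompute the URN from the source name alone, so it never has to ask DataHub which id belongs to which source.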
a
I'm working on ingesting dataset metadata from a legacy system which processes those datasets from multiple sources. I'd like to create a container for each source (which has a unique name) and add datasets to it.
The legacy system doesn't know which UUID DataHub assigns to which source name, so using the source name as a container identifier simplifies the modelling significantly. For example, I don't have to look up a container first to retrieve its UUID when I want to add a new dataset to it, because I can provide the source name directly in the container URN.
m
I understand your use case. I would suggest using a transformer. Essentially, a transformer can intercept any metadata change event/proposal before it is sent to GMS (the server). What you can do is check the URN: if the type is container, get the name of the container and replace the URN with a new URN that has a human-readable string. https://datahubproject.io/docs/metadata-ingestion/docs/transformer/intro/#whats-a-transformer
s
Hi @modern-artist-55754, do you know how to detect a container URN and use a transformer to replace the URN/UUID with the source name?
I need to replace the UUID of the container with its source name.
I went through the transformers doc and couldn't find anything related to containers specifically.
m
So pre-ingestion you can query GraphQL for all containers with their names and build an in-memory lookup (i.e. in the config method of the transformer). Then, when you process the MCP, you can just look up the id. Tbh I haven't looked at the MCP and how it looks. But be very careful with the container name: you might have duplicate names.
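The lookup idea above can be sketched roughly as follows, using only the standard library. Everything here is an assumption for illustration: the (urn, name) pairs stand in for whatever the GraphQL query returns, the function names are hypothetical, and the wiring into an actual transformer class is left out. A container URN is detected by its urn:li:container: prefix, and duplicate names fail loudly instead of silently merging two containers.

```python
from typing import Dict, Iterable, Tuple

CONTAINER_URN_PREFIX = "urn:li:container:"


def is_container_urn(urn: str) -> bool:
    """Detect a container URN by its entity-type prefix."""
    return urn.startswith(CONTAINER_URN_PREFIX)


def build_lookup(containers: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Map each existing container URN to a name-based URN.

    Raises ValueError on duplicate names, since two containers would
    otherwise collapse into the same name-based URN.
    """
    lookup: Dict[str, str] = {}
    seen_names: Dict[str, str] = {}
    for urn, name in containers:
        if name in seen_names:
            raise ValueError(
                f"duplicate container name {name!r}: {seen_names[name]} and {urn}"
            )
        seen_names[name] = urn
        lookup[urn] = f"{CONTAINER_URN_PREFIX}{name}"
    return lookup


def rewrite_urn(urn: str, lookup: Dict[str, str]) -> str:
    """Replace a known container URN; leave everything else untouched."""
    if is_container_urn(urn):
        return lookup.get(urn, urn)
    return urn


# Hypothetical (urn, name) pairs, standing in for the pre-ingestion query result.
existing = [
    ("urn:li:container:0e7f4a1c", "legacy_source_a"),
    ("urn:li:container:9b2d6c3e", "legacy_source_b"),
]
lookup = build_lookup(existing)
print(rewrite_urn("urn:li:container:0e7f4a1c", lookup))
```

In a transformer, rewrite_urn would presumably need to be applied both to the entity URN of each proposal and to any container URN referenced inside an aspect, such as the aspect that links a dataset to its container.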
s
thank you, i'll give it a try