# troubleshoot
i
Hey folks! I'm trying to delete a container's information from my DataHub instance using the datahub CLI. The command runs successfully, but when I open DataHub itself the information is still there. What am I doing wrong? The command I used:
datahub delete --urn urn:li:container:37c7bb069d2b23708574c0bdf835dea8 --entity_type container --hard
The screenshot below shows what the UI still displays after running the command.
This container information only appears in the Athena data source; when I look at the Redshift data source, it doesn't appear (I think that is the correct behavior).
i
Hello Bruno, What is the output of:
datahub get --urn "urn:li:container:37c7bb069d2b23708574c0bdf835dea8" -a browsePaths
?
i
@incalculable-ocean-74010
{}
But it is still showing in the UI 🤔
It seems the container entity was deleted from the database, but the info is still stored somewhere.
I've also restarted the frontend pod to check whether caching or something similar was the cause, but nothing changed.
The point is: I want to ingest the Athena data without this "container" tag showing up in the DataHub UI.
i
You mean you want the dataset to not have the "container" substring?
i
Yes!
i
That’s what the Athena source is meant to do, if you look at the capabilities: https://datahubproject.io/docs/metadata-ingestion/source_docs/athena#capabilities Since Athena is merely a serverless layer on top of data in S3 that is (usually) catalogued by AWS Glue, I would recommend using the Glue connector instead: https://datahubproject.io/docs/metadata-ingestion/source_docs/glue
i
But does the Glue ingestion also bring in the container information?
The point is that this information is completely unused by us; I just don't want to ingest it 🙁
And the Athena capabilities list doesn't mention any container information, just platform.
i
But does the Glue ingestion also bring in the container information?
I think so, and I don't believe it is configurable behaviour, but I can check with the core team. Out of curiosity, what is your concern with the container information? Is it simply that you do not want to have it in DataHub, or is it a space concern in the underlying databases?
i
The first option, basically. It's just for consistency with our other data sources. For example, our Redshift data source:
It's quite annoying to have container info in a table's information.
i
Are you saying that urn:li:container:37c7bb069d2b23708574c0bdf835dea8 points to the same table as socrates.plurall.fato_adesao_conteudo, or that it might happen?
i
No, sorry. The point is: let's suppose that DataHub is completely empty. If I crawl the Redshift information, no container information appears in the frontend. But when I ingest the Athena information, for some reason, container information appears on the Athena tables. I just want them to follow one pattern, with no container information for any data source.
They aren't related to each other; they are different data sources.
The point is more the URN name than anything else, because when I query for all the containers in the metadata, I get this:
But when I access the tables from the "production_data_lake" dataset, I get this:
That's why I think the container information from the Athena dataset is broken @incalculable-ocean-74010
i
Thank you for the detailed answer Bruno! I’ve messaged the core team, we’ll see what is going on 🙂
i
You're welcome! If you want any more information, feel free to ping me 🙂
b
@icy-piano-35127 So you do want to show containers for Redshift, but not athena?
i
@big-carpet-38439 it's the opposite. I want the container info not to show for Athena.