#advice-metadata-modeling

lemon-tomato-17970

06/16/2022, 10:52 AM
Hi team, I'm playing with the datahub glossary trying to ingest some data into local datahub. During the process the names of the nodes are changed from what I expect to see to some other format I do not need. My code is:
version: 1
source: DataHub
owners:
        users:
                - datahub
nodes:
        - name: ParentNode
          description: ParentNode description
          nodes:
                  - name: ChildNode
                    description: ChildNode description
At the end of the day I get a glossary with one node named "ParentNode" and a nested node named "ParentNode.ChildNode", but I expect to see the names given in the script. Please help me understand how the node's urn is created during ingestion, so I can get the expected result. Thank you!

bulky-soccer-26729

06/16/2022, 1:56 PM
hey Larysa! are you using a newer version of datahub where you have the updated way of viewing your Glossary? (with the navigator on the left) If so I believe this is a cleanup piece that we need to take care of since we're displaying nodes and terms differently than we used to. The main difference is that we were storing the hierarchy of nodes and terms in the urn in order to know who was a child of what. We don't need that anymore but instead rely on `parentNode` on each entity to get the hierarchy - allowing us to edit the names of nodes and terms via the UI!
so in the short term, you can easily edit the name of your nodes that look different than you expect in the UI by clicking the edit icon next to the name
and in the long run we'll put out a fix to ingest names of nodes inside of nodes properly!
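To illustrate why the nested node shows up as "ParentNode.ChildNode": the sketch below assumes the urn is formed by joining the node names along the path with dots, which matches the names seen in this thread. The function name `path_based_urns` and the dictionary layout are made up for illustration and are not the actual ingestion code.

```python
URN_PREFIX = "urn:li:glossaryNode:"  # prefix used for glossary node urns


def path_based_urns(nodes, parent_path=()):
    """Map each dotted node path to a urn that embeds the full path.

    Hypothetical helper: it only mirrors the behavior described in this
    thread, where the hierarchy is stored in the urn itself.
    """
    urns = {}
    for node in nodes:
        path = (*parent_path, node["name"])
        dotted = ".".join(path)
        urns[dotted] = URN_PREFIX + dotted
        # Recurse into child nodes, carrying the path down.
        urns.update(path_based_urns(node.get("nodes", []), path))
    return urns


glossary_nodes = [
    {"name": "ParentNode", "nodes": [{"name": "ChildNode"}]},
]
urns = path_based_urns(glossary_nodes)
for dotted, urn in urns.items():
    print(dotted, "->", urn)
```

Under this assumption the child's identifier carries its parent's name, which is why the UI displayed the dotted path instead of the plain name from the YAML file.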

lemon-tomato-17970

06/16/2022, 2:05 PM
I'm using DataHub CLI version 0.8.36. Yes, it has a navigator through the glossary items on the left. Thank you for your answer. Will wait for the update))
Also I think I found a bug in datahub. I used a string containing the "/" symbol as the name of a node, and now I'm not able to delete this node using three different options:
1. manually through the UI: I see a green circle with a checkmark inside and a message that the item was successfully deleted, but it is still displayed on the page;
2. through the CLI command datahub delete --urn 'urn:li:glossaryNode:Wrong/Name': the command runs successfully but I still see the item;
3. through the CLI command datahub ingest rollback --run-id RUNID
@bulky-soccer-26729 - the result is the same every time: the commands run successfully, but I still see the node

bulky-soccer-26729

06/16/2022, 2:13 PM
ohh very interesting. do you know if this is a new bug or something that has been around for some time? obviously the ability to delete via the UI is new, but I'm wondering if having the slash in the name has affected CLI deleting as well. If you try to delete other nodes or terms without a slash in the name do you have any problems?

lemon-tomato-17970

06/16/2022, 2:25 PM
Actually, I've managed to delete this node by running the datahub delete commands from the other direction, starting at the top of the hierarchy. My "wrong/name" node was nested inside other nodes like this: "ParentNode"-->"ParentNode.ChildNode"-->"ParentNode.ChildNode.Wrong/Name Node" (please note that there was also a whitespace character between the two words). The UI showed this item as "ParentNode.ChildNode.Wrong"; the part of the name after the slash was missing. As I said, I tried to delete it in different ways, using both the original name and the name shown by the UI in the urn string, but it stayed. Then I deleted the parent node with datahub delete --urn 'urn:li:glossaryNode:ParentNode', which removed the parent node and kept everything inside it. After that I deleted the child node with datahub delete --urn 'urn:li:glossaryNode:ChildNode', which left only my wrong-name node, now shown with its original name. The next datahub delete let me get rid of it completely. The command used: datahub delete --urn 'ParentNode.ChildNode.Wrong/Name Node'
This is interesting because it is not possible to delete a parent node manually from the UI if it still contains items

bulky-soccer-26729

06/16/2022, 2:27 PM
ah okay gotcha. Yeah we have seen issues with special characters in names of glossary nodes and terms since we put that directly in the urn. once we make the update to use a UUID on yaml-based ingestion and set the name in the node/term properties I think your issue will be resolved!
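A minimal sketch of the idea described above, assuming a random UUID as the opaque id: the urn no longer contains the display name, so special characters such as "/" or spaces never reach it, and the hierarchy comes from a parent reference instead. The `make_node` helper and the dictionary layout are hypothetical and only illustrate the approach; they are not the DataHub implementation.

```python
import uuid


def make_node(name, parent_urn=None):
    """Build a hypothetical node record: opaque id in the urn, name in properties."""
    urn = f"urn:li:glossaryNode:{uuid.uuid4()}"
    return {
        "urn": urn,
        # The display name and the parent link live in the properties,
        # so renaming a node would not change its urn.
        "properties": {"name": name, "parentNode": parent_urn},
    }


parent = make_node("ParentNode")
child = make_node("Wrong/Name Node", parent_urn=parent["urn"])

print(child["urn"])
print(child["properties"])
```

With this layout a node could be deleted by its urn regardless of what characters its name contains.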

lemon-tomato-17970

06/16/2022, 2:27 PM
Thank you! And have a nice day))

bulky-soccer-26729

06/16/2022, 2:36 PM
you too!

clever-beard-17281

06/17/2022, 12:31 PM
@lemon-tomato-17970 did you succeed in ingesting the glossary with the CLI? The sample on GitHub also didn't work for me: https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/examples/bootstrap_data/sample_pii_glossary.yml

bulky-soccer-26729

06/17/2022, 1:07 PM
@clever-beard-17281 so you weren't able to ingest at all with that recipe? or how did it not work? also if there are any error logs I would love to see them

clever-beard-17281

08/11/2022, 11:25 AM
@bulky-soccer-26729 I think @lemon-tomato-17970 has found the solution. I believe she waited for the next version of datahub to be released and then it worked.