A few questions about business glossary ingestion (DataHub #ingestion)


many-rainbow-50695

10/09/2022, 6:24 AM

Hi! A few questions about business glossary ingestion:

1. I see that it's possible to add links to glossary term documentation using the UI, but I can't find the relevant field in the DataHub documentation or in the business glossary ingestion code.
2. Are the properties 'inherits' and 'contains' reversible? If one glossary term inherits another, does that mean the other term should contain this one? If yes, then I think it should be done automatically during business glossary ingestion.
3. Is it possible to filter a 'working set' of business glossary items using the UI, or are there any plans for it? I have a registry of semantic data types with 304 data types (business glossary terms), and their profiles include language and country information. Sometimes I may want to restrict a certain database from using certain data categories, or semantic types linked to some countries. Is that possible?

gray-shoe-75895

10/11/2022, 7:06 PM

1. You can add links on the glossary term’s page. You can either click the “add link” button in the right pane, or add the link to the term’s description under the documentation tab.
2. Inherits and contains are separate ideas, and not quite reversible. For example, a ProCustomer might inherit from Customer, while ProCustomer might contain BillingAddress. Inherits yields an “is a” relationship, while contains yields a “has a” relationship.
3. Could you explain a bit more about what you’re trying to do? Is it that you want a constraint that entities under “foo_database” can only use a predefined subset of glossary terms?
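To make the “is a” versus “has a” distinction concrete, here is a minimal sketch in plain Python. It does not use any DataHub API; the `Term` class and the helper functions are hypothetical names invented for illustration, and only the three term names come from the example above.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Term:
    """Hypothetical in-memory stand-in for a glossary term."""
    name: str
    inherits: List[str] = field(default_factory=list)  # "is a" targets
    contains: List[str] = field(default_factory=list)  # "has a" targets


# The example from the reply above, expressed as data.
glossary: Dict[str, Term] = {
    t.name: t
    for t in [
        Term("Customer"),
        Term("BillingAddress"),
        Term("ProCustomer", inherits=["Customer"], contains=["BillingAddress"]),
    ]
}


def is_a(glossary: Dict[str, Term], child: str, parent: str) -> bool:
    """True if `child` inherits from `parent`, directly or transitively."""
    stack = list(glossary[child].inherits)
    seen = set()
    while stack:
        name = stack.pop()
        if name == parent:
            return True
        if name not in seen:
            seen.add(name)
            stack.extend(glossary[name].inherits)
    return False


def has_a(glossary: Dict[str, Term], owner: str, part: str) -> bool:
    """True if `owner` directly contains `part`."""
    return part in glossary[owner].contains
```

In this model ProCustomer is a Customer, but Customer does not contain ProCustomer, which is why ingestion cannot derive one relationship from the other.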

many-rainbow-50695

10/11/2022, 8:51 PM

Hi! I am mapping my collection of datasets to semantic data types, a concept quite similar but not identical to business glossary terms. Here is an example of the semantic data types registry: registry.apicrafter.io. For each semantic data type I have a lot of associated metadata:

• name
• description
• country (if the data type is country specific, for example a UK Ward code)
• categories
• languages (if the data type is language specific)
• regular expression
• one or more links to Wikidata, schema.org, etc.
• examples, classification, translations, wikidata_property, confidentiality, etc.

So I am trying to integrate it with DataHub. You can see one of the recently generated import files here: https://github.com/apicrafter/metacrafter-registry/blob/main/data/datahub/metacrafter.yml

I also have thousands of databases that originate from different sources, in different countries and different languages. I would like to limit the subset of business glossary terms offered when I assign them manually.

gray-shoe-75895

10/11/2022, 10:14 PM

This data types registry is pretty neat! As for limiting the set of business glossary terms when assigning them manually, we don’t have a built-in way to do that right now. However, you should be able to set up a datahub-action that monitors for invalid term assignments and either sends you a notification or undoes the change. Would that fit your use-case?
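The check such an action would need could look roughly like the sketch below. It covers only the validation logic in plain Python, not the datahub-action wiring or any DataHub API. The `TERM_COUNTRY` table, the `disallowed_terms` function, and every term name except the UK Ward code are hypothetical, and the rule itself (a country-specific term is valid only on datasets from that country) is an assumption about the use-case.

```python
from typing import Dict, Iterable, List, Optional

# Hypothetical excerpt of a semantic type registry:
# term name -> country code, or None when the term is not country specific.
TERM_COUNTRY: Dict[str, Optional[str]] = {
    "uk_ward_code": "GB",
    "us_zip_code": "US",  # illustrative entry, not taken from the real registry
    "email": None,
}


def disallowed_terms(
    dataset_country: str,
    assigned_terms: Iterable[str],
    term_country: Dict[str, Optional[str]] = TERM_COUNTRY,
) -> List[str]:
    """Return the assigned terms tied to a country other than the dataset's.

    Terms missing from the registry are treated as allowed here; a stricter
    policy could flag them instead.
    """
    bad = []
    for term in assigned_terms:
        country = term_country.get(term)
        if country is not None and country != dataset_country:
            bad.append(term)
    return bad
```

An action could call a function like this whenever a term assignment changes, then notify someone or revert the change if the returned list is not empty.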

many-rainbow-50695

10/12/2022, 4:16 AM

Thanks! I am not yet sure that it could solve my use-case, but I will try! The best solution would be if I could assign a business term with a confidence value. My semantic types detection engine, metacrafter (https://github.com/apicrafter/metacrafter), analyses a data source and generates a pair: datatype as a string and confidence as a float. Here is a screenshot of an example of the output.

gray-shoe-75895

10/13/2022, 3:15 AM

This is somewhat of a hidden feature in our system, but tags and terms can be given a “context” string each time they’re assigned to an entity. Perhaps that’d be a good place to store the confidence scores? Unfortunately we don’t really show it in the UI anywhere, but it is accessible in the underlying data.
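Since the context is described as a free-form string, one possible way to carry a confidence score in it is to serialize a small JSON object. The sketch below is plain Python and makes no DataHub calls. The JSON layout and the function names `encode_context` and `decode_confidence` are assumptions chosen for illustration, not a format DataHub defines.

```python
import json
from typing import Optional


def encode_context(confidence: float, detector: str = "metacrafter") -> str:
    """Pack a confidence score into a string suitable for a context field.

    The JSON keys used here are an assumed convention, not a DataHub format.
    """
    payload = {"confidence": round(float(confidence), 4), "detector": detector}
    return json.dumps(payload, sort_keys=True)


def decode_confidence(context: Optional[str]) -> Optional[float]:
    """Read the confidence back, or return None if the context is absent
    or was not written by encode_context."""
    if not context:
        return None
    try:
        data = json.loads(context)
    except ValueError:
        return None
    if not isinstance(data, dict):
        return None
    value = data.get("confidence")
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return float(value)
    return None
```

The string returned by `encode_context` would be passed as the context when the term is assigned, and `decode_confidence` would be applied when reading the associations back from the underlying data.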
