Hey team, We're developing a new entity in-house t...
# contribute-code
g
Hey team, we're developing a new entity in-house that we're calling a "Business Attribute", which we believe may be useful to the community at large. The idea is that many different fields in any system reference the same concept (something like a "customer ID" for a sales database or "product name" for an inventory database). For these, specifying a basic description and a set of tags and glossary terms globally helps maintain consistency across the system. Any updates made to the Business Attribute get propagated to the fields it is associated with. At the field level, you can append to the description and the set of tags/terms you inherit to give additional context. Before proposing an RFC, we wanted to get some feedback from the community on the idea. Please let us know your thoughts. cc: @plain-jordan-93410 @rapid-london-24785
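To make the inheritance idea concrete, here is a rough sketch, not the actual implementation. The class and field names (`BusinessAttribute`, `SchemaField`, `extra_tags`, and so on) are hypothetical and are not taken from DataHub or the RFC. It shows a field inheriting the description, tags and terms of its attribute and appending its own.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class BusinessAttribute:
    """Globally defined concept such as "customer ID" (hypothetical model)."""
    name: str
    description: str
    tags: Set[str] = field(default_factory=set)
    glossary_terms: Set[str] = field(default_factory=set)


@dataclass
class SchemaField:
    """A physical column that may reference one business attribute."""
    path: str
    attribute: Optional[BusinessAttribute] = None
    extra_description: str = ""
    extra_tags: Set[str] = field(default_factory=set)
    extra_terms: Set[str] = field(default_factory=set)

    def effective_description(self) -> str:
        # Inherited description first, then the field-level addition.
        inherited = self.attribute.description if self.attribute else ""
        return " ".join(p for p in (inherited, self.extra_description) if p)

    def effective_tags(self) -> Set[str]:
        inherited = self.attribute.tags if self.attribute else set()
        return inherited | self.extra_tags

    def effective_terms(self) -> Set[str]:
        inherited = self.attribute.glossary_terms if self.attribute else set()
        return inherited | self.extra_terms


# Example values below are made up for illustration.
customer_id = BusinessAttribute(
    name="customer_id",
    description="Unique identifier of a customer.",
    tags={"PII"},
    glossary_terms={"Customer"},
)
orders_field = SchemaField(
    path="orders.cust_id",
    attribute=customer_id,
    extra_description="Foreign key to the customers table.",
    extra_tags={"ForeignKey"},
)

# An update to the attribute is visible on every field that references it.
customer_id.tags.add("Sensitive")
print(orders_field.effective_description())
print(sorted(orders_field.effective_tags()))
```

In this sketch the field resolves inherited values at read time by holding a reference to the attribute. A real system could instead push updates to each field; that is a design choice the sketch does not settle.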
m
Interesting. Couple of questions: 1. In what way are glossary terms insufficient as a way to support this? 2. Are there other systems which support this concept? Or is this a net new concept that your organization is proposing?
g
Context: Starting off with some context, our organisation has 1000+ core datasets with an average of 500+ columns, which means that there are about 500k physical columns to be managed. All of these are created based on 8000+ centrally managed attributes for all business definitions across Hive, DB2 and other systems. These core datasets get transformed into 500k to a million datasets via Spark/Hive or various other jobs. We need to think about auto-enrichment of these in the future. The idea is to manage these 8000+ attributes centrally and have the various aspects of attributes across core and transformed datasets managed through them.
Difference from glossary terms: Glossary terms are standard tags used to describe the meaning of business terminology which are associated to physical assets. Business attributes are logical fields, which are defined in terms of their business meaning rather than their physical implementation. This means that a business attribute is independent of the specific database or data warehouse it is stored in. Business attributes can define the following aspects of a field they are associated with:
• Definition/description
• Glossary term/tag relationships
• Data rules (assertions)
• Class, format, regex pattern
Business attributes accelerate the metadata enrichment process, as they can be attached to multiple physical implementations of columns. Any enrichment applied to the business attribute will get pushed onto all fields it is associated with.
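As a rough illustration of one logical attribute attached to several physical columns, here is a minimal sketch. Everything in it is an assumption made for the example: the `AttributeDefinition` and `AttributeRegistry` names, the column paths and the regex are hypothetical and do not come from DataHub or the RFC.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class AttributeDefinition:
    """Logical field carrying the aspects listed above (hypothetical model)."""
    name: str
    description: str
    glossary_terms: List[str]
    data_class: str
    regex_pattern: str

    def matches(self, value: str) -> bool:
        # A simple data rule: the whole value must match the pattern.
        return re.fullmatch(self.regex_pattern, value) is not None


@dataclass
class AttributeRegistry:
    """Central store of attributes and the physical columns bound to them."""
    definitions: Dict[str, AttributeDefinition] = field(default_factory=dict)
    bindings: Dict[str, str] = field(default_factory=dict)  # column -> attribute

    def register(self, definition: AttributeDefinition) -> None:
        self.definitions[definition.name] = definition

    def bind(self, column: str, attribute_name: str) -> None:
        if attribute_name not in self.definitions:
            raise KeyError(f"unknown attribute: {attribute_name}")
        self.bindings[column] = attribute_name

    def columns_for(self, attribute_name: str) -> List[str]:
        return sorted(c for c, a in self.bindings.items() if a == attribute_name)

    def describe(self, column: str) -> str:
        # Columns read the description from the shared definition.
        return self.definitions[self.bindings[column]].description


registry = AttributeRegistry()
registry.register(
    AttributeDefinition(
        name="customer_id",
        description="Unique identifier of a customer.",
        glossary_terms=["Customer"],
        data_class="identifier",
        regex_pattern=r"C\d{8}",
    )
)
# The same attribute is bound to columns on two different platforms.
registry.bind("hive.sales.orders.cust_id", "customer_id")
registry.bind("db2.crm.accounts.customer_no", "customer_id")

# Enriching the attribute once changes what every bound column reports.
registry.definitions["customer_id"].description = (
    "Unique, immutable identifier of a customer."
)
print(registry.columns_for("customer_id"))
print(registry.describe("db2.crm.accounts.customer_no"))
```

Because the definition lives in one place, the attribute stays independent of where each column is physically stored, which is the main difference from tagging each asset individually.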
I hope this is helpful, @mammoth-bear-12532! Let me know if you have any other questions. If required we can catch up over a call and walk through the RFC document we're working on
m
Thanks for sending this over @gentle-hairdresser-45610! Quite useful
r
Ty @mammoth-bear-12532. This feature introduces logical-layer modeling. Though we are starting with attributes, we need to extend it to datasets as well. This will enable us to connect entities across clusters / data centers, which in turn lets us report schema inconsistency issues in a data lake. Once we publish this feature, we can prioritize other related work. Please let us know if you need more information or if we are good to start the RFC PR process.
d
@mammoth-bear-12532 As per the last discussion regarding Business Attributes with @gentle-hairdresser-45610 and @rapid-london-24785, @high-air-78476 has opened the RFC PR for Business Attributes: https://github.com/datahub-project/rfcs/pull/6. Kindly take a look at it.
m
@rapid-london-24785 @dry-raincoat-85182 thanks for bringing this to my attention
we'll take a look at it and get a first round of comments out within the next week
h
thank you @mammoth-bear-12532! We look forward to your comments.
@mammoth-bear-12532 Did you have a chance to review the RFC?
m
@high-air-78476: apologies for the delay. We have looked at it internally and will have comments out in the next day or two!
h
@mammoth-bear-12532 Thank you very much for taking the time to review our proposal! We look forward to your comments.
d
Hi @mammoth-bear-12532 Thanks for taking the time to provide feedback on the RFC. We have provided answers to your queries and look forward to hearing from you. @rapid-london-24785 @high-air-78476
Hi @mammoth-bear-12532 We have raised the PR introducing the new entity "Business Attribute". @high-air-78476 @rapid-london-24785