# getting-started
h
Hi guys! I saw that we have “Data privacy management for datasets” on the roadmap. Do we have any concrete plans for this yet? We would be interested in helping with the implementation
And hopefully we could expand this to dataset fields
g
Out of curiosity, what would you be looking for specifically?
m
@high-hospital-85984: are you imagining field-level compliance tagging here?
h
To start with we want to add “sensitivity” tags to tables, more specifically to columns.
m
Yeah, that is something we have built and use heavily at LinkedIn.
h
yep, super simple to start. Tags would preferably be quite customizable, as they might differ between companies (we use a colour-based scheme)
m
The way we did it was two-level: first create a "taxonomy" of data types (e.g. email / phone-num / ...) and then allow fields to be tagged with them, and separately have a relationship from the data types to sensitivity (is this a sensitive type or not?)
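The two-level approach described above can be sketched roughly as follows. This is a minimal illustration, not LinkedIn's actual model: the taxonomy entries, the `SchemaField` class, and the `IS_SENSITIVE` mapping are all hypothetical names. The point it shows is that fields are tagged only with data types, while "is this type sensitive?" lives in a separate mapping, so a type can be reclassified without re-tagging every field.

```python
from dataclasses import dataclass, field

# Level 1: the taxonomy of data types (hypothetical example entries).
TAXONOMY: set[str] = {"email", "phone-num", "country-code"}

# Separate relationship: data type -> is it a sensitive type?
# Kept apart from field tags so a type can be reclassified in one place.
IS_SENSITIVE: dict[str, bool] = {
    "email": True,
    "phone-num": True,
    "country-code": False,
}

@dataclass
class SchemaField:
    path: str
    data_types: set[str] = field(default_factory=set)

    def tag(self, data_type: str) -> None:
        # Level 2: fields may only be tagged with types from the taxonomy.
        if data_type not in TAXONOMY:
            raise ValueError(f"unknown data type: {data_type}")
        self.data_types.add(data_type)

def is_sensitive(f: SchemaField) -> bool:
    # A field is sensitive if any of its data types is marked sensitive;
    # types missing from IS_SENSITIVE are treated as not sensitive (an assumption).
    return any(IS_SENSITIVE.get(t, False) for t in f.data_types)

contact = SchemaField("user.contact_email")
contact.tag("email")
country = SchemaField("user.country")
country.tag("country-code")
print(is_sensitive(contact), is_sensitive(country))  # True False
```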
h
Cool, any chance of getting that into the open source code? 😉
m
if you're offering help, I'll take you up on it 🙂
h
But we wouldn’t mind adding some functionality to tag dataset fields, if the community sees some value in it.
If you can point us in the right direction, we can take it from there. BaseFieldMapping sort of looks promising, but seems to be tightly tied to transformations?
g
Out of curiosity, what set of sensitivity tags are you interested in? Would it be a binary sensitive/non-sensitive, or will you have a scale?
I've been thinking about creating an alternative table view that would allow for different types of tags, this is a mock I'm working on:
Does this presentation align w/ your vision?
h
Our scale (currently) is green, yellow, orange, and red.
So what you’re showing seems quite aligned, yes. I’m assuming that the `PII` and `Financial` tags there are strings?
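A colour scale like the one mentioned can be modelled as an ordered level, kept alongside free-form string tags such as `PII` and `Financial`. The sketch below is hypothetical: the ordering green < yellow < orange < red, the example columns, and the rule that a table is as sensitive as its most sensitive column are assumptions for illustration, not an agreed design.

```python
from enum import IntEnum

# Ordered sensitivity scale. The colour names come from the thread;
# the ordering and numeric values are assumptions (company-specific).
class Sensitivity(IntEnum):
    GREEN = 0
    YELLOW = 1
    ORANGE = 2
    RED = 3

# Hypothetical column-level tags: (sensitivity level, free-form string tags).
columns = {
    "customer_id": (Sensitivity.YELLOW, {"PII"}),
    "iban": (Sensitivity.RED, {"PII", "Financial"}),
    "signup_date": (Sensitivity.GREEN, set()),
}

def table_sensitivity(cols):
    # Assumed rule: a table is as sensitive as its most sensitive column.
    return max((level for level, _ in cols.values()), default=Sensitivity.GREEN)

def table_tags(cols):
    # Union of all string tags found on the table's columns.
    return set().union(*(tags for _, tags in cols.values()))

print(table_sensitivity(columns).name)  # RED
print(sorted(table_tags(columns)))      # ['Financial', 'PII']
```

Using an ordered enum rather than plain strings for the colour keeps comparisons like "at least orange" cheap, while the string tags stay open-ended.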
I’m secretly hoping for a tag-propagation feature, like in Apache Atlas, so hopefully we can find a solution that takes us closer to that 🙂
m
Yeah tag-propagation would be the next step for sure. This is just making sure that the tags render well for human consumption.
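For the tag-propagation idea, the core mechanic is pushing tags from a dataset to everything derived from it along lineage. The following is a simplified sketch under stated assumptions, not Apache Atlas's implementation: the `downstream` lineage graph and dataset names are hypothetical, propagation is done at dataset level (column-level lineage would follow the same pattern), and tags are only ever added, never removed.

```python
from collections import deque

# Hypothetical lineage graph: dataset -> datasets derived from it.
downstream = {
    "raw.users": ["staging.users"],
    "staging.users": ["mart.customers", "mart.marketing"],
    "mart.customers": [],
    "mart.marketing": [],
}

def propagate(tags, lineage):
    """Push each dataset's tags to everything downstream of it (BFS)."""
    result = {name: set(t) for name, t in tags.items()}
    for source, source_tags in tags.items():
        queue = deque(lineage.get(source, []))
        seen = set()  # guards against cycles in the lineage graph
        while queue:
            node = queue.popleft()
            if node in seen:
                continue
            seen.add(node)
            result.setdefault(node, set()).update(source_tags)
            queue.extend(lineage.get(node, []))
    return result

tags = {"raw.users": {"PII"}}
out = propagate(tags, downstream)
print(sorted(out["mart.marketing"]))  # ['PII']
```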
h
But how do you suggest we get the ball rolling with this?
m
I would say providing feedback on the metadata models (what is working and what needs to change) would be high leverage.