# advice-data-governance
s
Hello all, I have a standard question: what is the best practice for a data governance policy with DataHub?
b
This question is very wide-ranging and can be interpreted in many ways. Could you define it more precisely?
s
I was afraid of that answer. Let's put it technically: can the data stewards use DataHub to control who they give access to their data?
b
Are you referring to controlling who can see the metadata of certain datasets? You should know that access control is not very strict at the moment: a person can search for a forbidden dataset and see that it exists in DataHub. They just cannot open the details when they click on it.
You can also post a new question if you have a narrower one, because I doubt people will click on this thread haha
s
I just want to build something where the data owners can see who is accessing their data, and that gives an overview of which person has access to what data, so we can analyze it. I assume it is also possible to get this information from DataHub over the REST API? The reason is that we want to implement a DaaS solution with DataHub as the single point of truth for data access rights, i.e. whether the user is allowed to request this data from the core warehouse.
b
To my understanding, with DataHub's present capabilities:
data owners can see who is accessing their data
=> Currently, who visited which pages inside DataHub is visible only to the DataHub site admin. It is kept in a usage index inside ES. There are some basic analytics that show which datasets are popular, but they do not show who visited them.
have an overview at the end which person has access to what data
=> A person viewing a dataset in DataHub does not guarantee that the person actually has a DB account to access the dataset in the database.
s
Thanks for the answers. To explain my request a little further: we consolidate the data in a central warehouse, and the user logs in to a web page where he can request data. To show him what he can request, we want to call the DataHub REST API to find out what he is allowed to see. I understand this is not about the physical request. In the background we map this request to our internal data vault structure, so the user can choose the source system and what he wants to see. As far as I understood you, the information about which person has the rights to access which datasets cannot be requested from DataHub?
b
I'm afraid I can't quite grasp the concept you're talking about😅. Perhaps someone more formally trained in data governance could take a shot instead
@mammoth-bear-12532 have you encountered this kind of case before?
s
Let me explain it more simply. I tend to get too complex in my explanations.
data owner a gives person b access to his data
data owner b also gives person b access to his data
I want to ask DataHub what data person b is allowed to access. The result should be the dataset of owner a and the dataset of owner b.
I need this because I want to build an integrated solution with DataHub as the central place for holding the data access rights, and also use the history of the ingested sources for automated change management in data pipeline generation. In the end, the data marts would be generated based on the DataHub data access rights. Completely self-service for the end user and 100% automated.
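For illustration, the lookup described above (owner a and owner b each grant person b access, then ask what person b may access) could be sketched in plain Python as follows. This is only a sketch: the grant records, `datasets_for`, and `access_overview` are hypothetical names, and nothing here is a DataHub API.

```python
from collections import defaultdict

# Hypothetical grant records: (owner, dataset, grantee).
# This is an assumed shape for illustration, not a DataHub data model.
grants = [
    ("owner_a", "dataset_of_owner_a", "person_b"),
    ("owner_b", "dataset_of_owner_b", "person_b"),
    ("owner_b", "dataset_of_owner_b", "person_c"),
]


def datasets_for(grantee, grants):
    """Return the sorted list of datasets granted to one person."""
    return sorted({dataset for _, dataset, g in grants if g == grantee})


def access_overview(grants):
    """Return {person: sorted list of datasets} for every grantee."""
    by_person = defaultdict(set)
    for _, dataset, grantee in grants:
        by_person[grantee].add(dataset)
    return {person: sorted(datasets) for person, datasets in by_person.items()}


print(datasets_for("person_b", grants))
# ['dataset_of_owner_a', 'dataset_of_owner_b']
```

`access_overview(grants)` would give the "which person has access to what data" view for analysis, again assuming the grants are available in some such form.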
b
Not sure if this RFC is relevant to the discussion: https://github.com/datahub-project/datahub/pull/4694/files
s
As far as I understood, these are the permissions taken directly from the source system, i.e. the information about who has access to the data there?
b
😅 I don't understand the RFC very well, but it's the closest one I can recall from following this project so far.
s
Thanks for all your help in the discussion. I really appreciate it 🙂 - What I need is not the information from the source system, just the information about which data access rights are defined in DataHub. It would be strange if this feature were not available.
e
hi @silly-ice-4153, i'm a noob here, but as far as I understand, DataHub is not aware of the actual access rights and permissions configured on data platforms for individual users and SAs using those platforms. DataHub only knows about the SAs it uses, and the access permissions configured for DataHub such as who can see what metadata about which data assets/processes in DataHub. I think what you need is achievable by extending the DataHub metamodel, and feeding DataHub with access registration and revocation events.
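A minimal sketch of the "access registration and revocation events" idea, in plain Python. The `AccessEvent` shape and the `current_access` helper are assumptions made up for illustration. They are not part of the DataHub metamodel, and how such events would actually be stored in or fed to DataHub is not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessEvent:
    """Hypothetical event record; not a DataHub type."""
    kind: str      # "grant" or "revoke"
    user: str
    dataset: str


def current_access(events):
    """Replay events in order and return {user: set of datasets}."""
    access = {}
    for event in events:
        datasets = access.setdefault(event.user, set())
        if event.kind == "grant":
            datasets.add(event.dataset)
        elif event.kind == "revoke":
            datasets.discard(event.dataset)
        else:
            raise ValueError(f"unknown event kind: {event.kind}")
    return access


events = [
    AccessEvent("grant", "person_b", "dataset_a"),
    AccessEvent("grant", "person_b", "dataset_b"),
    AccessEvent("revoke", "person_b", "dataset_a"),
]

print(current_access(events))
# {'person_b': {'dataset_b'}}
```

Keeping the events rather than only the final state would also preserve a history of who was granted what and when it was revoked.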
s
Hello @eager-australia-69729 thanks for the answer - I'm not interested in the access rights on the source systems / data platforms. I just want to get the access permissions that are defined directly in DataHub: which DataHub user has which access permission to which datasets. I think this is possible out of the box?
e
yes, that should definitely be in there somewhere 😅
Try posting in #getting-started, asking "where do I find xx?" or "how can I do xx?"
e
Hey @silly-ice-4153, is your question whether it is possible to do a check whether user A has access to asset A? Or is it whether it is possible to get all assets user A has access to?
s
Hello @echoing-airport-49548 I want to know if it is possible to get all assets user A has access to?