# getting-started
i
Hello (once more :D), a couple of auth + users questions. Does DataHub have to store users in its own databases for authorization purposes, or can it defer that to systems like Active Directory? I.e., can user X access DataHub? As a follow-up, does DataHub have a way to define which users can access what? I.e., define a user group that can access a set of entity instances but not others: research data scientists can see research datasets but not production information meant for product teams.
m
@incalculable-ocean-74010: For the follow-up: you're essentially asking for the fine-grained authorization feature that is on the Q2 roadmap (https://datahubproject.io/docs/roadmap/#role-based-access-control). We don't have it right now.
For the first question: I'll defer to @big-carpet-38439, as he has the most knowledge of the current auth system.
i
I missed that Q2 improvement, thanks!
b
To answer the first: DataHub does not have to do this. You’re correct that users can be stored in your IdP. DataHub will attempt to join the user from the IdP store to the CorpUser metadata in GMA by username.
i
DataHub still has to store corpuser documents though, right?
When a user logs into DataHub or searches for a user, won't DataHub use the corpuser Elasticsearch document index instead of the IdP?
To fill out the user page in the DataHub UI.
b
Yes. When you log in via SSO, we authenticate you as a particular user based on the username provided by the IdP. If that username already exists in DataHub, you will see the corresponding corpuser profile.
So we effectively do a join by username between the IdP account and the corp users DataHub knows about.
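As a rough illustration of the username join described above, here is a minimal Python sketch. It is not DataHub code: `CorpUserProfile`, `KNOWN_CORP_USERS`, and `resolve_profile` are hypothetical names, and the in-memory dict only stands in for whatever store actually holds the corpuser metadata. It returns `None` when no matching corpuser exists; what DataHub itself does in that case is not covered in this thread.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class CorpUserProfile:
    """Hypothetical stand-in for the corpuser metadata DataHub keeps."""
    username: str
    full_name: str
    title: str


# Hypothetical in-memory store keyed by username (an assumption for
# illustration; the real metadata lives in DataHub's own storage).
KNOWN_CORP_USERS: Dict[str, CorpUserProfile] = {
    "jdoe": CorpUserProfile("jdoe", "Jane Doe", "Data Scientist"),
}


def resolve_profile(idp_username: str) -> Optional[CorpUserProfile]:
    """Join the IdP-provided username against the known corp users.

    Returns the matching profile, or None if the username is unknown.
    """
    return KNOWN_CORP_USERS.get(idp_username)
```

The point of the sketch is that the IdP only needs to supply a username after authentication; the profile shown in the UI comes from the metadata DataHub already holds for that username.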