I've been taking a look at the RBAC functionality... (DataHub #feature-requests)

bland-wolf-37286

11/19/2021, 2:43 PM

I've been taking a look at the RBAC functionality over the past couple of days to see how it works and what the capabilities are. For this I've done an OIDC integration with Keycloak. I have a few questions on the features:
• With JIT user/group provisioning, when a user first logs in, DataHub creates the user. Any roles the user has which don't already exist in DataHub are created as groups, and the user is assigned to those groups. If I then log the user out, change their roles in Keycloak and log them back in to DataHub, the user's groups aren't updated in DataHub to reflect their changed roles in Keycloak. Is it intended to work this way? If not, is there a feature on the roadmap to enable a user's groups to be updated at login?
• At present, policies can be used to restrict a user's edit actions on a per-dataset or all-dataset basis. The RBAC RFC mentions supporting wildcard matching. Roughly when might that become available?
• Similarly, we are going to need to be able to restrict which datasets a user can view. The RFC mentions this as a milestone 2 feature. Roughly when might that become available?
• Looking at how the edit controls work in practice, I found that the permission check happens at the point the user saves a change. This means a user could make a lengthy edit to a column description and only find out that they don't have permission when they try to save, at which point the edits are lost altogether because the pop-up has been dismissed. A better user experience would be to prevent the user from accessing the edit functionality at all, perhaps by additionally hiding the buttons. (It was good to see that the controls aren't implemented just by hiding the UI elements, though.)

👍 3
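The JIT provisioning behavior described in the first message, and the sync-on-every-login behavior it asks about, can be contrasted in a minimal sketch. It assumes a hypothetical in-memory `Directory` standing in for DataHub's user and group entities; `Directory`, `provision_first_login_only` and `provision_sync_each_login` are illustrative names, not DataHub APIs.

```python
from dataclasses import dataclass, field


@dataclass
class Directory:
    """Hypothetical in-memory stand-in for DataHub's user and group entities."""
    users: dict[str, set[str]] = field(default_factory=dict)  # user -> group names
    groups: set[str] = field(default_factory=set)


def provision_first_login_only(store: Directory, user: str, idp_roles: set[str]) -> None:
    """Behavior described in the thread: groups are assigned only when the user is created."""
    if user in store.users:
        return  # existing user: role changes made in the IdP are ignored
    store.groups |= idp_roles  # roles not yet known become groups
    store.users[user] = set(idp_roles)


def provision_sync_each_login(store: Directory, user: str, idp_roles: set[str]) -> None:
    """Desired behavior: re-sync group membership from the IdP on every login."""
    store.groups |= idp_roles  # still create any new groups
    store.users[user] = set(idp_roles)  # replace memberships, so revoked roles are dropped


a, b = Directory(), Directory()
for provision, store in ((provision_first_login_only, a), (provision_sync_each_login, b)):
    provision(store, "ed", {"analysts"})           # first login
    provision(store, "ed", {"analysts", "fraud"})  # role added in Keycloak, then log in again

print(sorted(a.users["ed"]))  # ['analysts']
print(sorted(b.users["ed"]))  # ['analysts', 'fraud']
```

Replacing memberships wholesale assumes the IdP is the single source of truth for these groups; memberships added manually in DataHub would be overwritten, so a real implementation might need to manage only IdP-sourced groups.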

big-carpet-38439

11/19/2021, 6:56 PM

Hi Ed, thanks for the great questions. As you've identified, this is a very complicated topic.
• "Is it intended to work this way?": If the question is whether we are aware of the current behavior, then yes. If the question is whether this is the long-term "right solution", then no. We do want to support updating the user and group entities on EACH login; we just haven't implemented it yet. It is on our radar but has not yet been formally prioritized. That being said, it should be a medium-effort change, and we'd gladly accept a contribution here.
• Wildcard Matching in Policies: To be honest, we are being very careful about releasing wildcard matching; we aren't confident that users should be exposed to the internal details of how URNs / primary keys are constructed. That being said, once we have captured "containers" (databases, schemas, collections of assets) in DataHub, we do intend to support predicates based on those. Perhaps you can detail the use case you are trying to achieve?
• View Based Policies: The tentative timeline is the first quarter of next year, but we'd really like Community participation to help move this item along more quickly. What types of metadata are you hoping to restrict?
• Proactive Privilege Indicators: We are aware this is a bad experience that can cause a lot of pain. Implementing this is a non-trivial amount of effort, however. We recommend that administrators communicate users' privileges to them until this is implemented. That said, I'd also consider this to be "Phase 2 Policies" work that should be included in the Q1 next year deliverables. How urgent is this for you folks? I'm assuming lower than the other asks you've raised?
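The concern about wildcard matching can be illustrated with a hedged sketch: a glob pattern over a dataset URN forces the policy author to know the URN layout, whereas a container predicate refers to a logical grouping. The URN follows DataHub's `urn:li:dataset:(urn:li:dataPlatform:<platform>,<name>,<env>)` shape; the `Dataset` record, its `container` field and both matcher functions are hypothetical, not DataHub's policy engine.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class Dataset:
    urn: str
    container: str  # hypothetical: the database/schema the dataset belongs to


def matches_urn_wildcard(dataset: Dataset, pattern: str) -> bool:
    """Wildcard matching: the policy author must know how URNs are constructed."""
    return fnmatchcase(dataset.urn, pattern)


def matches_container(dataset: Dataset, containers: set[str]) -> bool:
    """Container predicate: the policy names a logical grouping, not key syntax."""
    return dataset.container in containers


orders = Dataset(
    urn="urn:li:dataset:(urn:li:dataPlatform:snowflake,fraud_db.scoring.orders,PROD)",
    container="fraud_db.scoring",
)

print(matches_urn_wildcard(orders, "urn:li:dataset:(urn:li:dataPlatform:snowflake,fraud_db.*"))  # True
print(matches_container(orders, {"fraud_db.scoring"}))  # True
```

Both checks accept this dataset, but the wildcard version stops matching as soon as the platform or naming convention encoded in the URN differs from what the policy author assumed, which is the exposure to key internals described above.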

better-orange-49102

11/20/2021, 2:24 AM

Thanks, John, for the best news I've heard all week 😆 (even compared to the town hall)

bland-wolf-37286

11/22/2021, 12:40 PM

Hi John, thanks for the detailed reply.
• "Is it intended to work this way?": It's great to hear that updating user/group entities on each login will be supported at some point in the future. In the meantime, pre-provisioning users/groups and regularly syncing them with the auth solution may be the right way for us to go. I'm not sure I'd be able to contribute to this functionality myself, as my team is currently only at half strength in terms of engineers, so it might be difficult to devote the time needed given other priorities; I can certainly raise it, though.
• Wildcard Matching in Policies: When we roll this out, we'll have a few thousand datasets to work with. We'll need to be able to, for example, restrict individuals/teams to editing descriptions only for datasets that are related to their business domain or owned by their team. We'll also want to restrict which datasets individuals can view, since some may contain sensitive information, such as how fields/datasets are used in fraud detection, which shouldn't be accessible to individuals outside the relevant teams.
• View Based Policies: As mentioned above, we ultimately want to restrict which datasets individuals can view details of. Ideally, an individual without view permission for a given dataset shouldn't see it in the list of datasets or in search results either. I don't know whether we would want/need to restrict at a finer granularity than the dataset/entity level. I appreciate that view-based policies are much harder to implement than edit ones because of the impact on search, etc.
• Proactive Privilege Indicators: This impacts the user experience, but I very much doubt it would be a blocker to us rolling out DataHub. (I understand an evaluation of different tools in this space was conducted earlier this year, and DataHub was streets ahead of anything else.)
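The view-based use case above (sensitive datasets visible only to the relevant teams, and hidden from search results too) can be sketched as a filter applied to search results. This is a minimal sketch under assumptions: the `sensitive` flag, ownership by group, and the `can_view`/`search` functions are hypothetical and not DataHub features.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    name: str
    owners: frozenset[str]   # groups that own the dataset
    sensitive: bool = False  # hypothetical flag, e.g. fraud-detection usage details


@dataclass(frozen=True)
class User:
    name: str
    groups: frozenset[str]


def can_view(user: User, ds: Dataset) -> bool:
    """Hypothetical view policy: sensitive datasets are visible only to owning groups."""
    return not ds.sensitive or bool(user.groups & ds.owners)


def search(user: User, catalog: list[Dataset], term: str) -> list[str]:
    """Apply the view policy to results so hidden datasets never appear, not even by name."""
    return [ds.name for ds in catalog if term in ds.name and can_view(user, ds)]


catalog = [
    Dataset("payments.transactions", frozenset({"payments"})),
    Dataset("fraud.transaction_scores", frozenset({"fraud"}), sensitive=True),
]
analyst = User("ed", frozenset({"payments"}))
investigator = User("sam", frozenset({"fraud"}))

print(search(analyst, catalog, "transaction"))       # ['payments.transactions']
print(search(investigator, catalog, "transaction"))  # ['payments.transactions', 'fraud.transaction_scores']
```

Filtering an already-fetched result list like this is the simplest form; a real search index would likely need to apply the predicate inside the query itself so that result counts and pagination don't leak hidden datasets, which is part of why view-based policies affect search.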

little-megabyte-1074

12/01/2021, 2:09 AM

Hi folks! There's a lot packed into this thread, but I'm not entirely sure what net-new requests should come out of it. Please check out our existing feature requests and raise a new one if need be! https://feature-requests.datahubproject.io/
