# contribute-code
b
Hi everyone! I'm working with a team on evaluating how to add support for multi-tenancy in DataHub. We've investigated a number of proposals, such as using Domains, among others. One approach involves extending the internals of DataHub. The easiest way to describe it (there are several variations) would be to add a tenantId field, supplied in the HTTP headers, and pass it all the way from the API handlers down to the DAO layer, e.g., ESSearchDAO.search. Searches would then filter on a tenantId field that has been added to entities by extending them. We were hoping to get some feedback on whether this would be a good approach, and also on whether there would be interest in us contributing it back to DataHub. The code to do something like this is not difficult, but it is a bit invasive. Just wondering if anyone on the dev team is available to help guide us.
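Here is a minimal sketch of the header-to-DAO flow described above. It is written in Python for brevity, even though DataHub itself is Java. It assumes the following, all of which are illustrations rather than DataHub's actual API:

* a hypothetical `X-Tenant-Id` header,
* a hypothetical `SearchDAO` standing in for ESSearchDAO,
* a `tenantId` field mapped as an Elasticsearch `keyword`.

Only the query body uses the standard Elasticsearch `bool`/`filter`/`term` DSL.

```python
from typing import Any, Dict, Mapping

# Hypothetical header name carrying the tenant ID.
TENANT_HEADER = "X-Tenant-Id"


class MissingTenantError(ValueError):
    """Raised when a request arrives without a tenant ID."""


def tenant_from_headers(headers: Mapping[str, str]) -> str:
    # HTTP header names are case-insensitive, so compare lowercased names.
    for name, value in headers.items():
        if name.lower() == TENANT_HEADER.lower() and value.strip():
            return value.strip()
    raise MissingTenantError(f"missing {TENANT_HEADER} header")


def build_search_query(text: str, tenant_id: str) -> Dict[str, Any]:
    # The tenant restriction goes in the bool "filter" clause next to the user's query.
    return {
        "query": {
            "bool": {
                "must": [{"query_string": {"query": text}}],
                "filter": [{"term": {"tenantId": tenant_id}}],
            }
        }
    }


class SearchDAO:
    """Hypothetical stand-in for ESSearchDAO; returns the body instead of calling Elasticsearch."""

    def search(self, text: str, tenant_id: str) -> Dict[str, Any]:
        return build_search_query(text, tenant_id)


def handle_search(headers: Mapping[str, str], text: str, dao: SearchDAO) -> Dict[str, Any]:
    # API handler: extract the tenant once, then pass it explicitly down to the DAO.
    tenant_id = tenant_from_headers(headers)
    return dao.search(text, tenant_id)
```

Because the tenant condition sits in the `filter` clause, it restricts matches without affecting relevance scoring.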
l
@bright-apartment-55836 what is the main use case you are trying to tackle which requires multi-tenancy?
b
We operate a multi-tenant cloud analytics platform that hosts multiple customers. Each customer's data must be kept secure and isolated. The use case is to provide search over metadata, such as datasets and workflows.
What we are thinking of doing is instrumenting our services, such as a Workflow Service, to publish workflow lifecycle events. The lifecycle events would be extended from existing entities with a field for a tenant ID, and possibly other information.
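One way to picture a lifecycle event extended with a tenant ID is as a subclass of a base event that adds the extra fields. The class names, the `event_type` values, the URN string and the JSON shape below are hypothetical illustrations, not DataHub's event schema.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict


@dataclass
class WorkflowEvent:
    """Hypothetical base lifecycle event emitted by a Workflow Service."""
    urn: str
    event_type: str  # e.g. "CREATED", "COMPLETED" (illustrative values)


@dataclass
class TenantWorkflowEvent(WorkflowEvent):
    """The same event, extended with the owning tenant and optional extra info."""
    tenant_id: str
    extra: Dict[str, str] = field(default_factory=dict)


def to_message(event: TenantWorkflowEvent) -> str:
    # Refuse to publish an event that is not attributed to a tenant.
    if not event.tenant_id:
        raise ValueError("tenant_id is required")
    return json.dumps(asdict(event), sort_keys=True)
```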
We are planning to use the DataHub back-end services, but not the front-end/UI, as we need close integration with our existing UI. The front-end would provide a session cookie or access token (or the actual tenant ID) in the HTTP headers of requests sent to the API, such as the GraphQL search API.
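If the front-end sends a session cookie or access token rather than the raw tenant ID, the back-end can resolve the tenant itself. That way a client cannot simply claim another tenant's ID. The sketch assumes a bearer token in the `Authorization` header. It uses a hypothetical in-memory `TOKEN_TO_TENANT` map in place of a real token-validation or session service.

```python
from typing import Dict, Mapping

# Hypothetical stand-in for a real token validation / session service.
TOKEN_TO_TENANT: Dict[str, str] = {"token-abc": "acme", "token-xyz": "globex"}


def resolve_tenant(headers: Mapping[str, str]) -> str:
    """Map a bearer token from the request headers to the tenant it belongs to."""
    # HTTP header names are case-insensitive, so normalize them first.
    lowered = {name.lower(): value for name, value in headers.items()}
    auth = lowered.get("authorization", "")
    prefix = "Bearer "
    if auth.startswith(prefix):
        tenant = TOKEN_TO_TENANT.get(auth[len(prefix):].strip())
        if tenant is not None:
            return tenant
    # Unknown or missing token: reject instead of falling back to a default tenant.
    raise PermissionError("missing or invalid access token")
```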
g
I assume that you intend to make it possible for users with appropriate access rights to query across all the tenants, correct?
b
So far, I haven't looked into the user management aspect, but I need to. I've been very focused on the search aspect. Regarding your question, I was thinking that queries by users with access rights wouldn't cross tenant boundaries, since each tenant would be virtually separate and isolated from the others, but I'm not sure that aligns with your assumption. So, within a tenant, users with access rights could run queries, but not across tenants.
g
@bright-apartment-55836: Sorry for coming back to this so late. What I meant was that it would probably be useful to allow administrators of the whole installation to cross tenant boundaries, as long as the policy is configured that way. Of course, policies would need to be extended to support multi-tenancy as well.
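Here is a rough sketch of how a search-time tenant filter could honor such a policy. Ordinary users are pinned to their own tenant. A principal holding a cross-tenant privilege may search one tenant or all of them. `Principal`, the `CROSS_TENANT_SEARCH` privilege name and `tenant_filter` are hypothetical; the policy model would need to be extended to express something like this.

```python
from dataclasses import dataclass
from typing import Any, Dict, FrozenSet, Optional

# Hypothetical privilege name; not an existing DataHub privilege.
CROSS_TENANT_SEARCH = "CROSS_TENANT_SEARCH"


@dataclass(frozen=True)
class Principal:
    user: str
    tenant_id: str
    privileges: FrozenSet[str] = frozenset()


def tenant_filter(
    principal: Principal, requested_tenant: Optional[str] = None
) -> Optional[Dict[str, Any]]:
    """Return an Elasticsearch filter clause, or None for an unrestricted search."""
    if CROSS_TENANT_SEARCH in principal.privileges:
        # Installation-wide admin: search everything, or narrow to one tenant.
        if requested_tenant is None:
            return None
        return {"term": {"tenantId": requested_tenant}}
    # Everyone else stays inside their own tenant.
    if requested_tenant not in (None, principal.tenant_id):
        raise PermissionError(f"{principal.user} may not search tenant {requested_tenant}")
    return {"term": {"tenantId": principal.tenant_id}}
```

Returning `None` only for the privileged case keeps the default path, where a tenant filter is always present, the restrictive one.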