# deploy-multi-tenant
  • l

    little-megabyte-1074

    03/21/2023, 7:43 PM
    set the channel description: Channel dedicated to discussing options for multi-tenant deployments of DataHub
  • f

    fierce-student-84107

    03/21/2023, 7:51 PM
    Thanks @little-megabyte-1074. Will post my question soon for @brainy-tent-14503’s advice.
  • f

    fierce-student-84107

    03/21/2023, 9:34 PM
    Hi @brainy-tent-14503, following up on our conversation about adding multi-tenant support during this morning's office-hours call. Currently we use the following approach to make the persistence layer (say, the Restli API) aware of _tenant_id_ and _tenant_db_:
    • We use PostgreSQL as our data store and have added new columns for _tenant_id_ and _tenant_db_.
    • In `AuthenticationFilter` -> `doFilter` we derive the tenant information from the received token using `getClaims`. A newly defined object, `TenantContext` (which is also a ThreadLocal), holds the derived tenant information and is bundled with the `AuthenticationRequest` context. With this context passed along, the tenant information is available to the relevant processing threads.
    Disadvantages of this approach:
    • We ended up passing `tenantId` and `tenantDb` to every called routine that needs them, so that the data store can be read or updated.
    • Maintaining this delta code is a challenge on each DataHub version upgrade.
    We used a similar approach for `GraphQLController`, Kafka, etc. What are your thoughts on the above approach? Please let us know if you have any suggestions for overcoming the disadvantages mentioned. Thank you.
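A minimal sketch of the ThreadLocal pattern described in this message, under stated assumptions: the `TenantContext` name comes from the message, but its fields, the `callWith` helper, and `TenantAwareStore` are hypothetical and are not DataHub code. It shows how a data-access routine could read the tenant from the context instead of receiving `tenantId` and `tenantDb` as parameters, and how the binding can be cleared so it does not leak to the next request served by the same thread.

```java
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical holder for per-request tenant information; not part of DataHub.
final class TenantContext {
    private static final ThreadLocal<TenantContext> CURRENT = new ThreadLocal<>();

    private final String tenantId;
    private final String tenantDb;

    TenantContext(String tenantId, String tenantDb) {
        this.tenantId = tenantId;
        this.tenantDb = tenantDb;
    }

    String tenantId() { return tenantId; }
    String tenantDb() { return tenantDb; }

    // Tenant bound to the calling thread, if any.
    static Optional<TenantContext> current() {
        return Optional.ofNullable(CURRENT.get());
    }

    // Runs body with ctx bound to this thread, then restores the previous
    // binding so pooled threads never carry a stale tenant.
    static <T> T callWith(TenantContext ctx, Supplier<T> body) {
        TenantContext previous = CURRENT.get();
        CURRENT.set(ctx);
        try {
            return body.get();
        } finally {
            if (previous == null) {
                CURRENT.remove();
            } else {
                CURRENT.set(previous);
            }
        }
    }
}

// Hypothetical data-access routine: no tenantId/tenantDb parameters.
final class TenantAwareStore {
    static String describeQuery(String urn) {
        TenantContext ctx = TenantContext.current()
            .orElseThrow(() -> new IllegalStateException("no tenant bound to this thread"));
        return "db=" + ctx.tenantDb() + " tenant=" + ctx.tenantId() + " urn=" + urn;
    }
}
```

In this sketch a filter would wrap the rest of the request in `TenantContext.callWith(...)`. A plain ThreadLocal is not visible to work handed off to other threads, so asynchronous code paths would still need the context passed explicitly.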
  • f

    fierce-student-84107

    03/23/2023, 4:40 PM
    Hi @brainy-tent-14503 / team, we'd really appreciate it if you or someone else could share your thoughts on the above design. Thanks.
  • b

    brainy-tent-14503

    04/05/2023, 1:52 PM
    Sorry for the delayed response, I have been on vacation and have just returned. While the authentication filter, JWT tokens, and data store layers would eventually include some information about the tenant id, I think one of the first questions to consider is the impact on the data model. For example, should our `identity`/`actor` urns, the `corpUser`/`corpGroup`, inherently include a tenant id in the urn? I think there is an argument to be made that something like security should be included at the most fundamental levels. Updating the data model has the implication that even for an admin, or the root user, there is no possible multi-tenant admin user. There are only admin users per tenant. Similarly, the `corpGroup` fundamental ids should likely include a component for the tenant. The approach above is essentially routing users to data and is, imho, more prone to exposing data than actually basing the access grants on a tenant-scoped unique identifier. The JWT token already includes the user urn, so the tenant id can be extracted naturally from the urn without modifying the filtering. The application code does still have to extract and use the tenant id for data access. That said, I am only one voice at Acryl, and this would need a bit of thought from others like @big-carpet-38439 and @mammoth-bear-12532 as well.
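A minimal sketch of the idea of reading the tenant from the actor urn itself. The urn layout used here, `urn:li:corpuser:(<tenantId>,<username>)`, is purely an assumption for illustration; the thread does not settle on a format, and the `TenantUrns` class and its methods are hypothetical names, not DataHub APIs.

```java
import java.util.Optional;

// Hypothetical helper; assumes tenant-scoped actor urns of the form
// urn:li:corpuser:(<tenantId>,<username>). This layout is an assumption.
final class TenantUrns {
    private static final String PREFIX = "urn:li:corpuser:(";

    // Returns the tenant component, or empty if the urn is not tenant-scoped.
    static Optional<String> tenantOf(String actorUrn) {
        if (actorUrn == null || !actorUrn.startsWith(PREFIX) || !actorUrn.endsWith(")")) {
            return Optional.empty();
        }
        String body = actorUrn.substring(PREFIX.length(), actorUrn.length() - 1);
        int comma = body.indexOf(',');
        if (comma <= 0 || comma == body.length() - 1) {
            return Optional.empty(); // missing tenant or missing username
        }
        return Optional.of(body.substring(0, comma));
    }

    // Grants access only when the actor's tenant matches the resource's tenant.
    // Urns without a tenant component are denied rather than treated as global.
    static boolean mayAccess(String actorUrn, String resourceTenantId) {
        return tenantOf(actorUrn).map(t -> t.equals(resourceTenantId)).orElse(false);
    }
}
```

Denying urns that carry no tenant component mirrors the point above that there would be no cross-tenant admin user, only admins per tenant.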
  • f

    fierce-student-84107

    04/05/2023, 4:07 PM
    Thanks much @brainy-tent-14503, that makes a lot of sense. @little-megabyte-1074 can you please include @big-carpet-38439 and @mammoth-bear-12532 so we can get their thoughts too?
  • q

    quiet-kangaroo-60946

    04/25/2023, 6:51 PM
    Has anyone attempted to create a kind of hybrid multitenant deployment with a shared backend utilizing AWS managed services with separate UIs for individual clients? We're exploring ways to deploy this to separate clients while saving as much as possible on cost.
  • s

    salmon-exabyte-77928

    05/03/2023, 5:19 PM
    Has anyone found a way to deploy separate DataHub instances (two or more web UIs, actions, and GMS) with shared Kafka, Elasticsearch, and PostgreSQL? We didn't find a way to separate the data within one installation, which is a restriction for us: we couldn't isolate data with roles and groups, and users see each other's data. Multitenancy seems like a good way to handle this. I think it should be possible to share the prerequisites between different setups instead of running a separate set for each DataHub. @brainy-tent-14503 Do you have any thoughts about it? Thanks.
  • b

    better-battery-99932

    10/09/2023, 3:21 PM
    I've just started exploring DataHub. My application would also use it as an infrastructure component beneath a multitenant SaaS offering. I am trying to assess whether domains offer me sufficient capability if I make a few assumptions:
    • assumption 1 - the only user would be my functional id. In other words, my single id will control all access
    • assumption 2 - I would be calling the API for all interactions for loading and retrieving metadata
    • assumption 3 - I would use my tenant id as the "domain" for all such interactions (i.e. load with the domain, request with the domain)
    I recognize this is not a perfect replacement for a security-based approach to tenancy ... but I'm wondering where folks would suggest that such an approach may break down. For instance, are there entity relationships or functions within the system that won't be constrained to fitting within a domain?
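A minimal sketch of the three assumptions above, using a hypothetical in-memory stand-in rather than any real DataHub API: every load is tagged with the tenant's domain, and every read is checked against the caller's domain. The `DomainScopedCatalog` class and the urn strings in the usage are illustrative only.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical in-memory stand-in for the metadata service; not a DataHub API.
final class DomainScopedCatalog {
    // Maps an entity urn to the domain (tenant) it was loaded under.
    private final Map<String, String> domainByUrn = new HashMap<>();

    // Assumption 3: every load is tagged with the tenant's domain.
    void load(String tenantDomain, String entityUrn) {
        domainByUrn.put(entityUrn, tenantDomain);
    }

    // Every read is checked against the caller's tenant domain.
    Optional<String> get(String tenantDomain, String entityUrn) {
        String owner = domainByUrn.get(entityUrn);
        return tenantDomain.equals(owner) ? Optional.of(entityUrn) : Optional.empty();
    }
}
```

One weakness is visible even in this sketch: the entity urn carries no tenant, so if two tenants load the same urn, the second load overwrites the first tenant's domain tag. Isolation here also depends entirely on the wrapper being the only path to the data.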
  • b

    bulky-shoe-65107

    10/16/2023, 12:44 AM
    has renamed the channel from "muti-tenant-deployment" to "deploy-multi-tenant"