
DataHub

Folks: maybe <@U01N6DYG7T2> -> how are you securing frontend and GMS today (from outside)? There are a few people interested in learning more. :thread:

Can you elaborate on the "outside" part? As in from outside the org? Or just outside the code?

For GMS, we have decided to keep it private. The frontend is hosted in GKE, and at the Kubernetes service level we have enabled Google's Identity-Aware Proxy, so anyone trying to get in has to break Google's security first before they can try to find bugs in DataHub's auth.

Sorry for not being clear: outside the code. `[client] ---secure--> [something] --local--> [frontend/gms]`

If I understand your question correctly, I would use TLS termination with AWS ELB.

<https://docs.aws.amazon.com/elasticloadbalancing/latest/classic/elb-create-https-ssl-load-balancer.html#create-https-lb-console>
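For the `[client] ---secure-->` half of that picture, here is a minimal sketch of a client-side TLS configuration using only Python's standard library. It assumes the TLS terminator (ELB or similar) presents a certificate that chains to a CA the client already trusts; the function name `strict_client_context` is made up for illustration and is not part of DataHub.

```python
import ssl


def strict_client_context() -> ssl.SSLContext:
    """Build a client TLS context for talking to a TLS-terminating proxy.

    Hypothetical helper for illustration; not a DataHub API.
    """
    # create_default_context() enables certificate verification and
    # hostname checking against the system trust store.
    ctx = ssl.create_default_context()
    # Refuse anything older than TLS 1.2.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx
```

The context can then be passed to an HTTPS client, for example `urllib.request.urlopen(url, context=strict_client_context())`. The hop from the terminator to frontend/GMS stays on the private network, as in the diagram above.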

Is this a question about how people are handling Authentication in their environments?

<@UV0M2EB8Q> The question is not very clear, and I don't think I asked for clarification clearly either. DataHub does provide auth. I understand it might not have gone through rigorous security testing, but unless there is a big concern, that should be enough.

What is the concern?
• Outside attackers breaking DataHub auth? The first thing is to add security scanning of your dependencies and Docker images, plus penetration testing. An easier option is to use a basic cloud security layer like GCP IAP, or a VPN.
• DDoS protection? You can put it behind a specialized protection service.
• Disgruntled insiders bypassing DataHub auth? You will need a security researcher/engineer to review DataHub's codebase. It is probably worth running initial security scans and pen tests and fixing the findings before bringing in the security engineer (mainly to save time).

Sorry for not being clear: the question was about how people are securing the transport layer (SSL/TLS, etc.).

Sorry for the late reply.

Yes, we are building an API in front of GMS with role-based access control (AWS API Gateway + Lambda backend). This API is then the only thing that can communicate with our backend, which is locked down. In later phases we can also easily add validation logic here, for example only allowing specific roles to write to the specific data platforms they own. This way we can be sure that nothing is accidentally overwritten by somebody who should not have access.

The reason is that we now have requests from other teams outside our domain who want to use our catalog for their own use cases, so this would make it a bit more secure.
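To make the "specific roles write to specific data platforms" idea concrete, here is a rough sketch of the kind of check such an API layer could run before forwarding a write to GMS. The role names, the `ROLE_WRITABLE_PLATFORMS` mapping, and both function names are assumptions for illustration, not the actual implementation described above. It relies only on the fact that DataHub dataset URNs embed the platform as `urn:li:dataPlatform:<name>`.

```python
import re
from typing import Optional

# Hypothetical mapping: which data platforms each role owns and may write to.
ROLE_WRITABLE_PLATFORMS = {
    "analytics-team": {"snowflake", "bigquery"},
    "streaming-team": {"kafka"},
}

# Matches the platform embedded in a URN such as
# urn:li:dataset:(urn:li:dataPlatform:kafka,orders,PROD)
_PLATFORM_RE = re.compile(r"urn:li:dataPlatform:([^,)]+)")


def platform_of(urn: str) -> Optional[str]:
    """Return the data platform name embedded in a URN, or None if absent."""
    match = _PLATFORM_RE.search(urn)
    return match.group(1) if match else None


def is_write_allowed(role: str, urn: str) -> bool:
    """Allow a write only if the role owns the platform named in the URN."""
    platform = platform_of(urn)
    if platform is None:
        # Deny by default when the platform cannot be determined.
        return False
    return platform in ROLE_WRITABLE_PLATFORMS.get(role, set())
```

With this mapping, `is_write_allowed("streaming-team", "urn:li:dataset:(urn:li:dataPlatform:kafka,orders,PROD)")` passes, while the same URN with `"analytics-team"` is rejected. Denying by default for unknown roles or URNs without a platform keeps accidental overwrites out.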

Quick plug for evolving the GraphQL API to be that authenticated, authorized API on top of the metadata graph: <https://datahubproject.io/docs/api/graphql/overview>