Hi everyone - is it possible to restrict / manage the ingestion ... | DataHub #ingestion

stocky-noon-61140

09/20/2021, 9:47 AM

Hi everyone - is it possible to restrict / manage ingestion privileges? In other words: with my locally installed version of DataHub, I don't need to authenticate to ingest data into DataHub. Is there a way to require username/password authentication before metadata can be ingested?

better-orange-49102

09/20/2021, 10:53 AM

If you're using the REST API sink, I don't think there is any way to restrict ingestion, since GMS is open for anyone to access. John mentioned before that they use their own proxy to limit access to GMS. I think the Kafka sink can implement ACLs, though (I'm not familiar with Kafka, sorry). Source: https://datahubspace.slack.com/archives/CUMUWQU66/p1628772070237500

big-carpet-38439

09/21/2021, 8:56 PM

Hey there! xL is correct. We use a lightweight proxy sitting in front of GMS to validate requests across an untrusted network. Currently the ingestion API provides no authentication mechanisms. That being said, it is on the core team's radar and is in the early stages of thinking. We are considering which mechanism to use for authentication, as well as what the source of truth for authentication credentials will be. Depending on the approach we decide on, the level of work required can vary significantly. Here are a few things we're considering:
• File-based username / password authentication (a file with usernames and hashed passwords provided at deploy time)
• Store-based username / password authentication (a DB storing usernames and salted, hashed passwords; more work)
• Token-based authentication: mechanisms for the backend APIs to grant and validate OAuth access + refresh tokens, with initial login happening over OIDC, LDAP, or one of the username / password mechanisms noted above
Which approach would best suit your use case?
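As a rough illustration of the first option (a file of usernames with hashed passwords, provided at deploy time), here is a minimal Python sketch of how such a file could be generated and checked. The `username:salt_hex:hash_hex` line format, the function names, and the PBKDF2 iteration count are assumptions for illustration only, not anything DataHub ships.

```python
import hashlib
import hmac
import os

# Assumed (hypothetical) file format: one "username:salt_hex:hash_hex" per line.
# Usernames must not contain ":" in this simple format.
ITERATIONS = 100_000


def hash_password(password: str, salt: bytes) -> bytes:
    """Derive a password hash with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)


def make_entry(username: str, password: str) -> str:
    """Build one credentials-file line with a fresh random salt (run at deploy time)."""
    salt = os.urandom(16)
    return f"{username}:{salt.hex()}:{hash_password(password, salt).hex()}"


def load_credentials(lines):
    """Parse credentials-file lines into {username: (salt, hash)}, skipping blanks and comments."""
    creds = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        username, salt_hex, hash_hex = line.split(":")
        creds[username] = (bytes.fromhex(salt_hex), bytes.fromhex(hash_hex))
    return creds


def verify(creds, username: str, password: str) -> bool:
    """Return True only if the user exists and the password hashes to the stored value."""
    entry = creds.get(username)
    if entry is None:
        return False
    salt, expected = entry
    # Constant-time comparison so the check does not leak how many bytes matched.
    return hmac.compare_digest(hash_password(password, salt), expected)
```

Per-user random salts are what distinguish this from storing plain hashes; the store-based option would keep the same `(salt, hash)` pairs in a database instead of a file.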

bland-orange-95847

09/22/2021, 6:45 AM

Joining this thread with a big plus one, as we are currently also at the point where we would need authentication on GMS to restrict access 🙂

bland-orange-95847

09/22/2021, 7:30 AM

What is the actual purpose of the bearer token that the REST emitter allows to be set in the header?

big-carpet-38439

09/22/2021, 4:26 PM

Hey @bland-orange-95847 - Currently, in OSS, it serves no purpose. It is not actively read or validated anywhere. In our Cloud (hosted) offering, this token is validated by the proxy service mentioned above^^

big-carpet-38439

09/22/2021, 4:26 PM

But of course whatever we build in OSS will just end up reusing this

bland-orange-95847

09/22/2021, 4:35 PM

Okay, thanks. Good to know, because we may use both your hosted offering and OSS. So for now we may need to build a proxy as well and extend the RestEmitter to support auth.

big-carpet-38439

09/22/2021, 4:42 PM

As long as your proxy can verify Authorization Bearer tokens, you should not need to make any changes to RestEmitter
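A minimal sketch of such a proxy using only the Python standard library: it answers 401 to any request without a valid `Authorization: Bearer <token>` header and forwards everything else unchanged to GMS. The upstream URL `http://localhost:8080`, the listening port 8081, and the hard-coded `VALID_TOKENS` set are placeholders; a real deployment would load tokens from a trusted source and put TLS in front of it.

```python
import hmac
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from typing import Iterable, Optional

UPSTREAM = "http://localhost:8080"   # assumed GMS address (placeholder)
VALID_TOKENS = {"example-token"}     # placeholder; load from a trusted store in practice
SKIP_HEADERS = {"host", "content-length", "connection", "transfer-encoding"}


def extract_bearer_token(header: Optional[str]) -> Optional[str]:
    """Return <token> from an 'Authorization: Bearer <token>' value, else None."""
    if not header:
        return None
    scheme, _, token = header.partition(" ")
    if scheme.lower() != "bearer" or not token.strip():
        return None
    return token.strip()


def is_authorized(header: Optional[str], valid_tokens: Iterable[str]) -> bool:
    token = extract_bearer_token(header)
    if token is None:
        return False
    # Compare as bytes in constant time against each known token.
    return any(hmac.compare_digest(token.encode(), t.encode()) for t in valid_tokens)


class AuthProxy(BaseHTTPRequestHandler):
    def _forward(self):
        if not is_authorized(self.headers.get("Authorization"), VALID_TOKENS):
            self.send_error(401, "Missing or invalid bearer token")
            return
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length) if length else None
        headers = {k: v for k, v in self.headers.items() if k.lower() not in SKIP_HEADERS}
        req = urllib.request.Request(UPSTREAM + self.path, data=body,
                                     headers=headers, method=self.command)
        try:
            with urllib.request.urlopen(req) as resp:
                status, resp_headers, payload = resp.status, resp.getheaders(), resp.read()
        except urllib.error.HTTPError as err:   # GMS answered with an error status
            status, resp_headers, payload = err.code, err.headers.items(), err.read()
        except urllib.error.URLError:           # GMS unreachable
            self.send_error(502, "Upstream GMS unreachable")
            return
        self.send_response(status)
        for k, v in resp_headers:
            if k.lower() not in SKIP_HEADERS:
                self.send_header(k, v)
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    do_GET = do_POST = _forward


def run_proxy(port: int = 8081) -> None:
    """Start the proxy; blocks forever."""
    ThreadingHTTPServer(("0.0.0.0", port), AuthProxy).serve_forever()
```

Calling `run_proxy()` and pointing the REST sink at port 8081 instead of GMS directly would route ingestion through the check. Only GET and POST are forwarded here; other methods would need their own `do_*` handlers.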

big-carpet-38439

09/22/2021, 4:42 PM

If you provide a "token" in the config, it will get added to the Authorization header
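To make that concrete, here is a stdlib-only Python sketch of what "the token gets added to the Authorization header" means on the wire. It is not DataHub's RestEmitter code: `build_request` is a hypothetical helper, and the path passed to it is a placeholder rather than a documented GMS endpoint.

```python
import json
import urllib.request
from typing import Optional


def build_request(server: str, path: str, payload: dict,
                  token: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a JSON POST; add a Bearer header only if a token is configured."""
    headers = {"Content-Type": "application/json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(
        server.rstrip("/") + path,
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )
```

Because the header is only set when a token is present, the same client works against an open GMS and against a token-checking proxy without code changes.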

bland-orange-95847

09/22/2021, 4:46 PM

yeah right. This would be an option as well 👍