# feature-requests
m
Good morning everyone, I have just managed to ingest data from a Kerberized Hive. The thing is that I had to do some previous steps in order to be able to ingest that data:
• The first step was to install the following dependencies:
libsasl2-dev libsasl2-2 libsasl2-modules-gssapi-mit krb5-user
(although I think the only ones actually needed are the last 2)
• Then I had to do a kinit as a user with enough permissions to read the data from the tables.
Would it be possible to add some fields/properties to the recipes so that you could indicate both the username (with realm) and the password (or the path to a keytab) of a user? Another possibility would be to do a kinit with the user that has logged in via OIDC (if there is one), but obtaining the password from OIDC would not be easy. I could do a PR for the dependencies, but the other part I think is more complicated and I might not be able to do it. Thoughts? (Rough sketch of the manual steps below.)
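Roughly, the manual workaround looks like this (a sketch only; the principal and recipe path are placeholders, not real values):
```python
import subprocess

# 1. System packages needed for SASL/GSSAPI (installed beforehand via apt):
#    libsasl2-dev libsasl2-2 libsasl2-modules-gssapi-mit krb5-user

# 2. Obtain a Kerberos ticket for a user that can read the Hive tables
#    (this prompts for the user's password)
subprocess.run(["kinit", "hive_reader@EXAMPLE.COM"], check=True)

# 3. Run the ingestion recipe as usual
subprocess.run(["datahub", "ingest", "-c", "hive_kerberos.yml"], check=True)
```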
d
Can you elaborate a bit on what we should do if the user sets the proposed username and password to make it work?
m
Okay, so for the first option (adding the fields in the recipe), the user would indicate the information needed (username and password/keytab) so that internally, before the execution of the ingestion process, a kinit would be done and the ingestion of data from the Kerberized source could happen successfully. It could also be done in the middle of the ingestion run (after all the packages have been loaded), but I think that would be messier. In the other case, once the login happens, the kinit would be done with the information provided in the login. Hope I have explained myself properly; if not, let me know.
In fact, another field would be needed, since the default realm would have to be indicated too. Additional information about the location of the KDC might be needed as well (although, for a reason I don't know yet, I am able to ingest data correctly using the default krb5.conf file without specifying the location of the KDC or other things like the domains).
These last two pieces of information would not need to be indicated if, in some way, the user were required to map (or mount) their krb5.conf into the actions container.
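To make the idea concrete, here is a minimal sketch of what the recipe-driven kinit could look like; the field names (kerberos_principal, kerberos_password, kerberos_keytab) and the helper are hypothetical, not existing recipe options:
```python
import subprocess
from typing import Optional

def kerberos_login(principal: str,
                   password: Optional[str] = None,
                   keytab: Optional[str] = None) -> None:
    """Hypothetical helper the ingestion framework could call right before
    the Hive source connects, using the proposed recipe fields."""
    if keytab:
        # keytab-based login, no interactive password needed
        subprocess.run(["kinit", "-kt", keytab, principal], check=True)
    elif password:
        # feed the password to kinit via stdin (works with MIT krb5)
        subprocess.run(["kinit", principal], input=password.encode(), check=True)
    else:
        raise ValueError("either a keytab or a password is required")

# e.g. values taken from hypothetical recipe fields such as
# kerberos_principal / kerberos_keytab:
kerberos_login("svc_datahub@EXAMPLE.COM",
               keytab="/etc/security/keytabs/svc_datahub.keytab")
```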
d
ahh, so if I get it right, with the first option the datahub CLI would run the kinit and not the user, right?
l
In such cases the best way is to add keytab support for the CLI (no kinit password required). BTW, I'm not sure it's worth adding keytab support to UI ingestion; it would be better to pass the user's Kerberos ticket, but that's much harder to implement.
btw, both methods (a) detect an already established Kerberos session, b) use a keytab) are a typical workflow, but a keytab is much more suitable for automation.
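Something like this sketch (the principal and keytab path are just placeholders):
```python
import subprocess

def ensure_kerberos_ticket(principal: str, keytab: str) -> None:
    """Reuse an already established Kerberos session if there is one,
    otherwise fall back to a keytab-based kinit (better for automation)."""
    # `klist -s` exits non-zero when there is no valid ticket cache
    have_ticket = subprocess.run(["klist", "-s"]).returncode == 0
    if not have_ticket:
        subprocess.run(["kinit", "-kt", keytab, principal], check=True)

ensure_kerberos_ticket("svc_datahub@EXAMPLE.COM",
                       "/etc/security/keytabs/svc_datahub.keytab")
```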
m
That is right @dazzling-judge-80093, or at least a way to be able to do the whole process in the UI and not have to be looking at 2 places
d
Is it a standard thing for the tool itself to run kinit, instead of some kind of central service running it and all the apps using the generated ticket?
l
It depends. For instance, a Samba server uses a keytab by itself for checking users' Kerberos tickets. It's like a system service that uses a keytab on its own.
Your Spark application uses a keytab for running tasks on Kerberized HDFS storage.
I, as a user, use kinit to access samba/nfs/hdfs, access some Kerberized frontends (HDFS web UI, for instance), etc.
I recommend you guys read this book. It's old, but Kerberos is old too.