# feature-requests
m
Good morning everyone, I have just managed to ingest data from a Kerberized Hive. The thing is that I had to do some previous steps in order to be able to ingest that data:
• The first step was to install the following dependencies:
libsasl2-dev libsasl2-2 libsasl2-modules-gssapi-mit krb5-user
(although I think the only ones actually needed are the last 2)
• Then I had to do a kinit as a user with enough permissions to read the data from the tables.
Would it be possible to add some fields/properties to the recipes so that you could indicate both the username (with realm) and the password (or the path to a keytab) of a user? Another possibility would be to do a kinit with the user that has logged in via OIDC (if there is one), but obtaining the password from OIDC would not be easy. I could do a PR for the dependencies, but the other part I think is more complicated and I might not be able to do it. Thoughts? (Rough sketch of the manual steps below.)
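Roughly, the manual workaround looks like this (a sketch only; the principal and recipe path are placeholders, not real values):
```python
import subprocess

# 1. System packages needed for SASL/GSSAPI (installed beforehand via apt):
#    libsasl2-dev libsasl2-2 libsasl2-modules-gssapi-mit krb5-user

# 2. Obtain a Kerberos ticket for a user that can read the Hive tables
#    (this prompts for the user's password)
subprocess.run(["kinit", "hive_reader@EXAMPLE.COM"], check=True)

# 3. Run the ingestion recipe as usual
subprocess.run(["datahub", "ingest", "-c", "hive_kerberos.yml"], check=True)
```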
d
Can you elaborate a bit on what we should do if the user sets the proposed username and password to make it work?
m
Okay, so for the first option (adding the fields in the recipe), the user would indicate the information needed (username and password/keytab) so that internally, before the execution of the ingestion process, a kinit would be done and the ingestion of data from the Kerberized source could happen successfully. It could also be done in the middle of the ingestion run (after all the packages have been loaded), but I think that would be messier. In the other case, once the login happens, the kinit would be done with the information provided in the login. Hope I have explained myself properly; if not, let me know.
In fact, another field would be needed, since the default realm would have to be indicated too. Additional information about the location of the KDC might be needed as well (although, for a reason I don't know yet, I am able to ingest data correctly using the default krb5.conf file without specifying the location of the KDC or other things like the domains).
These last two pieces of information would not need to be indicated if, in some way, the user were required to map (or mount) their krb5.conf into the actions container.
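To make the idea concrete, here is a minimal sketch of what the recipe-driven kinit could look like; the field names (kerberos_principal, kerberos_password, kerberos_keytab) and the helper are hypothetical, not existing recipe options:
```python
import subprocess
from typing import Optional

def kerberos_login(principal: str,
                   password: Optional[str] = None,
                   keytab: Optional[str] = None) -> None:
    """Hypothetical helper the ingestion framework could call right before
    the Hive source connects, using the proposed recipe fields."""
    if keytab:
        # keytab-based login, no interactive password needed
        subprocess.run(["kinit", "-kt", keytab, principal], check=True)
    elif password:
        # feed the password to kinit via stdin (works with MIT krb5)
        subprocess.run(["kinit", principal], input=password.encode(), check=True)
    else:
        raise ValueError("either a keytab or a password is required")

# e.g. values taken from hypothetical recipe fields such as
# kerberos_principal / kerberos_keytab:
kerberos_login("svc_datahub@EXAMPLE.COM",
               keytab="/etc/security/keytabs/svc_datahub.keytab")
```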
d
ahh, so if I get it right, with the first option the datahub CLI would run the kinit and not the user, right?
l
In such cases the best way is to add keytab support for the CLI (no kinit password required). BTW, I'm not sure it's worth adding keytab support to UI ingestion; it would be better to pass the user's Kerberos ticket, but that's much harder to implement.
btw, both methods (a) detect an already established Kerberos session, b) use a keytab) are a typical workflow, but a keytab is much more suitable for automation.
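Something like this sketch (the principal and keytab path are just placeholders):
```python
import subprocess

def ensure_kerberos_ticket(principal: str, keytab: str) -> None:
    """Reuse an already established Kerberos session if there is one,
    otherwise fall back to a keytab-based kinit (better for automation)."""
    # `klist -s` exits non-zero when there is no valid ticket cache
    have_ticket = subprocess.run(["klist", "-s"]).returncode == 0
    if not have_ticket:
        subprocess.run(["kinit", "-kt", keytab, principal], check=True)

ensure_kerberos_ticket("svc_datahub@EXAMPLE.COM",
                       "/etc/security/keytabs/svc_datahub.keytab")
```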
m
That is right @dazzling-judge-80093, or at least a way to be able to do the whole process in the UI and not have to be looking at 2 places
d
Is it a standard thing for the tool itself to run kinit, instead of some kind of central service running it and all the apps using the generated ticket?
l
It depends. For instance, a Samba server uses a keytab by itself for checking users' Kerberos tickets. It's like a system service that uses a keytab on its own.
Your Spark application uses a keytab for running tasks on Kerberized HDFS storage.
I, as a user, use kinit to access samba/nfs/hdfs, access some Kerberized frontends (HDFS web UI, for instance), etc.
I recommend you guys read this book. It's old, but Kerberos is old too.