# troubleshoot
r
Hello, with v0.8.41 we are still getting intermittent 401 errors. Please find the error screenshot attached. Can someone point me to the PR that fixed this issue?
s
Should be https://github.com/datahub-project/datahub/pull/5405. Please note the author @incalculable-ocean-74010 is currently out, so detailed debugging will have to wait a bit unless you want to do it on your end. Given this was an intermittent error we are not 100% sure, but reports from other people using this fix have been positive so far. As far as we know, this is the first report of an HTTP 401. Are you sure all the containers got updated to `v0.8.41`? Are you using OSS DataHub or have you made any changes?
i
Hello Abhishek, are you running multiple GMS instances? How often do you get these intermittent errors?
Are all gms instances configured the same way? With the same salts and encryption keys?
r
Hi Pedro, in the last 24 hours 37 requests have failed with this error. Yes, this DataHub deployment is running on Swarm and all the gms instances read from the same env_file. Also, we have not configured any salts or encryption keys, so these run on defaults. Is it mandatory to configure salts and encryption keys? Also, I just ran the ingest.py perf test with 200 users / 118,006 requests, and none of the requests failed with an authorization issue. I am not sure why the issue occurs randomly. Still investigating.
i
I have a suspicion that some of your gms instances are configured with different salts and encryption keys. If none are specified, random ones are generated. This means that a token generated in one gms instance won't be verifiable by another. Could you confirm those parameters on each gms instance to ensure they are consistent throughout?
We use a single k8s secret in our helm charts to ensure consistency across all gms instances one might want to provision.
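To illustrate why mismatched keys cause intermittent 401s, here is a minimal sketch (the keys and payload are placeholders, not real DataHub values): the same payload signed with two different HMAC keys produces different signatures, so a token minted by one gms instance will fail verification on another instance that generated its own random key.

```bash
# Same payload, two different signing keys -> two different MACs.
# A token signed by gms-1 cannot be verified by gms-2 if their keys differ.
echo -n "session-token-payload" | openssl dgst -sha256 -hmac "random-key-on-gms-1"
echo -n "session-token-payload" | openssl dgst -sha256 -hmac "random-key-on-gms-2"
```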
r
Where do I check these parameters, as none are specified in the env_file? Do you want me to add the parameters and then check?
i
Check the application.yml file that should be in the gms folder under the conf folder.
Since this is a deployment using Swarm, I am not sure how gms is configured to run.
If `printenv` does not output DataHub variables and the application.yml file does not specify them, then the deployment will most likely be flaky on a multi-gms-instance setup.
There are certain environment variables that must be consistent across gms instances to have a valid multi-node deployment.
We ensure this consistency using our helm charts. I understand k8s might not be an option for you, but this is a scenario where we can't realistically provide setups for all deployment technologies/orchestrators.
Did you check the envs on all gms instances?
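As a quick check, something like the following can be run on each Swarm node hosting a gms task (a sketch; the `datahub-gms` name filter and the variable pattern are assumptions, so adjust them to your service names):

```bash
# Run on every node that hosts a gms task; the container name filter
# "datahub-gms" is an assumption -- substitute your own service name.
for c in $(docker ps --filter "name=datahub-gms" --format '{{.Names}}'); do
  echo "== $c =="
  docker exec "$c" printenv | grep -E 'DATAHUB|METADATA_SERVICE|SECRET' | sort
done
```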
b
+1 - I think this is either a container version inconsistency or an environment variable issue.
r
The env on all gms instances does not have the variable SECRET_SERVICE_ENCRYPTION_KEY. Also, I could not find any application.yaml/yml in the gms containers; maybe the application.yaml gets embedded in the war file (war.war). Will there be any side effects if I set SECRET_SERVICE_ENCRYPTION_KEY? What other parameters are required to have a valid multi-node environment? On Swarm, gms is deployed as a service and all the services refer to the same env file.
i
> Maybe the application.yaml gets embedded in the war file (war.war).
Yes, it does. Off the top of my head, you need the following to be consistent across gms instances:
- DATAHUB_TOKEN_SERVICE_SIGNING_KEY
- METADATA_SERVICE_AUTH_ENABLED
- DATAHUB_SYSTEM_CLIENT_ID
- DATAHUB_SYSTEM_CLIENT_SECRET
- DATAHUB_TOKEN_SERVICE_SALT
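As a sketch, a shared env_file pinning these might look like the following. All values are placeholders (SECRET_SERVICE_ENCRYPTION_KEY is carried over from earlier in the thread on the assumption it should also be consistent), so generate your own secrets:

```bash
# datahub-gms.env -- referenced via env_file by every gms service replica.
# All values below are placeholders; generate your own strong secrets and
# keep this one file the single source of truth for every instance.
METADATA_SERVICE_AUTH_ENABLED=true
# The client id/secret values here are illustrative assumptions.
DATAHUB_SYSTEM_CLIENT_ID=__datahub_system
DATAHUB_SYSTEM_CLIENT_SECRET=change-me-client-secret
DATAHUB_TOKEN_SERVICE_SIGNING_KEY=change-me-signing-key
DATAHUB_TOKEN_SERVICE_SALT=change-me-salt
SECRET_SERVICE_ENCRYPTION_KEY=change-me-encryption-key
```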
r
Thanks for the inputs. I will add these to the deployment and share an update by Thursday.