# troubleshoot
m
Intermittent Authentication errors: Hi, we run ingestions as part of our CI pipelines using the DataHub REST API. When ingesting, we receive intermittent authentication errors; for example, our client reports:
Copy code
requests.exceptions.JSONDecodeError: [Errno Expecting value] <html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1"/>
<title>Error 401 Unauthorized to perform this action.</title>
</head>
<body><h2>HTTP ERROR 401 Unauthorized to perform this action.</h2>
<table>
<tr><th>URI:</th><td>/entities</td></tr>
<tr><th>STATUS:</th><td>401</td></tr>
<tr><th>MESSAGE:</th><td>Unauthorized to perform this action.</td></tr>
<tr><th>SERVLET:</th><td>restliRequestHandler</td></tr>
</table>
<hr/><a href="https://eclipse.org/jetty">Powered by Jetty:// 9.4.46.v20220331</a><hr/>
</body>
</html>
The server reports a missing authentication token:
Copy code
10:36:57.676 [qtp1830908236-57260] WARN c.d.a.a.AuthenticatorChain:70 - Authentication chain failed to resolve a valid authentication. Errors: [(com.datahub.authentication.authenticator.DataHubSystemAuthenticator,Failed to authenticate inbound request: Authorization header is missing 'Basic' prefix.), (com.datahub.authentication.authenticator.DataHubTokenAuthenticator,Failed to authenticate inbound request: Unable to verify the provided token.)]
This behaviour happens intermittently: some jobs succeed and others fail. We haven't changed our client or token between jobs, so I don't understand why the token is missing. We host our deployment on EKS and use MySQL as our datastore. I have checked:
• RDS database connections and system resources
• Kafka system resources
• ES system resources
None of these are under contention. I also checked the node where the frontend and gms containers are running; both have plenty of free memory and CPU time. I am wondering if this could be a bug, does anyone have any suggestions?
We were using datahub v0.8.38 but have upgraded to v0.8.40. The problem persists in v0.8.40.
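One way to take the ingestion pipeline out of the picture is to replay a request against GMS directly with the same token. Below is a minimal sketch (not from the thread) using the restli /entities resource that appears in the 401 above; the host, token, and URN are placeholders, and the exact endpoint shape can vary between DataHub versions.
Copy code
import requests
from urllib.parse import quote

GMS_HOST = "http://datahub-gms:8080"   # placeholder: your GMS endpoint
TOKEN = "<personal-access-token>"      # placeholder: the same token the CI jobs use
URN = "urn:li:corpuser:datahub"        # placeholder: any URN known to exist

# GET /entities/<url-encoded-urn> goes through the same restli handler and
# auth chain as the failing ingest calls, so a 200 means the token verified
# and a 401 reproduces the failure outside the pipeline.
resp = requests.get(
    f"{GMS_HOST}/entities/{quote(URN, safe='')}",
    headers={
        "Authorization": f"Bearer {TOKEN}",
        "X-RestLi-Protocol-Version": "2.0.0",
    },
    timeout=10,
)
print(resp.status_code, resp.reason)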
i
Dominic, is the token valid or has it expired? The error you've shown is not necessarily an error. It just means that one of the authenticators in our chain failed. Usually that means that a given request was not made by the datahub system user. We are looking into improving these auth errors. cc @big-carpet-38439
Could you share the recipes you run? Do you have metadata service authentication enabled?
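On the expiry question: DataHub personal access tokens are JWTs, so the expiry claim can be inspected locally without calling the server. A minimal stdlib-only sketch, assuming gms_token holds the same token the recipes use:
Copy code
import base64
import json
from datetime import datetime, timezone

def jwt_claims(token: str) -> dict:
    """Decode the (unverified) payload segment of a JWT."""
    payload = token.split(".")[1]
    payload += "=" * (-len(payload) % 4)   # restore the stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(payload))

claims = jwt_claims(gms_token)             # gms_token: the token the recipes use
exp = claims.get("exp")
if exp is None:
    print("token has no expiry claim")
else:
    print("token expires at", datetime.fromtimestamp(exp, tz=timezone.utc).isoformat())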
m
Hi
• GMS Authentication is enabled:
◦ METADATA_SERVICE_AUTH_ENABLED = true for both gms and frontend containers
• The token is valid
• The token is set (I assert its value)
A pipeline recipe would look like:
Copy code
config_dict = {
    "source": {
        "type": source,
        "config": config,
        **extra_config,
    },
    "sink": {
        "type": "datahub-rest",
        "config": {
            "server": settings.DATAHUB_GMS_API_HOST,
            "token": gms_token,
        },
    },
}
The pipeline is created (which validates the config) and run:
Copy code
try:
    pipeline = Pipeline.create(config_dict=config_dict)
except ValidationError as e:
    click.echo(e, err=True)
    raise
pipeline.run()
pipeline.raise_from_status(raise_warnings=strict_warnings)
return pipeline.pretty_print_summary(warnings_as_failure=strict_warnings)
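Since the failures are intermittent and other runs succeed with the same token, one possible CI-side stopgap (a sketch, not a fix suggested in the thread) is to retry the whole create/run cycle a few times before failing the job, reusing config_dict and strict_warnings from above. DataHub ingestion is upsert-based, so re-running a recipe is safe; it just repeats work.
Copy code
import time

from datahub.ingestion.run.pipeline import Pipeline

MAX_ATTEMPTS = 3                        # arbitrary; tune for your CI

for attempt in range(1, MAX_ATTEMPTS + 1):
    try:
        pipeline = Pipeline.create(config_dict=config_dict)
        pipeline.run()
        pipeline.raise_from_status(raise_warnings=strict_warnings)
        pipeline.pretty_print_summary(warnings_as_failure=strict_warnings)
        break                           # clean run, stop retrying
    except Exception:
        if attempt == MAX_ATTEMPTS:
            raise                       # out of retries; let CI see the failure
        time.sleep(30 * attempt)        # simple backoff before the next attempt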
b
Thank you! We’ve heard similar reports and are looking into this with urgency
👍 1
i
I’m actively trying to reproduce this locally as we speak
👍 1
m
We replaced our token and reran our CI; during the sample pipeline, several ingests succeeded before failing, all using the same token.
👍 1
i
Hello Dominic, out of curiosity, have you recorded when those unauthorized errors happen? Do they happen to occur at 5-minute intervals (inconsistently)?
m
Hi, it's hard to say. Initially we were able to run two successful ingests; these ran within one minute of one another. A third ingest ran about 2 minutes later and failed. We are using a token which should last 3 months.
i
How long did these ingest processes take?
m
Around 9 minutes (1:56 + 7:14 = 9:10), then a third job failed.
Same errors.
r
Curious if this got resolved @most-nightfall-36645