# all-things-deployment
a
Hi, my ingestion runs which were set up using the UI are failing with the following error:
Unable to emit metadata to DataHub GMS
401 Client Error: Unauthorized for url: http://datahub-gms.datahub.local:8080/aspects?action=ingestProposal
I’m running DataHub v0.8.40 and DataHub Actions v0.0.4. The above log is showing up in the DataHub Actions container. I have configured the DataHub Actions container with the client id and secret for GMS as well. It is worth noting that this setup was working previously, but I migrated data from one DataHub deployment to another. Both had the same container versions, but the client id / secrets and token-generation secrets differed between the two. After the migration, which involved copying the metadata_aspect_v2 table and re-indexing, I’m getting ingestion failures. I’ve also tried deleting the ingestion as well as the secrets it references and recreating them, with no success. I noticed another warning in the logs:
❗Client-Server Incompatible❗ Your client version 0.8.38.2 is older than your server version 0.8.40. Upgrading the cli to 0.8.40 is recommended
I’m guessing the latest release of the Actions container doesn’t use the up-to-date CLI version? Would I need to set
UI_INGESTION_DEFAULT_CLI_VERSION=0.8.38.2
in GMS?
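For reference, I assume this would be set on the GMS container, roughly like the following (docker-compose style; the exact wiring here is my guess for this deployment):
# hedged sketch: pin the default CLI version used for UI ingestion
# (assumes GMS reads UI_INGESTION_DEFAULT_CLI_VERSION from its environment)
services:
  datahub-gms:
    environment:
      - UI_INGESTION_DEFAULT_CLI_VERSION=0.8.38.2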
i
Hello Umair. The issue here is that the new deployment has a different set of configs, which are used to encrypt and decrypt secrets. When you migrated the data, you also migrated certain metadata (DataHubSecrets) encrypted using those hidden admin configs. Since the configs in the new deployment are different, DataHub is unable to decrypt the secrets used in managed ingestion.
The fix for this is to ensure that the new deployment, in particular GMS, has the exact same encryption environment variables as the old GMS instance.
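For example, in a docker-compose deployment the relevant variables on GMS would look something like this (variable names from memory, so verify them against your env files; the values must match the old instance exactly):
# hedged sketch: secret-encryption / token-signing config on datahub-gms
# (names from memory; values are placeholders for the old deployment's keys)
services:
  datahub-gms:
    environment:
      - SECRET_SERVICE_ENCRYPTION_KEY=<same key as the old deployment>
      - DATAHUB_TOKEN_SERVICE_SIGNING_KEY=<same key as the old deployment>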
a
Are there encrypted values stored in the DB which got transferred over during the migration? I would have thought that deleting the ingestion run plus the user-defined secrets in the UI would have resolved such an issue.
i
Are there encrypted values? Yes: all secret- and access-token-related metadata.
Deleting in the UI is a soft delete; it does not remove the entries from the database.
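(If you need to actually remove an entry, the CLI supports hard deletes; something like the sketch below, with a hypothetical URN, though I’m not certain it covers secret entities in your version.)
# hedged sketch: hard-delete a single entity with the DataHub CLI
# (URN is hypothetical; --hard removes the rows instead of soft-deleting)
datahub delete --urn 'urn:li:dataHubSecret:my-redshift-password' --hard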
a
Ah, I see. I didn’t realize it would have long-lived encrypted data which couldn’t be cleared by regenerating user tokens / user-defined secrets. Thanks for the clarification. I’ll try deploying with the old secret values then.
@incalculable-ocean-74010, I tried reverting all the secrets but am still getting the same error. Would there be a way for me to migrate just the DataSets and their associated tags / glossary terms? It is looking like I may need to do a fresh deployment so there are no discrepancies between the encrypted values. At the moment, the DataHub instance only has DataSets from a single source (Redshift), with no usage / profiling info. The datasets have a few user-defined tags on them as well. Ideally I’d like to be able to transfer this data to a fresh install.
i
You can try to migrate all data from your old DB into the new one and manually delete all rows related to DataHubSecret, DataHubAccessToken, DataHubIngestionSource (I think) and InviteToken; see the sketch below. Naturally, you won’t have these items in the new DataHub installation.
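A rough sketch of that cleanup, assuming the default metadata_aspect_v2 table and the standard URN prefixes (back up the table first and double-check the prefixes against your actual rows):
-- hedged sketch: strip secret/token/ingestion-source rows after migration
-- (assumes the default metadata_aspect_v2 schema; URN prefixes are best-effort)
DELETE FROM metadata_aspect_v2 WHERE urn LIKE 'urn:li:dataHubSecret:%';
DELETE FROM metadata_aspect_v2 WHERE urn LIKE 'urn:li:dataHubAccessToken:%';
DELETE FROM metadata_aspect_v2 WHERE urn LIKE 'urn:li:dataHubIngestionSource:%';
DELETE FROM metadata_aspect_v2 WHERE urn LIKE 'urn:li:inviteToken:%';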
a
Thanks. Just an update on this in case anyone else has a similar issue. It turns out the migration I did was unrelated to the problem. When I configured the ingestion through the UI, I left out the sink definition, thinking that the DataHub Actions container would fill in the appropriate details for GMS. However, although it does fill in the address of GMS, it doesn’t seem to pass along the auth info (I have auth enabled on GMS). I checked the docs and they state to pass along a user-generated token, but those tokens have a max lifetime of 3 months, which isn’t ideal. I ended up adding a sink definition which uses the client_id and secret from the DataHub Actions container to pass along an Authorization header as part of the calls to GMS.
sink:
    type: datahub-rest
    config:
        server: 'http://datahub-gms.datahub.local:8080'
        extra_headers:
            # the system client id/secret are already available as env vars
            # inside the datahub-actions container
            Authorization: 'Basic ${DATAHUB_SYSTEM_CLIENT_ID}:${DATAHUB_SYSTEM_CLIENT_SECRET}'
This configuration ends up working. I’m kind of surprised that I need to explicitly add this, though; I would think this would be a default config. Maybe I’m doing something else wrong. cc @incalculable-ocean-74010
i
cc @big-carpet-38439
b
@limited-kilobyte-98369 in recent versions of DataHub you should NOT require any sink block at all.
The exact block you posted above will be injected on your behalf; see the sketch below.
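For example, a UI recipe as minimal as this should work with no sink at all (the source details here are hypothetical):
# hedged sketch: UI-managed recipe with no sink block; the datahub-rest
# sink (GMS address + auth headers) is injected automatically
source:
    type: redshift
    config:
        host_port: 'my-cluster.example.redshift.amazonaws.com:5439'  # hypothetical
        database: analytics  # hypothetical
        username: '${REDSHIFT_USER}'  # hypothetical secret references
        password: '${REDSHIFT_PASSWORD}'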
a
That’s strange; I’m using 0.8.42, but without this sink definition my ingestion run was failing with a 401. Is this injection done before the event reaches the DataHub Actions container, or is the Actions container itself responsible for it? I’m using 0.0.4 for that.
b
Okay, should be fine. If I recall correctly, it’s all about the CLI version that’s used.
If you use a more recent version it should just work.
You may need to configure the source to use a newer version of the DataHub SDK.
You can do that in EDIT > Finish Up > Advanced (bottom).
Then set the CLI version.
t
This was an awesome answer and question! I had a similar issue and was wondering what was going on with my local CLI ingestion. I had to remove the sink settings and everything worked.