# all-things-deployment
c
Hi team, I have upgraded DataHub to version 0.10.4 but the system upgrade job is failing at the BuildIndicesStep. Error logs: failed to reindex containerindex_v2, exception type: IllegalArgumentException, reason: delimiter must be a one char value.
f
Saw the same issue recently. Waiting for any insights.
c
Attached logs: one run which is failing and one which is passing.
a
Can you please share the version and distribution of Elasticsearch used? It appears to be related to not having UTF-8 support for the delimiter character in browsePathV2, which is a special separator character. The logs indicate some ASCII character encoding is being used instead of UTF-8, i.e.
delimiter":"���"
c
Actually we are building our own images. The Elasticsearch version we are using is 7.17.3.
Both logs are from DataHub version 0.10.4, but the failing one is based on the upgrade image that we are building. There is no real difference between our upgrade image and the one pulled from Docker Hub.
@brainy-tent-14503 any update? I am stuck because of this issue.
How can we resolve this one?
a
Make sure that when building the image, your platform's default character set is UTF-8. The steps to set the character set depend on your specific build system. One way to check is:
```
$ locale
LANG="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_CTYPE="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_ALL=
```
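For comparison, if the build system has fallen back to the POSIX/"C" default you would instead see something like this (illustrative output, not from the thread):
```
$ locale
LANG=
LC_CTYPE="POSIX"
LC_MESSAGES="POSIX"
LC_ALL=
```
In that case the JVM default charset will typically not be UTF-8, and the delimiter bytes get corrupted when the jar is built.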
c
We are using the Kaniko executor to build the images and we have changed the character set for this platform to point to UTF-8. Even after deploying the image built on that platform, it is still showing the error.
Attached logs:
What is the locale value of your platform?
And one thing: will updating the locale value solve this issue?
The current locale value of our platform is:
And this is how we are setting up the locale in our Docker image for the upgrade:
Any help on this @brainy-tent-14503 @orange-night-91387
a
There are three components I can think of, and one or more of them is not using UTF-8. The first is the build system: the machine running the Gradle command to compile the jar and place it into a Docker container. The other two are the Docker container itself, where the jar ultimately runs, and Elasticsearch. One of these systems is not using UTF-8. If you're using something like OpenSearch in AWS, it is UTF-8 by default. Our Docker containers used in helm and docker-compose are also UTF-8, so I initially eliminated those, and our base containers are all UTF-8 by default as well. That leaves the build system. However, it is possible that your environment is different and one of the other components listed above is the actual issue. The root cause of the error comes down to the same conclusion: if everything is built on and running UTF-8, the delimiter is a single character; if the bytes are interpreted as ASCII, it appears as more than one character. So I would check the components I didn't initially consider in your system; a quick way to check each of them is sketched below.
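Rough checks for all three (a sketch; the image name, host, and port are placeholders for your environment):
```
# 1. Build system: the platform locale and the charset the JVM running the Gradle build will default to
locale
java -XshowSettings:properties -version 2>&1 | grep -i encoding

# 2. The container you built: run the same check inside the datahub-upgrade image
docker run --rm --entrypoint sh <your-upgrade-image> -c 'locale'

# 3. Elasticsearch: confirm which distribution/version the upgrade job is actually talking to
curl -s 'http://<es-host>:9200/'
```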
c
One thing: in the case of Elasticsearch, does that cover both components, the Elasticsearch chart as well as the Elasticsearch setup job?
But with the 0.10.2 upgrade we never faced this issue. What changed from 0.10.2 to 0.10.4?
b
v0.10.4 introduces logic for a new search and browse experience which creates an aspect called browsePathV2 that includes the UTF-8 delimiter.
You can see this on the demo site; see the left nav and the top filters.
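For reference, the delimiter is U+241F (SYMBOL FOR UNIT SEPARATOR): a single character, but three bytes in UTF-8. A quick way to see why a non-UTF-8 build breaks it (needs bash 4.2+ and xxd):
```
# U+241F encodes to the three bytes e2 90 9f in UTF-8.
# If those bytes are written or read back under an ASCII/latin-1 default charset,
# they look like three separate characters, which Elasticsearch then rejects
# with "delimiter must be a one char value".
printf '\u241F' | xxd
```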
c
Okay understood
Can you share the PR where you added the UTF-8 delimiter support to the browsePathV2 aspect?
a
Here is the PR that added the delimiter - https://github.com/datahub-project/datahub/pull/7898
c
BROWSE_PATH_V2_DELIMITER = "␟"; the value being passed to this variable is a string, which looks like ASCII to me. But you said this new aspect delimiter is UTF-8?
We have deployed the 0.10.4 version but I cannot see this option in the UI navigation.
a
The new experience is in progress; you can enable it via helm here. We are planning to turn it on by default, likely in the next release.
c
Thanks a lot! Our issue is resolved after updating the locale value for our build system.
We are using a Java builder to run the Gradle command and compile the jar file,
and it was not supporting UTF-8.
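In case it helps anyone who cannot change the builder's platform locale, an alternative we did not try (so just a sketch) would be to pin the encoding for the Gradle build JVM itself:
```
# Force UTF-8 for the Gradle build regardless of the platform default locale
export GRADLE_OPTS="-Dfile.encoding=UTF-8"
# or persist it in the project's gradle.properties
echo 'org.gradle.jvmargs=-Dfile.encoding=UTF-8' >> gradle.properties
```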
f
Hi @brainy-tent-14503. I’m running into the same trouble. The first run failed without the locale set on the build machine, but after setting it the rollout still keeps failing with the same “delimiter more than one character” error. Do we know if we should remove all existing indices and then retry? Thanks!
Hi @creamy-van-28626. You mentioned you also added the locale setting via the ENV lines in the Dockerfile of datahub-upgrade. Do you think it is necessary even after you fixed the build system locale?
a
Indices do not need to be recreated
c
No, if you have set the locale value while building the image, there is no need to add it in the Dockerfile.
f
Big thanks to both of you! I finally realised I needed to install the locales package on our build machine first and then set the locale:
```
apt-get -y install locales
locale-gen en_US.UTF-8
export LANG=en_US.UTF-8
export LANGUAGE=en_US.UTF-8
export LC_ALL=en_US.UTF-8
```
Otherwise setting the locale via LANG fails and the environment falls back to the “C” locales. (We use Google Cloud Build and the built-in docker builder as the environment for the Gradle build.)
Finally the datahub-upgrade job finished its run successfully! :doge: