# all-things-deployment
c
Hi team, I have upgraded DataHub to version 0.10.4 but the system upgrade job is failing at the BuildIndicesStep. Error logs: failed to reindex containerindex_v2, exception type: IllegalArgumentException, reason: delimiter must be a one char value.
f
Saw the same issue recently. Waiting for any insights.
c
Attached logs: one run which is failing and one which is passing.
a
Can you please share the version and distribution of Elasticsearch used? It appears to be related to not having UTF-8 support for the delimiter character in browsePathV2, which is a special separator character. The logs indicate some ASCII character encoding is being used instead of UTF-8, i.e.
delimiter":"���"
c
Actually we are building our own images. The Elasticsearch version we are using is 7.17.3.
Both logs are from DataHub version 0.10.4, but the failing one is based on the upgrade image that we are building. There is no real difference between our upgrade image and the one pulled from Docker Hub.
@brainy-tent-14503 any update? I am stuck because of this issue.
How can we resolve this one?
a
Make sure that when building the image, your platform's default character set is UTF-8. The steps to set the character set depend on your specific build system. One way to check is:
```
$ locale
LANG="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_CTYPE="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_ALL=
```
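For comparison, if the build system has fallen back to the POSIX/"C" default you would instead see something like this (illustrative output, not from the thread):
```
$ locale
LANG=
LC_CTYPE="POSIX"
LC_MESSAGES="POSIX"
LC_ALL=
```
In that case the JVM default charset will typically not be UTF-8, and the delimiter bytes get corrupted when the jar is built.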
c
We are using the Kaniko executor to build the images and we have changed the character set for this platform to point to UTF-8. Even after deploying the image built on that platform, it is still showing the error.
Attached logs:
What is the locale value of your platform?
And one thing: will updating the locale value solve this issue?
The current locale value of our platform is:
And this is how we are setting up the locale in our Docker image for the upgrade:
Any help on this @brainy-tent-14503 @orange-night-91387
a
There are three components I can think of, and one or more of them is not using UTF-8. The first is the build system: the machine running the Gradle command to compile the jar and place it into a Docker container. The other two are the Docker container itself, where the jar ultimately runs, and Elasticsearch. One of these systems is not using UTF-8. If you're using something like OpenSearch in AWS, it is UTF-8 by default. Our Docker containers used in helm and docker-compose are also UTF-8, so I initially eliminated those, and our base containers are all UTF-8 by default as well. That leaves the build system. However, it is possible that your environment is different and one of the other components listed above is the actual issue. The root cause of the error comes down to the same conclusion: if everything is built on and running UTF-8, the delimiter is a single character; if the bytes are interpreted as ASCII, it appears as more than one character. So I would check the components I didn't initially consider in your system; a quick way to check each of them is sketched below.
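Rough checks for all three (a sketch; the image name, host, and port are placeholders for your environment):
```
# 1. Build system: the platform locale and the charset the JVM running the Gradle build will default to
locale
java -XshowSettings:properties -version 2>&1 | grep -i encoding

# 2. The container you built: run the same check inside the datahub-upgrade image
docker run --rm --entrypoint sh <your-upgrade-image> -c 'locale'

# 3. Elasticsearch: confirm which distribution/version the upgrade job is actually talking to
curl -s 'http://<es-host>:9200/'
```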
c
One thing: in the case of Elasticsearch, does that cover both components, the Elasticsearch chart as well as the Elasticsearch setup job?
But with the 0.10.2 upgrade we never faced this issue. What changed from 0.10.2 to 0.10.4?
b
v0.10.4 introduces logic for a new search and browse experience which creates an aspect called browsePathV2 that includes the UTF-8 delimiter.
You can see this on the demo site; see the left nav and the top filters.
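For reference, the delimiter is U+241F (SYMBOL FOR UNIT SEPARATOR): a single character, but three bytes in UTF-8. A quick way to see why a non-UTF-8 build breaks it (needs bash 4.2+ and xxd):
```
# U+241F encodes to the three bytes e2 90 9f in UTF-8.
# If those bytes are written or read back under an ASCII/latin-1 default charset,
# they look like three separate characters, which Elasticsearch then rejects
# with "delimiter must be a one char value".
printf '\u241F' | xxd
```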
c
Okay understood
Can you share the PR where you added the UTF-8 delimiter support to the browsePathV2 aspect?
a
Here is the PR that added the delimiter - https://github.com/datahub-project/datahub/pull/7898
c
BROWSE_PATH_V2_DELIMITER = "␟"; the value being passed to this variable is a string, which looks like ASCII to me. But you said this new aspect delimiter is UTF-8?
We have deployed the 0.10.4 version but I cannot see this option in the UI navigation.
a
The new experience is in progress; you can enable it via helm here. We are planning to turn it on by default, likely in the next release.
c
Thanks a lot! Our issue is resolved after updating the locale value for our build system.
We are using a Java builder to run the Gradle command and compile the jar file,
and it was not supporting UTF-8.
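In case it helps anyone who cannot change the builder's platform locale, an alternative we did not try (so just a sketch) would be to pin the encoding for the Gradle build JVM itself:
```
# Force UTF-8 for the Gradle build regardless of the platform default locale
export GRADLE_OPTS="-Dfile.encoding=UTF-8"
# or persist it in the project's gradle.properties
echo 'org.gradle.jvmargs=-Dfile.encoding=UTF-8' >> gradle.properties
```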
f
Hi @brainy-tent-14503. I’m running into the same trouble. The first run failed without the locale set on the build machine, but after setting it the rollout still keeps failing with the same “delimiter more than one character” error. Do we know if we should remove all existing indices and then retry? Thanks!
Hi @creamy-van-28626. You mentioned you also added the locale setting via the ENV lines in the Dockerfile of datahub-upgrade. Do you think it is necessary even after you fixed the build system locale?
a
Indices do not need to be recreated
c
No, if you have set the locale value while building the image, there is no need to add it in the Dockerfile.
f
Big thanks to both of you! I finally realised I needed to install the locales package on our build machine first and then set the locale:
```
apt-get -y install locales
locale-gen en_US.UTF-8
export LANG=en_US.UTF-8
export LANGUAGE=en_US.UTF-8
export LC_ALL=en_US.UTF-8
```
Otherwise setting the locale via LANG fails and the environment falls back to the “C” locales. (We use Google Cloud Build and the built-in docker builder as the environment for the Gradle build.)
Finally the datahub-upgrade job finished its run successfully! :doge: