# troubleshoot
a
Hello, I am trying to update datahub from 0.9.5 to 0.10.0. I ran the system upgrade job, and now GMS is giving me this error:
2023-03-09 09:29:44,122 [I/O dispatcher 1] INFO c.l.m.s.e.update.BulkListener:47 - Successfully fed bulk request. Number of events: 1 Took time ms: -1
2023-03-09 09:30:23,729 [R2 Nio Event Loop-1-1] WARN c.l.r.t.h.c.c.ChannelPoolLifecycle:139 - Failed to create channel, remote=localhost/127.0.0.1:8080
io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: localhost/127.0.0.1:8080
Caused by: java.net.ConnectException: Connection refused
at java.base/sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at java.base/sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:777)
at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:337)
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:334)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:776)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:724)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:650)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562)
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at java.base/java.lang.Thread.run(Thread.java:829)
Any ideas?
g
What kind of deployment is it? Docker-compose within a VM?
a
nope. kubernetes with helm
g
Weird. Did you just change the global version variable, or did you make any other changes to values?
a
hm.. no
i also tried with the dns name of the service
still the same
g
Did you update your chart as well?
a
i updated the env variable in datahub-gms deployment.yaml
i verified in the logs that it picked up the value
g
I'm not sure, but I think the latest version has changes not only to env variables; there's also another batch job called systemUpdate that is necessary to reindex your Elasticsearch indices.
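For reference, in a helm deployment that reindexing is handled by the system-update job the chart runs as a hook. A minimal sketch for checking it, assuming a namespace of datahub and the chart's default job naming (both illustrative):

```bash
# List the setup/upgrade jobs created by the chart (names vary with the release name)
kubectl get jobs -n datahub | grep -iE 'system-update|upgrade|setup'

# Inspect the system-update job's logs to confirm it completed
kubectl logs -n datahub job/<release-name>-datahub-system-update-job
```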
a
i did that
g
Anyway, this is weird, because it appears that GMS is trying to create a connection to itself. In my config, GMS uses port 8080.
a
it is also trying 8080
g
I tried to help, but I'm out of ideas about what it could be
a
ok thanks
a
Hi, this might have to do with a regression in a recent version we’re currently looking into/fixing - keep an eye on #announcements for more info
b
Hi @agreeable-belgium-70840! It seems you need to run the datahub-upgrade container. It will perform an upgrade on your system that will allow the rest of the system components to update!
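For anyone following along, a minimal sketch of running that container manually, with an illustrative env file name and tag (a concrete invocation appears later in this thread):

```bash
# Run the upgrade job with the SystemUpdate argument, pinned to the target release;
# docker.env must point at the same SQL/Kafka/Elasticsearch backends as GMS
docker run --rm --env-file docker.env acryldata/datahub-upgrade:v0.10.0 -u SystemUpdate
```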
a
@big-carpet-38439 i've done that already
actually I wiped out all the data
I am using the current helm chart from datahub
ran the init jobs
and I am still getting the same error
I'm starting to run out of ideas now
a
Hi Yianni- what helm charts are you using, and do you have any modifications in your deploy?
a
I am using the helm charts from here: https://github.com/acryldata/datahub-helm Nope, no modifications...
a
@astonishing-answer-96712 @agreeable-belgium-70840 - I’m actually facing the same issue
2023-03-15 17:33:10,411 [R2 Nio Event Loop-1-1] WARN  c.l.r.t.h.c.c.ChannelPoolLifecycle:139 - Failed to create channel, remote=localhost/127.0.0.1:8080
io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: localhost/127.0.0.1:8080
This is what I did:
1 - Ran elasticsearch-setup
2 - Ran kafka-setup
3 - Ran datahub-upgrade -u SystemUpdate
4 - Error when trying to start datahub-gms
PS: I have a self-hosted ES and Kafka. Any ideas? Thanks!
g
Can you share the pod details? The GMS_HOST environment variable may be wrong.
a
@gentle-camera-33498 I’m trying to run GMS as a Docker container. I can’t see a variable called GMS_HOST.
g
The GMS_HOST is an environment variable used by other services to retrieve the host and port where the GMS server is hosted (generally in format <host>:<port>)
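As an illustration only (the speaker calls it GMS_HOST; many DataHub containers split it into separate host and port variables, so check your own deployment for the exact names):

```bash
# Hypothetical example of pointing a dependent service at GMS instead of localhost
export DATAHUB_GMS_HOST=datahub-gms   # service/DNS name where GMS is reachable
export DATAHUB_GMS_PORT=8080          # port GMS listens on
```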
a
Yes, but I’m getting the connection refused from GMS, not other services. That’s the weird part.
It seems like GMS is trying to reach out to itself before the startup process is complete.
g
Ok, so those are the logs from the GMS container, right? If so, it's really weird.
a
Yes.
g
Anyway, could you share the GMS container environment variables? Just to have a look.
a
Sure, I can share.
c
I am having the same issue in DataHub v0.10.0.
In my case, I was running an ingestion and the Kafka broker disk went to 100%. After increasing the disk, the GMS container keeps failing and restarting with this error.
a
Hi @careful-garden-46928, could you share your env config on GMS?
c
Sure @astonishing-answer-96712
ENABLE_PROMETHEUS: 'true',
                    DATAHUB_SERVER_TYPE: 'quickstart',
                    DATAHUB_TELEMETRY_ENABLED: 'true',
                    DATASET_ENABLE_SCSI: 'false',
                    EBEAN_DATASOURCE_HOST: `${rdsHost}`,
                    EBEAN_DATASOURCE_DRIVER: 'com.mysql.jdbc.Driver',
                    KAFKA_SCHEMAREGISTRY_URL: `http://${schemaRegistryHost}:8081`,
                    KAFKA_BOOTSTRAP_SERVER: `${kafkaBootstrapServer}`,
                    EBEAN_DATASOURCE_URL: `${ebeanDataSourceUrl}`,
                    ELASTICSEARCH_PORT: '443',
                    ELASTICSEARCH_USE_SSL: 'true',
                    GRAPH_SERVICE_IMPL: 'elasticsearch',
                    ENTITY_REGISTRY_CONFIG_PATH: '/datahub/datahub-gms/resources/entity-registry.yml',
                    MAE_CONSUMER_ENABLED: 'true',
                    MCE_CONSUMER_ENABLED: 'true',
                    PE_CONSUMER_ENABLED: 'true',
                    UI_INGESTION_ENABLED: 'true',
                    METADATA_SERVICE_AUTH_ENABLED: 'true',
I can share a little bit more regarding this issue:
1. I tried to clean up all the topics because I thought that too many pending events waiting to be ingested might be causing the issue. After cleaning all the events, the issue persisted.
2. Then I tried to delete the Kafka topics and recreate them from scratch.
3. I also deleted and recreated the Elasticsearch index (tearing down the AWS OpenSearch cluster and recreating it).
None of the previous steps worked.
We had this issue before, and we could only manage to restore the system by applying our Phoenix Protocol (burning everything to the ground, recreating everything and re-ingesting all the data). This happened before in our DEV environment and now it has happened to our INT environment. So far we have managed to restore the systems because we have all the setup automated in AWS, but this is not a procedure we like to run all the time. It would be good to understand what we are doing wrong to cause this instability.
a
@dazzling-yak-93039 might be able to provide some insight here
d
Could you try clearing the docker cache and re-running it?
docker system prune -a
Context: We had a bug recently where the GMS image and the Upgrade image were not the same version, so GMS was always waiting for the Upgrade job to finish, because the versions didn't match. Clearing the docker images should let you download the new images that don't have this issue.
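A quick, hedged way to check for that mismatch before (or instead of) pruning, assuming the quickstart's default container names and locally pulled images:

```bash
# Compare the tags of the GMS and upgrade images currently on the host
docker images | grep -E 'datahub-gms|datahub-upgrade'

# Check exactly which image each container was started from
docker inspect --format '{{.Config.Image}}' datahub-gms
docker inspect --format '{{.Config.Image}}' datahub-upgrade
```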
o
Note: this should only be the case for quickstart deployments; if you are specifying a released tag and not head, something else is going on. If this is an environment with production data then please note that with the v0.10.0 release a reindex will occur, and depending on the size of the data it can take several hours to resolve (this would be tens of millions of documents, a large-scale deployment). Deleting all your Kafka topics would definitely cause a problem, and you would need to re-run kafka-setup and the upgrade job. The upgrade -> GMS communication is done through a Kafka message which lets GMS know the upgrade is finished.
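For completeness, a hedged sketch of re-running kafka-setup outside the helm chart (image namespace, tag and broker address are illustrative); the upgrade job is then re-run as described above so the completion message is republished:

```bash
# Recreate the DataHub topics, including DataHubUpgradeHistory_v1
docker run --rm \
  -e KAFKA_BOOTSTRAP_SERVER=broker:29092 \
  linkedin/datahub-kafka-setup:v0.10.0
```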
c
@dazzling-yak-93039 I can’t do a docker system prune because we are running datahub in AWS ECS. I assume that when the containers restart, they could/would start on new hardware on the AWS side, with a clean volume and system.
@orange-night-91387 We did successfully execute the upgrade job and we double-checked the version, so both the upgrade container and datahub are running version 0.10.0.
I deleted the Kafka topics to check whether the tons of events in Kafka were causing some issue when the GMS container was reinitialising, and afterwards I executed kafka-setup so all topics were back. This procedure did not change anything, so we still got the same error:
io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: localhost/127.0.0.1:8080
Could it be that when the GMS container starts it tries to pick up the last state it was in before crashing, and is trying to continue processing the events that were pending during the crash? A reminder: in my case the system was working perfectly well after the migration. Once I started an S3 ingestion, the system crashed in the middle of the process due to Kafka disk usage. After fixing the Kafka storage, GMS started returning these errors. Note: I can also see that other containers are failing to connect to GMS.
Thank you for your time answering questions and helping me debug the issue 🙂
o
I did delete the kafka topics to check if the tons of events in kafka would be causing some issue when the gms container was reinitialising. And afterwards I did execute the kafka-setup so all topics were back.
Did you re-execute DataHub Upgrade though? Without doing this GMS would not start. Since you completely cleared the topic data, the message GMS would be looking for that got sent from the first Upgrade run would not be there. This is probably your issue since your upgrade was working prior and is a different root cause than what is probably happening to others in this thread.
c
@orange-night-91387 Indeed, I don’t remember executing the DataHub Upgrade after deleting the topics. I will check with my teammates and return here so others can benefit from the debugging.
So indeed we did not execute the upgrade after “cleaning” the Kafka topics. We will pay attention to this scenario; if it happens again we will collect more information and consider running the upgrade container. I will report back on whether it worked. Thanks to everybody who contributed to this discussion. 🤘
a
@careful-garden-46928 did it work? I am still having the same issue...
with v0.10.1, so I am guessing that I am doing something wrong. In the beginning I was thinking that it was a gms bug
c
@agreeable-belgium-70840 since it was in our integration environment and we had a deadline, we triggered our Phoenix Protocol and simply destroyed and recreated everything 🙂 I would try:
1. running the datahub setup containers (kafka, ES and the others) to be sure the expected structure is available in the persistence layer
2. running the datahub-upgrade container
3. running the restore indices process to be sure all data in mysql is indexed correctly in ES (see the sketch below)
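A minimal sketch of step 3, reusing the upgrade image with the argument from the "Restoring Search and Graph Indices from Local Database" doc referenced later in this thread (env file and tag are illustrative):

```bash
# Rebuild the Elasticsearch search/graph indices from the rows in the SQL database
docker run --rm --env-file docker.env acryldata/datahub-upgrade:v0.10.1 -u RestoreIndices
```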
a
actually i made 0.10.1 work, but i had to wipe out all the data
f
Hello guys, I'm facing exactly this issue. I've updated the CLI version to 0.10.1 and, after starting the datahub instance, I cannot log in. When I check datahub health, it says it cannot connect to datahub-gms (connection refused).
h
@fierce-monkey-46092 Hello! I have the same problem, if you find a solution please let me know
f
@agreeable-belgium-70840 hello sir, how did you get 0.10.1 to work? with quickstart or docker-compose yaml?
a
i am using kubernetes. but now I am facing another issue: it is using TLSv1.3 for the kafka connection, I can't change it via the env variables, and the connection to kafka is timing out
w
Hello everyone! I'm facing this issue too =( Anybody know how to fix it?
b
I'm facing this too today. Currently running datahub upgrade after doing a system prune. Using Elastic Cloud, Cloud SQL and then I have Kafka locally as part of the docker compose.
That seemed to fix it @wonderful-wall-76801
w
hmm, you mean docker system prune? i'm working with kubernetes and datahub helm chart
f
Hi all, I was also facing issues when upgrading from v0.9.6 to v0.10.2. Following the hint in the release notes to perform the command below usually failed due to connection issues with kafka, elastic etc.
docker run acryldata/datahub-upgrade:v0.10.0 -u SystemUpdate
I could help myself using the command found in Restoring Search and Graph Indices from Local Database. I simply adapted the command as below. Be aware you might need to change the image version in datahub-upgrade.sh and update the docker.env used accordingly.
./docker/datahub-upgrade/datahub-upgrade.sh -u SystemUpdate
Hope this helps
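In case it helps others, a hedged sketch of what such a docker.env might contain, echoing the variables quoted earlier in this thread (hosts, credentials and ports are placeholders; check the datahub-upgrade docs for the full list your version expects):

```bash
# SQL database that backs GMS
EBEAN_DATASOURCE_HOST=mysql:3306
EBEAN_DATASOURCE_URL=jdbc:mysql://mysql:3306/datahub
EBEAN_DATASOURCE_USERNAME=datahub
EBEAN_DATASOURCE_PASSWORD=datahub
EBEAN_DATASOURCE_DRIVER=com.mysql.jdbc.Driver

# Kafka and schema registry
KAFKA_BOOTSTRAP_SERVER=broker:29092
KAFKA_SCHEMAREGISTRY_URL=http://schema-registry:8081

# Elasticsearch / OpenSearch
ELASTICSEARCH_HOST=elasticsearch
ELASTICSEARCH_PORT=9200
GRAPH_SERVICE_IMPL=elasticsearch
ENTITY_REGISTRY_CONFIG_PATH=/datahub/datahub-gms/resources/entity-registry.yml
```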
f
./docker/datahub-upgrade/datahub-upgrade.sh -u SystemUpdate
I followed the above command after changing my docker.env file. The script ran successfully the first time. After I log into the frontend, the GMS version is not upgraded. When I run the script again it gives me an error: Caused by: java.lang.IllegalStateException: Request cannot be executed; I/O reactor status: STOPPED
f
@fierce-monkey-46092 you need to update the containers first and then execute the command to perform some internal updates
f
@full-dentist-68591 I've searched the documentation and followed the steps, but I'm still not sure how to update the containers first
c
Had the issue again upgrading from v0.10.0 -> v0.10.2. I don’t understand why GMS is failing with the connection refused error 😕
2023-03-15 17:33:10,411 [R2 Nio Event Loop-1-1] WARN  c.l.r.t.h.c.c.ChannelPoolLifecycle:139 - Failed to create channel, remote=localhost/127.0.0.1:8080
io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: localhost/127.0.0.1:8080
In my setup I have the containers running on AWS ECS. I suspect that when AWS deploys the new TaskDefinition with the new image version, since it runs the new task concurrently with the current old version, DataHub runs 2 GMS containers at some point, and I think that breaks something in the database. 😕 I am running some tests to check whether this is the case.
f
I've updated all the containers and ran datahub-upgrade.sh successfully. But I'm getting Connection refused: localhost/127.0.0.1:8080 on GMS. What is this, haha
a
@brainy-tent-14503 may be able to help here - seems like it’s a widespread issue
b
The new GMS will not start until the system-update job finishes, so it will simply wait. The required output from the system-update job looks like:
2023-04-17 10:42:24 Executing Step 4/5: DataHubStartupStep...
2023-04-17 10:42:24 2023-04-17 15:42:24.582  INFO 1 --- [           main] c.l.d.u.s.e.steps.DataHubStartupStep     : Initiating startup for version: v0.10.2-0
2023-04-17 10:42:24 Completed Step 4/5: DataHubStartupStep successfully.
When using quickstart this is all handled for you and there is no need to execute the datahub-upgrade.sh script. If you are managing docker manually, note that the referenced script does not necessarily align the version: it points to head, and you are likely intending to deploy a specific version such as v0.10.2.
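A quick way to check for that output, as a sketch with illustrative container and job names:

```bash
# Docker: look for the DataHubStartupStep lines in the upgrade container's output
docker logs datahub-upgrade 2>&1 | grep DataHubStartupStep

# Kubernetes: the same check against the system-update job created by the helm chart
kubectl logs job/<release-name>-datahub-system-update-job | grep DataHubStartupStep
```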
c
I want to contribute here with some information for future debugging purposes.
My setup:
• Containers running on AWS ECS Fargate
• AWS managed services for Kafka, MySQL and ElasticSearch
• Updating from v0.10.0 to v0.10.2 (tried to update to v0.10.1 before)
Error message while bringing up the v0.10.2 containers:
2023-03-15 17:33:10,411 [R2 Nio Event Loop-1-1] WARN  c.l.r.t.h.c.c.ChannelPoolLifecycle:139 - Failed to create channel, remote=localhost/127.0.0.1:8080
io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: localhost/127.0.0.1:8080
Caused by: java.net.ConnectException: Connection refused
at java.base/sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
...
After reading the code changes of release v0.10.2, I noticed some changes that seemed to indicate it was necessary to run the command
docker run --rm --env-file docker_upgrade.env acryldata/datahub-upgrade:v0.10.2 -u SystemUpdate
which was confirmed by @brainy-tent-14503's message in this thread. After successfully executing the SystemUpdate command, all errors disappeared and I was able to run v0.10.2.
Conclusion: it seems that it is always required to execute the SystemUpdate command before updating to any new version. It wasn’t clear to me and my team that the SystemUpdate command had to be executed between minor version updates. There is no information regarding this in the release notes, nor in the official online documentation. From version v0.9.X to v0.10.X there was a disclaimer informing that the SystemUpdate command was required. I would expect such a note in this recent release as well.
o
Datahub upgrade is a required component and will need to be run for all releases. In some releases it will be a no-op, but this is the intended setup for any migration executions needed on the backend side going forward. This was called out in the town hall presentation as well as updated in the helm charts and docker compose configuration. We can make this more clear in the documentation and release notes going forward as well.
s
@orange-night-91387, I'm having the same issues as above. Regarding your message linked below: https://datahubspace.slack.com/archives/C029A3M079U/p1679343681184609?thread_ts=1678354501.041149&cid=C029A3M079U Can you please indicate via which Kafka topic this update message is sent to GMS? I ran the -u SystemUpdate command "successfully", but I noticed this in the logs:
INFO - 2023-04-20 13:20:22.761 ERROR 1 --- [main] c.l.m.dao.producer.KafkaHealthChecker    : Failed to emit History Event for entity Event Version: v0.10.2-0
INFO - 
INFO - org.apache.kafka.common.errors.TimeoutException: Topic DataHubUpgradeHistory_v1 not present in metadata after 60000 ms.
INFO - 
INFO - 2023-04-20 13:20:22.762  INFO 1 --- [main] c.l.d.u.s.e.steps.DataHubStartupStep     : Initiating startup for version: v0.10.2-0
INFO - Completed Step 4/5: DataHubStartupStep successfully.
Does this mean that the SystemUpdate message was never posted to the DataHubUpgradeHistory_v1 topic, and therefore was never consumed by GMS?
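One hedged way to answer that is to read the topic directly with the standard Kafka console consumer (the bootstrap address is a placeholder; add your SASL/SSL client properties via --consumer.config if the cluster is secured):

```bash
# If the SystemUpdate run published its startup message, it should show up here
kafka-console-consumer.sh \
  --bootstrap-server broker:9092 \
  --topic DataHubUpgradeHistory_v1 \
  --from-beginning
```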
a
The process that creates the topic is called kafka-setup; this docker container creates the topics, including DataHubUpgradeHistory_v1. The helm chart performs these and other setup jobs before the system update. As far as I know there is no way to apply the chart against ECS, so there you will have to run all the containers manually in the order specified by the helm hooks, which also indicate when during an install/update to run and in what order. Examples: 1, 2. More information about this is available in the helm documentation.
s
So I have resolved this issue. Because we used customised helm charts, and managed MSK with some rules on creating topics, we were not able to use the kafka-setup container to create topics. I had to manually create the topics and add a few additional Kafka-related (SASL, JAAS) env vars to the upgrade container. Once these were added, the upgrade container was able to connect to the DataHubUpgradeHistory_v1 topic and GMS was able to consume the upgrade messages from it. No more localhost connection refused errors.
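For others hitting the same MSK topic-creation restriction, a rough sketch of creating the topic by hand with the standard Kafka CLI (partition/replication settings and the SASL client config file are illustrative); the Kafka env vars for the upgrade container are covered in the springKafkaConfigurationOverrides discussion a few messages below:

```bash
# Create the upgrade-history topic manually when kafka-setup cannot run against MSK
kafka-topics.sh --create \
  --bootstrap-server broker:9096 \
  --command-config client-sasl.properties \
  --topic DataHubUpgradeHistory_v1 \
  --partitions 1 --replication-factor 3
```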
Not sure why the SystemUpdate step passed if it could not post to the DataHubUpgradeHistory_v1 topic. Surely it should have failed?
e
What are the env vars to set for the upgrade container when SSL is enabled on the Kafka cluster? I'm facing the exact same issue when trying to connect to SSL-enabled Kafka in the SystemUpdate job
b
@early-kitchen-6639 The datahub-upgrade Kafka-related variables are pulled from the global configuration, see here. The full list of Spring Kafka parameters is documented in the Spring docs here, and some of them are discussed in the docs here. Those would be configured in the springKafkaConfigurationOverrides section of the helm values.
Please provide the logs from the datahub system-update job, which helm runs during the pre-install/upgrade step.
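A hedged sketch of what that section can look like for a SASL_SSL cluster: the section name is as referenced above, its placement under global follows the chart's values layout, and the property values are placeholders that map to standard Kafka client settings:

```bash
# Write a small values override for the chart; pass it to helm with an extra -f flag
cat > kafka-overrides.yaml <<'EOF'
global:
  springKafkaConfigurationOverrides:
    security.protocol: SASL_SSL
    sasl.mechanism: SCRAM-SHA-512
    sasl.jaas.config: 'org.apache.kafka.common.security.scram.ScramLoginModule required username="user" password="pass";'
EOF
```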
h
@brainy-tent-14503 PFA zipped logs of system update job
b
I am looking into the exception in the log.
Ok, this log is from the post-GMS-start job. There should be a pod whose name contains the string dh-system-update, whereas this looks like dh-nocode-migration logs.
If in fact the logs are from a pod with system-update, then something is being lost when executing the pod, likely the cli arguments here which run the system-update logic.
@helpful-dream-67192 ☝️ Let me know if you would be able to share the pod’s manifest and we can look for the right args.
h
@brainy-tent-14503 Thank you so much. System-update was disabled, hence it wasn't running; after enabling it, datahub is working properly. Two queries for you:
1. Do we always need to run system-update on every datahub helm upgrade?
2. Do we always need to run the setup jobs (elasticsearch, kafka, mysql) on every datahub helm upgrade?
a
1.) Yes, this is required. It will essentially handle reindexing and clean-up of previous backups/cloned indices. In the future it may also perform database migrations and other steps required to update the system prior to GMS starting the new version.
2.) For the setup jobs, they are typically only required for initial setup; however, it is possible that a new topic is added for Kafka or some other configuration needs to be applied, so I would run them to be sure. The helm chart will run the 3 setup jobs first and then the system-update step, per the helm chart hooks, in the correct order.
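In other words, with the stock chart a plain upgrade is usually enough, since the hooks run those jobs for you. A hedged sketch, using the repo and chart names from the datahub-helm README:

```bash
# The chart's pre-install/pre-upgrade hooks run the elasticsearch/kafka/mysql setup jobs
# and then the system-update job before GMS is rolled to the new version
helm repo update
helm upgrade --install datahub datahub/datahub --values values.yaml
```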
c
Hi @brainy-tent-14503, I finally got my helm charts to deploy successfully! There were a few things I had to do:
1. datahub-system-update-job.yml: we had issues previously with the system update job not being able to run with a helm pre-install hook (I had set it to a post-install hook), but I had to set it BACK to a "helm.sh/hook": pre-install,pre-upgrade.
2. When the system update job was trying to run, I was getting an error about various datahub secrets not being present. I then 1) scripted out the templated yaml for the secrets, 2) kubectl applied the secrets, 3) restarted the datahub-system-update-job.
3. This then allowed GMS to complete the deployment.
4. I originally tried adding a "helm.sh/hook": pre-install,pre-upgrade annotation to my secrets, but that does not seem to work in helm, hence step 2.2 above.
Thanks for all your help!
f
is there a particular order between system update/datahub upgrade and the various other prerequisites? we ran both of the upgrade-related pods for going from 0.9.6.1 to 0.10.4 and are still failing with the connection refused issue
b
Yes, there is an order to the containers which is controlled by helm using the hooks as mentioned earlier in the thread. To give you a whole picture, I’ve thrown together a quick diagram here. Work your way from the bottom to the top depending on your environment. Also feel free to start a new thread with your logs and we can continue the discussion there. @fierce-orange-10929
f
thanks @brainy-tent-14503!