Hi. after an upgrade from `v0.8.44` -> `v0.10.0...
# troubleshoot
Hi. After an upgrade from `v0.8.44` -> `v0.10.0`, gms stopped working. Details in 🧵
I am running on a k8s cluster without the provided helm charts. My process was: 1. Stop `frontend`, `gms`. 2. Run the `datahub-upgrade` process. It ran successfully with some warnings. 3. Start `frontend`, `gms`.
It seems that no process is listening on port 8080 in the gms container. You can also see errors like
io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: localhost/127.0.0.1:8080
inside the gms container.
You get connection refused errors in the frontend container as well.
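A quick way to confirm whether anything is accepting connections on that port is a plain TCP connect. This is only a minimal sketch, assuming Python is available in the container (or in a pod that can reach it); `port_is_open` is a hypothetical helper name, not part of DataHub:

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        # create_connection raises an OSError subclass on refusal or timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # 8080 is the gms port mentioned above; adjust host/port for your setup.
    print("gms port 8080 open:", port_is_open("localhost", 8080))
```

A `False` result here only shows that nothing accepted the connection; it does not say why the server failed to bind, so the gms logs are still the place to look for the cause.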
I see processes are running in the gms container though:
$ ps aux | cat
PID   USER     TIME  COMMAND
    1 datahub   0:00 dockerize -wait <http://REDACTED:9200> -wait-http-header Accept: */* -wait <tcp://REDACTED> -wait <tcp://REDACTED> -timeout 240s java -javaagent:jmx_prometheus_javaagent.jar=4318:/datahub/datahub-gms/scripts/prometheus-config.yaml -jar /jetty-runner.jar --jar jetty-util.jar --jar jetty-jmx.jar --config /datahub/datahub-gms/scripts/jetty.xml /datahub/datahub-gms/bin/war.war
   17 datahub   1:36 java -javaagent:jmx_prometheus_javaagent.jar=4318:/datahub/datahub-gms/scripts/prometheus-config.yaml -jar /jetty-runner.jar --jar jetty-util.jar --jar jetty-jmx.jar --config /datahub/datahub-gms/scripts/jetty.xml /datahub/datahub-gms/bin/war.war
I am now considering rolling back to `v0.8.44` or maybe some other version, but I am not sure if that's a good idea after having run the `datahub-upgrade` process, which finished successfully:
Completed Step 5/5: CleanUpIndicesStep successfully.
Success! Completed upgrade with id SystemUpdate successfully.
Upgrade SystemUpdate completed with result SUCCEEDED. Exiting...
Please let me know your thoughts on that.
What's most peculiar is that I have two nearly identical deployments on different clusters: sandbox for testing things and prod (separate mysql, schema-registry, kafka, es). I ran the same process for the deployment in sandbox and it worked for ~4 days, up until I restarted it.
Might be a similar issue to the one in this thread.
So I switched to `v0.9.6.1` and everything seems to work now 🤷‍♂️
Hi @bumpy-activity-74405, good to hear things are working now. In the future, I think #all-things-deployment is a better venue for something like this!