
Josh D'Agostino

04/06/2023, 2:23 PM
Hi there! We are transitioning Atlantis from EKS to GKE and for some reason we are getting 400s on the liveness and readiness probes when we deploy the application. I'm attempting to debug and am looking for docs on where container logs may be stored so I can exec in and check. Also curious if anybody else has encountered this issue, as Google searches have returned no helpful information. Thanks
The container starts, I can exec into it, and I can curl it. When I try to reach it via browser I also receive a 502.

Dylan Page

04/06/2023, 2:26 PM
So you're getting 400s and 502s? It's not immediately clear what the problem is. Are the liveness and readiness probes still failing?

Josh D'Agostino

04/06/2023, 2:27 PM
The probes are failing, yes. I'm using the /healthz endpoint with a 5-second timeout.
Let's leave the 502 aside for now
I shouldn't have brought that up lol
😂 1

Dylan Page

04/06/2023, 2:28 PM
Okay, what do the logs say?
kubectl logs
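
Since the container keeps getting restarted, the useful variant here is the previous instance's logs. Using the pod and namespace that show up in the dump below (atlantis-0 in the atlantis namespace), that would be roughly:

kubectl logs -n atlantis atlantis-0 --previous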

Josh D'Agostino

04/06/2023, 2:30 PM
{"level":"warn","ts":"2023-04-06T14:24:07.763Z","caller":"scheduled/executor_service.go:62","msg":"Received interrupt. Attempting to Shut down scheduled executor service","json":{},"stacktrace":"<http://github.com/runatlantis/atlantis/server/scheduled.(*ExecutorService).Run|github.com/runatlantis/atlantis/server/scheduled.(*ExecutorService).Run>\n\tgithub.com/runat
lantis/atlantis/server/scheduled/executor_service.go:62"}
{"level":"warn","ts":"2023-04-06T14:24:07.763Z","caller":"scheduled/executor_service.go:89","msg":"Received interrupt, cancelling job","json":{},"stacktrace":"<http://github.com/runatlantis/atlantis/server/scheduled.(*ExecutorService).runScheduledJob.func1|github.com/runatlantis/atlantis/server/scheduled.(*ExecutorService).runScheduledJob.func1>\n\tgithub.com/runatlantis/atlantis/se
rver/scheduled/executor_service.go:89"}
{"level":"warn","ts":"2023-04-06T14:24:07.763Z","caller":"scheduled/executor_service.go:67","msg":"All jobs completed, exiting.","json":{},"stacktrace":"<http://github.com/runatlantis/atlantis/server/scheduled.(*ExecutorService).Run|github.com/runatlantis/atlantis/server/scheduled.(*ExecutorService).Run>\n\tgithub.com/runatlantis/atlantis/server/scheduled/executor_
service.go:67"}
{"level":"warn","ts":"2023-04-06T14:24:07.763Z","caller":"server/server.go:970","msg":"Received interrupt. Waiting for in-progress operations to complete","json":{},"stacktrace":"<http://github.com/runatlantis/atlantis/server.(*Server).Start|github.com/runatlantis/atlantis/server.(*Server).Start>\n\tgithub.com/runatlantis/atlantis/server/server.go:
970\<http://ngithub.com/runatlantis/atlantis/cmd.(*ServerCmd).run|ngithub.com/runatlantis/atlantis/cmd.(*ServerCmd).run>\n\tgithub.com/runatlantis/atlantis/cmd/server.go:764\ngithub.com/runatlantis/atlantis/cmd.(*ServerCmd).Init.func2\n\tgithub.com/runatlantis/atlantis/cmd/server.go:640\ngithub.com/runatlantis/atlantis/cmd.(*ServerCmd).withErrPrin
t.func1\n\<http://tgithub.com/runatlantis/atlantis/cmd/server.go:1108|tgithub.com/runatlantis/atlantis/cmd/server.go:1108>\ngithub.com/spf13/cobra.(*Command).execute\n\tgithub.com/spf13/cobra@v1.6.1/command.go:916\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\tgithub.com/spf13/cobra@v1.6.1/command.go:1044\ngithub.com/spf13/cobra.(*Command).Execu
te\n\<http://tgithub.com/spf13/cobra@v1.6.1/command.go:968|tgithub.com/spf13/cobra@v1.6.1/command.go:968>\ngithub.com/runatlantis/atlantis/cmd.Execute\n\tgithub.com/runatlantis/atlantis/cmd/root.go:30\nmain.main\n\tgithub.com/runatlantis/atlantis/main.go:66\nruntime.main\n\truntime/proc.go:250"}
{"level":"info","ts":"2023-04-06T14:24:07.763Z","caller":"server/server.go:996","msg":"All in-progress operations complete, shutting down","json":{}}
Stream closed EOF for atlantis/atlantis-0 (atlantis)
messy
Basically the service is starting and then receiving an interrupt and dying
Here let me edit that and make it more readable
✅ 1

Dylan Page

04/06/2023, 2:56 PM
No worries. My suggestion is to tweak the probes' initialDelaySeconds to give Atlantis time to come up before it's killed by the liveness check
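
Concretely, that's the initialDelaySeconds field on the pod's probes. A minimal sketch of what's being suggested, assuming Atlantis's default port of 4141 and the /healthz path mentioned above (the delay values here are placeholders, not recommendations):

livenessProbe:
  httpGet:
    path: /healthz
    port: 4141
  initialDelaySeconds: 60   # give Atlantis time to start before the kubelet begins probing
  periodSeconds: 10
  timeoutSeconds: 5
readinessProbe:
  httpGet:
    path: /healthz
    port: 4141
  initialDelaySeconds: 10
  periodSeconds: 10
  timeoutSeconds: 5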

Josh D'Agostino

04/06/2023, 3:00 PM
Yeah sure, no problem - we have Argo CD deployed and the liveness/readiness timeout there is 1s with no issue. The liveness/readiness timeout in the runatlantis docs is 5s. Can you recommend a new threshold, or just crank it to 11?

Dylan Page

04/06/2023, 3:04 PM
Just crank it for now
It sounds like the liveness check is killing the container too aggressively
Might be how GKE is vs EKS, who knows

Josh D'Agostino

04/06/2023, 3:06 PM
Not I, said the fly
This was just the sort of advice I was hoping for - the kind that confirms my suspicions. Thank you

Dylan Page

04/06/2023, 3:10 PM
Not a problem, keep me posted

Josh D'Agostino

04/06/2023, 3:30 PM
Careful what you wish for
😂 1
🙂
No dice
Cranked the initial delay to 120s and it's still failing

Dylan Page

04/10/2023, 2:22 PM
Interesting, what are the logs like now that the liveness probe isn't killing the container?
The last log batch was showing Atlantis shutting down

Josh D'Agostino

04/10/2023, 3:40 PM
I think this might be an issue with GCP load balancer config as ingress
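
One GKE-specific thing worth ruling out: the ingress-provisioned load balancer health-checks the backend on / by default, which Atlantis may not answer with a 200, and a BackendConfig can point it at /healthz instead. A rough sketch, assuming a Service named atlantis serving the default Atlantis port 4141 (the names here are illustrative, not from the thread):

apiVersion: cloud.google.com/v1
kind: BackendConfig
metadata:
  name: atlantis
spec:
  healthCheck:
    type: HTTP
    requestPath: /healthz   # GCLB otherwise probes "/"
---
apiVersion: v1
kind: Service
metadata:
  name: atlantis
  annotations:
    cloud.google.com/backend-config: '{"default": "atlantis"}'
spec:
  selector:
    app: atlantis
  ports:
    - name: http
      port: 80
      targetPort: 4141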

Dylan Page

04/10/2023, 3:49 PM
It might explain the difference, since it's a GCP-specific part

Josh D'Agostino

04/10/2023, 3:55 PM
yea

Chastity Blackwell

04/11/2023, 1:06 PM
This is the sort of stuff that makes me prefer running Atlantis standalone 🙂

Josh D'Agostino

04/11/2023, 1:17 PM
Oh it's so fun tho /s
Ok so the resolution was dumb - there was a TLS secret in our manifest that was constantly redirecting the HTTP liveness/readiness checks. Since it was ancillary, I dropped it and everything was fine. Now I need to build the ingress and service using Google CLB
💯 1
🎉
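
For posterity, the original 400s are consistent with plain-HTTP probes hitting a listener that had switched to TLS (Go's HTTP server answers plaintext on a TLS port with 400 Bad Request). If the cert ever needs to stay on the pod, the standard Kubernetes alternative to dropping it is telling the probes to speak HTTPS - a sketch, again assuming the default 4141 port:

livenessProbe:
  httpGet:
    path: /healthz
    port: 4141
    scheme: HTTPS   # match the container if it terminates TLS itself
readinessProbe:
  httpGet:
    path: /healthz
    port: 4141
    scheme: HTTPS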

Dylan Page

04/11/2023, 1:19 PM
Great to hear that!