What do you check for with your docker health checks?
# docker
a
What do you check for with your docker health checks? For our app containers, we have been checking that the container can reach the DB and the one critical upstream web service (which runs in an adjacent container on the same VPS). But our hosting provider said "nah, a Docker health check should just check stuff about the container itself; have other alerts - which we do have - for overall system stability". Their reasoning: when a health check fails, the usual response is to restart the container, and restarting it is never going to fix an upstream service that isn't part of the container. I kinda see the sense there, but thought I'd check elsewhere for other opinions too.
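For what it's worth, the "container-only" style of check the provider describes usually looks something like the Dockerfile sketch below; the port and endpoint path are hypothetical placeholders, and the start period would be tuned to the app's init time:

```dockerfile
# Container-only health check: hits the app's own HTTP endpoint,
# never an external dependency. Port 8888 and /healthcheck.cfm
# are hypothetical placeholders.
HEALTHCHECK --interval=30s --timeout=5s --start-period=60s --retries=3 \
  CMD curl -fsS http://localhost:8888/healthcheck.cfm || exit 1
```

`curl -f` makes a non-2xx response count as a failure, so the container is marked unhealthy only when the app itself stops answering.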
t
we have readiness checks for K8s, but other than that the general health check is
`getPageContext().getDataSourceManager().releaseConnection()`
for each database. It tests that Lucee is running and able to talk to the DB, but we do not test other services like Solr or our API, as they have their own health checks.
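A health-check page built around that idea might look roughly like the sketch below. This is an assumption-laden illustration, not the poster's actual code: `appDS` is a hypothetical datasource name, and `queryExecute` stands in as a generic way to round-trip the DB:

```cfml
<!--- healthcheck.cfm: a sketch of a "Lucee is up and can talk to the DB" page.
      "appDS" is a hypothetical datasource name. --->
<cfscript>
try {
    // cheap query just to prove the connection works
    // (Oracle would need "SELECT 1 FROM dual")
    queryExecute( "SELECT 1", {}, { datasource: "appDS" } );
    cfheader( statusCode = 200 );
    writeOutput( "OK" );
} catch ( any e ) {
    cfheader( statusCode = 503 );
    writeOutput( "DB check failed" );
}
</cfscript>
```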
p
Same as you, DBs and any primary services
☝️ 2
q
Since 90% of my apps use ORM, I just init the ORM in my docker healthcheck. If the DB isn't connecting, then we should be marking the container as down anyway. Proper dependency mapping at the Docker/container level should have the DB up first before worrying about the health of the app server.
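That "DB up first" ordering can be expressed in Compose with `depends_on` plus a health condition; a sketch, with hypothetical service and image names (and Postgres standing in for whatever DB is actually in use):

```yaml
# Compose sketch: app only starts once the db's own healthcheck passes.
# Service names, image names, and the choice of Postgres are hypothetical.
services:
  db:
    image: postgres:16
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 10s
      retries: 5
  app:
    image: myorg/lucee-app
    depends_on:
      db:
        condition: service_healthy
```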
j
we have a very slow app init (in a big Mura site with lots of custom plugins). for me, health is when the app is initialized, so i'm `curl`-ing a page (e.g., the home page) and looking for a bit of text in the html content. in other words, i get that whole init process out of the way before the orchestrator allows traffic, so during a rolling deployment with multiple Lucee replicas, that yields a seamless user experience. otherwise, every deployment would have an app init blocking requests to that container. the downside is that if, say, the DB is dead for whatever reason, the orchestrator will keep bouncing the container and eventually roll back to the previous image; in other words, it wouldn't be the container's fault, but the orchestrator would punish it anyway. i think K8s has multiple health check stages (one at startup and another that runs throughout the life of the container). we still run Lucee on Swarm, though, which, IIRC, only has one type of health check. (we've been migrating services one by one to K8s, but Lucee is going to be the last one to go.)
i think the takeaway is that it's going to depend on your app, your rollout policy, the orchestrator, and maybe some other factors, so it's not a one-size-fits-all thing. we also have another health check page that's observed by an external monitor, which has some different responsibilities.
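The K8s stages mentioned above map to separate probes; here's a sketch of how they could split the responsibilities (image, path, port, and timings are all hypothetical). The startup probe absorbs the slow init, the readiness probe gates traffic during rolling deploys, and only the liveness probe triggers restarts, which avoids the "punished for a dead DB" problem described above:

```yaml
# K8s pod-spec sketch; all names, paths, and timings are placeholders.
containers:
  - name: lucee
    image: myorg/lucee-app
    startupProbe:        # tolerates a slow app init: up to 30 x 10s = 5 min
      httpGet:
        path: /healthcheck.cfm
        port: 8888
      failureThreshold: 30
      periodSeconds: 10
    readinessProbe:      # controls whether the pod receives traffic
      httpGet:
        path: /healthcheck.cfm
        port: 8888
      periodSeconds: 15
    livenessProbe:       # the only probe that causes a restart
      httpGet:
        path: /healthcheck.cfm
        port: 8888
      periodSeconds: 30
```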