# box-products
b
Question: is this still a valid method to run CommandBox as a service? https://www.ortussolutions.com/blog/screencast-starting-commandbox-servers-as-a-windows-service
Since we updated CommandBox from 5.4.2 to 5.5.2, we have been experiencing the following. We installed a service to start a CommandBox ColdFusion server (as described in the blog post above). Normally, Task Manager first shows a "box.exe" process, then a "java.exe" process that takes about 700MB of memory, then a second "java.exe" process that takes about 3GB of memory, which is the amount we configured for our CF instance. The problem now is that the first two processes appear, but the last one never does, so effectively the site never starts.
`box server list` also shows the instance as stopped. The Windows service, however, says it is started... Also, this is not a consistent problem: once in a while the site will start. I have looked in several CommandBox log files, and in the log files of the CF server I'm trying to start, but I see no unusual errors. We are running two of these Windows CommandBox services. Does anyone have a clue as to what is happening?
b
Yes, it is still valid. I've found that issuing a RESTART instead of a START almost always succeeds in starting the server behind the service, whereas a stop followed (any amount of time later) by a start only works sometimes. I had to add some retry fault tolerance to our deployment pipeline to account for the false starts: start the service, wait a few seconds, then spend the next 30 seconds or so trying to hit the app's healthcheck. If the app still isn't up after 30 seconds, the service is restarted. I've never had it fail to start beyond that second attempt. I can't find any logs about what could have gone wrong; when it happens, it's just as if the Windows service (wrapper) started but never tried to start the actual CommandBox server.
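The start, wait, healthcheck, restart sequence described above can be sketched in Python roughly like this. It is a sketch under assumptions, not the poster's actual pipeline: `start_service`, `restart_service` and `is_healthy` are hypothetical callables you would wire to `Start-Service`/`Restart-Service` and an HTTP request against your app's healthcheck URL, and the default timings simply mirror the "few seconds" and "30 seconds" mentioned in the post.

```python
import time
from typing import Callable


def start_with_retry(
    start_service: Callable[[], None],
    restart_service: Callable[[], None],
    is_healthy: Callable[[], bool],
    initial_wait: float = 5.0,
    healthcheck_window: float = 30.0,
    poll_interval: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Start the service; if the healthcheck never passes, restart it once.

    Returns True if the app came up on either attempt.
    """

    def wait_until_healthy() -> bool:
        # Poll the healthcheck until it passes or the window runs out.
        deadline = clock() + healthcheck_window
        while clock() < deadline:
            if is_healthy():
                return True
            sleep(poll_interval)
        return False

    # First attempt: a plain start, then give the server a moment to boot.
    start_service()
    sleep(initial_wait)
    if wait_until_healthy():
        return True

    # Fallback: a restart, which the post reports almost always succeeds.
    restart_service()
    sleep(initial_wait)
    return wait_until_healthy()
```

Passing `sleep` and `clock` in as parameters keeps the timing logic testable without actually waiting 30 seconds.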
The odds were much better when using the CommandBox service wrapper instead of NSSM, but it did have the same issues. If your company is willing to spend the $50 per server on it, it's a much better choice in my testing. However, with fault tolerance built into the deployment process that has, so far, been 100% successful in getting the server/service back up, I couldn't justify to management that the paid option was worth it for 40+ servers. (I REALLY wanted to script the service creation in box.json too! heh)
One other note: the same version of CommandBox and the same version of Lucee on Windows Server 2016 don't have this service wrapper disconnect issue. So far, I've only personally seen it on Server 2019.
b
> I can't find any logs around what could have gone wrong... when it happens, it's just as if the windows service (wrapper) started but never tried to start the actual commandbox server.
That's exactly the issue we were experiencing. We evaluated the situation and concluded that we don't actually need the Windows service, so we went back to `box start serverConfigFile=""` and `box stop serverConfigFile=""`, which seem to work fine. Thanks for the input!
b
To be clear, all of my stop, start and restart mentions above refer to the PowerShell commands `stop-service <servicename>`, `start-service <servicename>` and `restart-service <servicename>`.
Yeah, if you can get away without the service, I wouldn't bother with one on Windows. You'll be better off.
If a lightbulb goes off and you figure out the problem, please do let me know (and I'll do the same).
b
@birdy1980 The key to troubleshooting server start issues with NSSM is to enable the standard output and error logs, as they will show you what's going on in the console.
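For reference, NSSM stores those log paths as the service parameters `AppStdout` and `AppStderr`, which can be set with `nssm set <servicename> <parameter> <value>`. Below is a small Python sketch that builds and runs those commands. The service name and log directory are placeholders, `nssm_log_commands` and `enable_nssm_logs` are hypothetical helpers, and the sketch assumes `nssm.exe` is on the PATH and is run from an elevated prompt.

```python
import subprocess
from pathlib import PureWindowsPath


def nssm_log_commands(service: str, log_dir: str) -> list[list[str]]:
    """Build the `nssm set` commands that point stdout/stderr at log files."""
    base = PureWindowsPath(log_dir)
    return [
        ["nssm", "set", service, "AppStdout", str(base / f"{service}-stdout.log")],
        ["nssm", "set", service, "AppStderr", str(base / f"{service}-stderr.log")],
    ]


def enable_nssm_logs(service: str, log_dir: str) -> None:
    """Run the commands; raises CalledProcessError if nssm reports a failure."""
    for cmd in nssm_log_commands(service, log_dir):
        subprocess.run(cmd, check=True)
```

You will likely need to restart the service before NSSM uses the new settings. Later in this thread, enabling NSSM's logging is reported to change the very behavior being investigated.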
And I do recommend the service manager module, as it makes this a lot easier, though we do charge a few bucks for it.
It's worth noting that I've run into a few issues with my clients using Windows services on CommandBox 5.5, due to some changes in how we track server status.
We now use a PID file on disk instead of trying to bind to the ports to see if the server is up. We've found there are a few scenarios that will kill Java before the PID file can get removed, such as a Windows restart, which makes the server appear as though it's still running the next time you go to start the service.
This can lead to issues like the service hanging forever on the console prompt asking you what you'd like to do, since your server is already running. We've solved issues like this in our paid service manager module (by setting an env var that forces CommandBox into non-interactive mode), but you could be hitting something like that. Hard to say without the logs.
Re-reading your first post, I really think you're probably hitting this, since:
• it does prevent the second java process from starting
• it will appear to be random, since there are several processes in place that will clean up an orphaned PID file, but that happens asynchronously, so it will fail once and then work the next time.
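To make the failure mode concrete, here is a rough sketch of how an orphaned PID file can be detected. This is not CommandBox's code. `is_stale_pid_file` is a hypothetical helper, and the PID file path is whatever you point it at, since the thread doesn't say where CommandBox stores it. The caller supplies the set of running PIDs, for example from `psutil.pids()` or parsed `tasklist` output.

```python
from pathlib import Path
from typing import Iterable


def is_stale_pid_file(pid_file: Path, running_pids: Iterable[int]) -> bool:
    """True if a PID file exists but does not name a running process."""
    if not pid_file.exists():
        return False  # no PID file: the server is treated as stopped
    try:
        pid = int(pid_file.read_text().strip())
    except ValueError:
        return True  # unreadable contents cannot point at a live server
    # A leftover file from a killed JVM names a PID that is no longer running.
    return pid not in set(running_pids)
```

One caveat: PIDs get reused, so after a reboot an unrelated process may hold the old PID and make a stale file look live.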
An easy way to tell for certain whether you're hitting the scenario I described is to grab the PID of the first java process (the CLI) and run it through jstack to see what it's doing:
```
jstack -l 12345
```
Then look for this in the stack trace:
```
at system.util.multiselect_cfc$cf.udfCall3(/commandbox/system/util/MultiSelect.cfc:298)
```
That's the code that prompts the user when they try to start a server that is already running.
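If you need to check this on several servers, the jstack step above is easy to script. Here is a minimal Python sketch. It assumes a JDK's `jstack` is on the PATH, and `is_stuck_on_prompt` and `check_cli_pid` are hypothetical helper names. It matches on the file name `MultiSelect.cfc` rather than the full frame, since the line number (298) may differ between CommandBox versions.

```python
import subprocess

# File name taken from the frame quoted in this thread.
PROMPT_FRAME = "MultiSelect.cfc"


def is_stuck_on_prompt(jstack_output: str) -> bool:
    """True if any stack frame in the dump is inside the MultiSelect prompt."""
    return any(PROMPT_FRAME in line for line in jstack_output.splitlines())


def check_cli_pid(pid: int) -> bool:
    """Run `jstack -l <pid>` against the CLI's JVM and inspect the dump."""
    result = subprocess.run(
        ["jstack", "-l", str(pid)], capture_output=True, text=True, check=True
    )
    return is_stuck_on_prompt(result.stdout)
```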
It's also worth noting that enabling NSSM's logs actually changes this behavior, so it's a Heisenbug for sure: monitoring it actually changes the behavior!
For whatever reason, Java reports that it doesn't have an interactive console, but only when NSSM's logging is enabled. That makes it all appear to work right up until you turn off the logs, lol.
If you're on the latest CommandBox version, the following env var will turn that prompt off for good:
```
box_config_nonInteractiveShell=true
```
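If you launch `box` from your own tooling instead of setting the variable system-wide, one way to apply it is to add it to the child process's environment. This is a minimal Python sketch. `non_interactive_env` and `run_box` are hypothetical helpers, and the sketch assumes `box` is on the PATH. For a Windows service, you would instead put the variable in the service's own environment; NSSM, for example, has an `AppEnvironmentExtra` parameter for this.

```python
import os
import subprocess
from typing import Mapping, Optional, Sequence


def non_interactive_env(base: Optional[Mapping[str, str]] = None) -> dict:
    """Copy the given (or current) environment and force non-interactive mode."""
    env = dict(os.environ if base is None else base)
    env["box_config_nonInteractiveShell"] = "true"
    return env


def run_box(args: Sequence[str]) -> subprocess.CompletedProcess:
    """Run `box <args>` with the variable set only for that child process."""
    return subprocess.run(["box", *args], env=non_interactive_env(), check=False)
```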
@bhartsfield If your company were to reach out to Ortus, we could probably work out a deal for a site license for the service manager module.
b
Thanks, @bdw429s. I'll see if a site licensing deal can get some more interest.
I'm going to do some testing around your comments above to see if you're right (as usual). FYI @birdy1980, `jcmd` is also a great way to quickly list all running Java PIDs and what started them (so you can pass the right one to jstack).
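`jcmd` with no arguments prints one line per running JVM, starting with the PID. The small Python sketch below parses that output so you can pick the PID to hand to `jstack -l <pid>`. `parse_jcmd` and `java_pids` are hypothetical helpers, and the sketch assumes a JDK's `jcmd` is on the PATH.

```python
import subprocess


def parse_jcmd(output: str) -> dict[int, str]:
    """Map PID -> main class/arguments from `jcmd` (no arguments) output."""
    pids = {}
    for line in output.splitlines():
        pid, _, description = line.strip().partition(" ")
        if pid.isdigit():  # skips blank or non-PID lines
            pids[int(pid)] = description
    return pids


def java_pids() -> dict[int, str]:
    """List running JVMs, leaving out jcmd's own entry if the JDK lists it."""
    out = subprocess.run(["jcmd"], capture_output=True, text=True, check=True).stdout
    return {
        pid: desc
        for pid, desc in parse_jcmd(out).items()
        if "sun.tools.jcmd.JCmd" not in desc
    }
```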
b
@bdw429s Thanks for the info. We were probably hitting the problem where the PID file was not cleaned up, because the time between stopping and starting the service was pretty short. We figured that we don't need the Windows service for our specific situation and went with the normal `box server start ..` and stop. Good to know there is a (paid) service manager module.