Upgraded one of our production instances to 5.3.9.141 over t... cfml #lucee

Dean

07/04/2022, 6:08 AM

Upgraded one of our production instances to 5.3.9.141 over the weekend. We've had it in nonprod for a while with no issues. The upgraded prod instance is showing very long request times. It's not every request, but there seems to be a pattern: when it gets hit by a lot of requests, the active request count spikes, and profiling the requests shows something like the image below. I ramped up FusionReactor's profiling settings to gather that, and that seemed to make the problem much worse. CPU and heap are fine over the same period, so it's not a resource issue. maxRequests is set to 200, so we're not hitting a limit there. The lock it's waiting on, 0x536669ad, always seems to be the same. When I click on that, I get the last image. Any ideas how I could troubleshoot further before I take production back to 5.3.8? I have three other production servers still on 5.3.8, and they are all running fine and not showing the same issue.

Dean

07/04/2022, 6:36 AM

Maybe a coincidence, but I just updated the JRE from 11.0.12.7.1 to 11.0.15.9.1 (Corretto) and the problem seems to have gone. I'm not convinced that it's not just the service restart that has resolved it for the short term. Will post back if I see it kicking off again.

Dean

07/04/2022, 6:38 AM

Nope, it kicked off again literally seconds after I posted that message.

gavinbaumanis

07/05/2022, 3:26 AM

I am positive you will most likely have done all this already, but sometimes a prompt from someone else helps you to realise you haven't checked everything after all. (I think I'm channelling Charlie!)

Anyway, and I realise that there may be URL / datasource / etc. differences, but what about these?
• Exact same code?
• Exact same dependency versions?
• Exact same Tomcat config / heap sizes / garbage collection directives / etc.?
• Exact same version of Tomcat?
  ◦ Is there an update for Tomcat that you can try?
• Is there an OS version / patch level difference?
• If running on Linux, do you have swap defined?
  ◦ Is it enabled? / Is it the same size?
• Anti-virus differences?
• Firewall differences?
• Proxy servers?
• Exact same CF configs?
  ◦ If you use CommandBox you can install commandbox-cfconfig and use its config diff feature.

And if all that fails - have you tried turning it off and then on again? Seriously....

I was struggling with an issue for quite a long while and just could not work it out. I was getting the wrong number of entities returned from ORM queries. Some path mappings were not working... I would have sworn I tried "everything", multiple times.... Then I just started clean:
• New OS install (matching the major version number, then an update)
• New Java (v11) install
• New Lucee install (same version as used elsewhere)
• New git clone(s) / pull(s)
• Etc...

Absolutely no idea what I did differently THIS time.... and it bugs me (a lot) that I don't have an answer as to what was wrong... But at the end of the day, starting from scratch and being methodical for the entire process got it all working.

Best of luck!
Gavin.
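For the "exact same configs?" items in the checklist above, one tool-independent way to compare is to export each server's settings to JSON and list only the keys that differ. This is a rough sketch, not the commandbox-cfconfig diff itself: it assumes both configs are already available as JSON files, and the helper names (`flatten`, `diff_configs`, `load`) are made up for this example.

```python
import json


def flatten(value, prefix=""):
    """Flatten nested dicts/lists into {"a.b.0": leaf} pairs."""
    if isinstance(value, dict) and value:
        items = value.items()
    elif isinstance(value, list) and value:
        items = enumerate(value)
    else:
        # Scalars and empty containers are kept as leaf values.
        return {prefix: value}
    flat = {}
    for key, child in items:
        path = f"{prefix}.{key}" if prefix else str(key)
        flat.update(flatten(child, path))
    return flat


def diff_configs(left, right):
    """Return {path: (left_value, right_value)} for every setting that differs."""
    a, b = flatten(left), flatten(right)
    missing = "<missing>"
    return {
        path: (a.get(path, missing), b.get(path, missing))
        for path in sorted(set(a) | set(b))
        if a.get(path, missing) != b.get(path, missing)
    }


def load(path):
    """Read one exported config file (the path is whatever you exported to)."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)
```

Calling `diff_configs(load("good-server.json"), load("slow-server.json"))` (both file names are placeholders) returns only the settings that differ, which keeps the comparison short when the servers are meant to be identical.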

Dean

07/05/2022, 4:28 AM

I'm going to revert it back to 5.3.8.201 tonight and see if it keeps happening. There are three differences: the JVM version, the CommandBox version and the Lucee version. Everything else is identical across 5 servers 🤷

Dean

07/07/2022, 4:33 AM

Back on 5.3.8.201 and the same server is performing optimally.
