# cfml-general
d
The CF 2018 production server for one of our big apps sometimes starts throwing template not found errors, always missing the same one, when nothing has changed. The template is used by a ton of pages, and it definitely exists. Clearing the template cache fixes the problem, until it comes back some weeks later. Has anyone else seen this? Any ideas for a fix?
e
If it's a Windows server, and I bet it is, you need to look at the disk it's on. On *nix you play with your inode and max open files limits; usually Tomcat's service is set to something really low, such as 4096. On Windows, it's usually time to defrag the volume or replace the drive. If you can tell me which OS, I can tell you the better fix, but generically it's one or the other.
d
Thanks for jumping in @Evil Ware. It's Windows Server 2019 Datacenter.
e
How much memory? Is ACF running via mod_jk or IIS? I ask as there are a few configuration settings to think about. And is this sitting behind a stateful WAF? I ask because you may have allocated way too little memory to the application; try setting the JVM's minimum and maximum memory to the same value so there is less of a hit on performance when GC kicks in. Second, I would look at your wait delay; it may be at the default, which is too long. Under HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\tcpip\Parameters\, if you do not have it, add the DWORD value TcpTimedWaitDelay. Set this to a low value, such as 10, and scale back to 1 or 2 if you have a low-latency network. Additionally, how is CF accessing the executable files: locally or over the network? If it's over the network, look at which protocol you are using. If you are using SMB, whenever possible disable SMB 1 and use SMB 3 with pass-through authentication, which is where the user on host 1 is the same user and password used on host 2.
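For anyone who wants to try the TcpTimedWaitDelay suggestion above, here is a minimal sketch that only builds the text of a `.reg` file, so it can be reviewed before anyone imports it. The helper name `reg_dword_file` is hypothetical, and the value 10 is just the number from the message, not a tested recommendation.

```python
def reg_dword_file(key_path: str, name: str, value: int) -> str:
    """Return the text of a .reg file that sets a single DWORD value.

    Hypothetical helper for illustration; it does not touch the registry.
    """
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("a DWORD must fit in 32 bits")
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{key_path}]",
        # .reg files store DWORDs as 8 hexadecimal digits
        f'"{name}"=dword:{value:08x}',
        "",
    ]
    return "\r\n".join(lines)


# Key path as quoted in the chat message.
TCPIP_PARAMS = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters"

# 10 is the starting value suggested in the message (an assumption, tune per network).
reg_text = reg_dword_file(TCPIP_PARAMS, "TcpTimedWaitDelay", 10)
print(reg_text)
```

Saving the output to a file and importing it is left out on purpose; a change like this is normally reviewed first and usually needs a reboot to take effect.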
d
CF is behind IIS, no WAF on the app side, but there's some processing our network folks have in place. Can you explain what you're thinking may be happening? Again, this medium-high traffic app will run fine for weeks, then suddenly lots of requests start throwing this template not found exception. I don't understand how that could be a GC or memory issue, but of course there's a lot of the world I don't understand :)
e
Someplace in your application, you are chewing up disk "USE" a lot for a single file that's being opened everywhere. GC takes a hit on disk IO, so if you have less GC thrashing you have less of a hit on performance, as the disk isn't busy creating and shrinking a file on app use. As for what I think is happening, you have a server that is over-utilized, as on Windows IOPS is rarely an issue in ColdFusion. My guess is first reduce the IOPS on GC, then reduce the IOPS by looking at the network stack and timing out slow, crappy connections. Next I would look at whether you have SR-IOV enabled if it's on a hypervisor, the drivers for the NIC, the application pool for IIS, DNS resolution, and finally your cache stack. I bet, like many sites, you have junk "bot" connections that leave your application stack half open, then your application chews up resources opening a file. The file in question is busy as hell, and you only have so much disk resources. So I would look at timing out requests that are not ultra fast, such as HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters TCPMaxDataRetransmissions=7; the default value is 255, which is insane. I put this as low as 1 on SAN clusters for time-sensitive applications. The value of 7 is what I slap on for most general troubleshooting, and tweak it accordingly. As for your template, move as much of the code as you can to static HTML, then serve it with a simple caching engine. We have a script that runs on application start and dumps the core theme UI values to CSS. The CSS file is then picked up by several caching engines, and when the UI is updated, the name of the file for the referring page is updated with it.
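The last point, renaming the generated CSS file whenever the UI changes so caches pick up the new one, could look roughly like the sketch below. The message does not say how the new name is chosen; deriving it from a hash of the file content is an assumption here, and `versioned_name` is a hypothetical helper.

```python
import hashlib
from pathlib import PurePosixPath


def versioned_name(filename: str, content: bytes, digest_len: int = 8) -> str:
    """Insert a short content hash before the extension.

    Assumption: the name is derived from the content, so it changes only
    when the generated CSS actually changes.
    """
    digest = hashlib.sha256(content).hexdigest()[:digest_len]
    path = PurePosixPath(filename)
    return f"{path.stem}.{digest}{path.suffix}"


old_css = b"body { color: #333; }"
new_css = b"body { color: #111; }"

# Same content always maps to the same name; edited content gets a new name,
# so any cache keyed on the URL treats it as a new file.
old_name = versioned_name("theme.css", old_css)
new_name = versioned_name("theme.css", new_css)
print(old_name, new_name)
```

Pages that reference the stylesheet would then emit whatever name was computed at application start.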
d
Sorry @Evil Ware, didn't see your message. So why does this one file, and not others, periodically not get found by a bunch of requests? I still don't really get how that relates to disk performance or load.
e
There is a hard-coded max open file limit in the C++ runtime libraries used by Microsoft. What is happening is that your application gets busy, be it bots, bad requests, or just people closing the browser, and so on. While the file is held open, it eventually will hit a wall, and that wall by default is 512 files, or 8192 if the JVM was compiled with the max limit. The workaround, besides recompiling your entire Java stack and then trying to get Adobe to recompile their crap to fix your issue, is to first set files to open and close as quickly as possible. You can see the max file hard limit in the Microsoft docs here: https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/setmaxstdio?view=msvc-170
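To make the "hit a wall" idea concrete, here is a toy model of a fixed-size handle table. It is not how ColdFusion, the JVM, or the Microsoft C runtime are implemented; `HandleTable` and both request functions are hypothetical, and 512 is simply the default quoted above.

```python
from contextlib import contextmanager


class HandleTable:
    """Toy stand-in for a per-process cap on open files."""

    def __init__(self, limit: int = 512):
        self.limit = limit
        self.open_count = 0

    def open(self) -> None:
        if self.open_count >= self.limit:
            raise OSError("too many open files (toy limit reached)")
        self.open_count += 1

    def close(self) -> None:
        self.open_count -= 1

    @contextmanager
    def scoped(self):
        """Open a handle and always release it when the block exits."""
        self.open()
        try:
            yield
        finally:
            self.close()


def leaky_requests(table: HandleTable, n: int) -> int:
    """Each request opens the shared file and never closes it."""
    served = 0
    for _ in range(n):
        try:
            table.open()
        except OSError:
            break  # every later request would fail the same way
        served += 1
    return served


def tidy_requests(table: HandleTable, n: int) -> int:
    """Each request closes the file as soon as it is done with it."""
    served = 0
    for _ in range(n):
        with table.scoped():
            served += 1
    return served


leaky_served = leaky_requests(HandleTable(512), 1000)
tidy_served = tidy_requests(HandleTable(512), 1000)
print(leaky_served, tidy_served)
```

In this model the leaky version stops serving once the table is full, while the version that releases handles promptly never gets near the cap, which is the behavior the workaround is aiming for.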