# adobe
d
Sigh. I've talked about this before, with no resolution. We're running a fairly large 24x7 app on CF2018. Every so often, seemingly nonsense crashes start happening on a page that hasn't changed in a while, errors like
Element EXECUTIONMODE is undefined in THISTAG
and
Could not find the included template
on pages that have run many times for weeks without problems and haven't been modified since. Clearing the template cache from CF Admin fixes it, instantly. Happened again this evening, same fix. Excuse my frustration, but WTF?!? This 100% does not help CF's reputation, Adobe's, mine, or our team's, not to mention that it's a service app, and it's failing our users. One poor guy banged on the same page 31 times tonight before giving up; I saw the errors much later. Trusted cache and Save class files are both unchecked, and have been for a long time, and I cleared out the cache directory manually a while ago too. We're not dynamically writing pages to disk, or using the in-memory file system. It's mostly old- or semi-old-school code, cfms that call CFCs for data, business logic, and sometimes rendering, some common includes, the usual, straight-up ninja-free stuff. Clearing the template cache doesn't change the code but fixes the problem (until next time), which on the face of it seems to say the problem isn't our code, but some kind of foobar in the CFML engine itself. What now? @Mark Takata (Adobe) @saghosh, can we get some help on this from Adobe?
c
Shot in the dark - antivirus on your server interfering? We recently switched software and had some completely different issues, but it can generally be a thing.
s
Hi Dave, could you please raise a bug and attach the log file? Thanks once again.
d
Thanks for chiming in @saghosh. There's already a bug filed for this, from my personal account, with at least one other user reporting the same thing. Which log files do you want? If I upload them to the ticket, am I right that they'll only be visible to Adobe, not other site visitors?
s
Others can also access the same. Tell you what, send me the log file to saghosh@adobe.com
I'll ensure full privacy.
d
Thank you, emailed you.
Let me know if you need other log files.
e
The easiest solution is to set up a task to restart the ColdFusion and IIS services every day. This is a band-aid for something deeper, such as a misconfigured application pool and settings not being mirrored in the ColdFusion config. IIS has a "session timeout" which needs to be in alignment with your Cfusion config. During install of the IIS connector, you are asked for Reuse Connections, Connection pool size, and Connection pool timeout. Connection pool timeout needs to match Session timeout, and the default number of connections, whatever you have set or left at the default, should be either DOUBLE (if you set something) or 125 * GB RAM as a safe rule of thumb. So if your server has 128GB of RAM, you would set the value to 16K.
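To make the arithmetic in that rule of thumb concrete, here is a tiny sketch. Python is used purely for illustration, and `suggested_pool_size` is a hypothetical helper name, not an IIS or ColdFusion setting; the 125-per-GB figure is just the rule of thumb quoted above, not an Adobe recommendation.

```python
def suggested_pool_size(ram_gb: int, per_gb: int = 125) -> int:
    """Rule of thumb from the message above: 125 connections per GB of RAM."""
    return per_gb * ram_gb


# 128 GB of RAM gives 125 * 128 = 16000, the "16K" mentioned above.
print(suggested_pool_size(128))
```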
d
Thanks for chiming in, but how do those settings, misconfigured or not, end up corrupting the template cache? (We're 24x7, daily app restarts would be most unwelcome.)
(Just to say it, if there was an API for clearing the template cache procedurally we could try that, though it shouldn't be necessary.)
e
IIS is application 1, and the ColdFusion engine is application 2. If they are not in alignment, then Application 1 ends the session while Application 2 is still waiting for the session to end. Application 2 eventually runs out of resources, as Application 1 has used up all available resources without notifying Application 2 that it can now reuse those resources for something else. As for how: broken upgrades, patches, permissions settings, and blindly and frantically re-installing ColdFusion during a crash are all just some of the usual reasons. Even if your application pool is recycling its memory, when is it doing it, and is it dropping the session or leaving the session hanging in ColdFusion? If you have a small or default pool and it's getting slammed, you could have all the resources needed to run ColdFusion and handle the load, but if your IIS pool is too small, the artificial throttling of your application sessions will cause problems. It could be your code, as application 1 calls another session in a related application 2, or just asks to switch a session, which has to have a free session available to move the current session to. You say it's a large site; are you running out of free ports? CPU or IIS application throttling is more likely the case. It gets more complicated still if you are using any additional language adaptors or modules in IIS. The whys vary, but if it's that big of a site, and restarting the application stack isn't an option, you may need to hire a consultant well-versed in the enterprise scope of handling large-traffic ColdFusion applications.
You can use CFCACHE in a scheduled task to clear the cache.
d
1. Is all of this still in play when Trusted cache and Save class files are both UNchecked and have been for some time? 2. What attributes would you suggest passing to cfcache to do what the Clear Template Cache Now button in cfadmin does, given that Trusted cache and Save class files are both unchecked?
3. Are there any metrics I can see that might shed some light on this?
e
FusionReactor can tell you what is wrong with your application, or at least what your ColdFusion code is trying to do and how hard it's doing it. IIS has logs, Windows OS has logs, and any kind of managed switch, firewall, or load balancer will have logs. You need to know how much traffic your application is receiving and how many resources it is consuming. As for cfcache, use cfcache "flush": you can just write a simple page or CFC that, when invoked, clears everything, and run it at least once a day. The metrics engines are up to you to choose, configure and use. There are multiple open-source products as well as commercial offerings.
For Windows there are performance counters (IIS), logman, Event Viewer, and PowerShell, to name a few. AFC has the CF monitoring toolset. At the command line you can do a simple netstat -an to show everything open, or pipe it to a file and grab the number of lines.
here is the GUI "freeware" that is used on a few older servers we have, https://www.nirsoft.net/utils/cports.html
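If counting raw lines of netstat -an output is too coarse, the same idea can be sketched as a per-state tally. This is a rough illustration in Python, not anything ColdFusion-specific: `count_tcp_states` and `live_counts` are hypothetical helper names, and the parsing assumes the usual layout where each TCP line ends with its state (LISTENING, ESTABLISHED, TIME_WAIT, and so on).

```python
import subprocess
from collections import Counter


def count_tcp_states(netstat_output: str) -> Counter:
    """Tally TCP connections per state from `netstat -an` text output."""
    states = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # TCP lines start with "TCP" (Windows) or "tcp"/"tcp6" (Linux)
        # and end with the connection state; UDP lines carry no state.
        if len(fields) >= 4 and fields[0].upper().startswith("TCP"):
            states[fields[-1]] += 1
    return states


def live_counts() -> Counter:
    """Run `netstat -an` on this machine and tally its output."""
    result = subprocess.run(["netstat", "-an"], capture_output=True, text=True)
    return count_tcp_states(result.stdout)
```

Calling `live_counts().most_common()` on the server lists the states from most to least frequent; a very large TIME_WAIT count would be one hint that ports are being used up faster than they are released.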
m
Might be worth bringing @davidtat into the mix
d
1. Are you saying that template caching is still active even when Trusted cache and Save class files are shut off in cf admin? 2. Am I right that
<cfcache action="flush">
is the equivalent of Clear Template Cache button in cf admin? 3. Of course there are many logging tools available for Windows, the question is, are there any that show anything revealing about CF's template caching? a. FusionReactor is installed and running on that server, but I haven't found any spikes or other notable events around the time things went south. b. The crashes I know about were caught by our error handling, so they don't show up in FR. c. As I've said, the crashes are seemingly random, happening on frequently used pages that haven't changed in a long time. There's nothing in the crash dumps that looks like anything other than the actual crashes that happened.
@Evil Ware why do you think network connections are relevant here?
e
@Dave Merrill it's intermittent, and it follows the previous issues of a busy single-source ColdFusion instance. The Windows OS stack only has 65K available ports, so even if you have a really powerful server, nothing will keep it from exhausting all available ports. I have found this to be true with a few of the sites I have had to upgrade, which are all real-time enterprise applications. If it was a pure code issue, then it should always be reproducible; if it's a pure hardware issue, then it would be far more random than one file. However, if it's a port exhaustion issue, it could be a combination of your code and a lack of available ports.
The "other pages are ok" caveat is ultra helpful. This goes back a decade, but disable 8dot3 name creation on the drive that hosts your Java, Windows OS, and ColdFusion, as well as "F". Then go download and run Handle, which you can read about and find the download link for here: https://learn.microsoft.com/en-us/sysinternals/downloads/handle Max file stream on Windows is hardcoded to 512.
d
Hmmm, interesting thought. Not familiar with Handle, but I've used ProcessExplorer a lot, didn't think to try it here. Thing is though, disk caching is disabled in cf admin to the best of my understanding, so both template caching and its corruption must be happening in memory, which is why I'm looking for tools to peek at it.
e
Java on Windows is not exactly the same as Java on UNIX/Linux devices. Java in-memory objects are handled completely differently: even when you ask them to "stay in memory", they can still create small temp files as well as write entries into the swap, or PFS table. In addition, your RAID configuration could be your bottleneck, as your "in memory" item still writes crap to the PFS, which, if it's starved for write IOPS, could also be a cause of your app suddenly not finding "this", as the write to virtual memory hasn't happened while the execution has.
d
@saghosh @Mark Takata (Adobe) I provided the log files requested on 1/10/23, and haven't heard anything. The intermittent errors continue to happen. What's happening? How can I help, short of steps to reproduce, which I don't have?
m
@priyank_adobe @sandip_halder do we have any update on this ticket?
d
For reference, the existing ticket is CF-4214784.
s
let me check