We seem to have resolved the majority of our issues with CFMX server stability. A combination of settings and deployments was the answer.
Here’s what we did:
- update our Oracle datasources to use a new Oracle driver jar file,
- modify our DSN settings for those datasources to maintain connections. We had previously set this to off on some well-informed advice.
Prior to making these changes, we were seeing upwards of 80 database connection issues per minute in the
Making these two small changes seems to have brought about a step change in stability. The paired servers delivering our sites, which were previously ultra-flaky, are now pretty much rock solid. The thread count issue we were seeing has largely gone away (see below for a little more discussion): thread count still increases under load, but then gracefully returns to normal.
That said, the thread count is still an occasional issue. Our Nagios-based monitoring box saw jrun.exe do this on one of our servers last night:
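| Time | Threads | Note |
|------|---------|------|
| 0132 | 177 | |
| 0234 | 293 | |
| 0336 | 425 | |
| 0438 | 547 | |
| 0540 | 669 | |
| 0642 | 683 | |
| 0709 | 86 | |
| 0718 | 134 | |
| 0820 | 252 | |
| 0837 | 88 | Restart CFMX |
| 0846 | 68 | |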
This seems to have been caused by some database connectivity issues at our hosting provider. When we did the restart at 8:37AM, everything came back to normal.
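To put rough numbers on how fast the thread count climbed, here is a minimal Python sketch that works out threads gained or lost per minute between consecutive readings. The sample values are copied from the readings above; the function names are purely illustrative and are not part of our monitoring setup.

```python
# Samples copied from the monitoring readings above: (HHMM, jrun.exe thread count).
SAMPLES = [
    ("0132", 177), ("0234", 293), ("0336", 425), ("0438", 547),
    ("0540", 669), ("0642", 683), ("0709", 86), ("0718", 134),
    ("0820", 252), ("0837", 88), ("0846", 68),
]


def to_minutes(hhmm):
    """Convert an 'HHMM' string to minutes since midnight."""
    return int(hhmm[:2]) * 60 + int(hhmm[2:])


def growth_rates(samples):
    """Return (start, end, threads_per_minute) for each consecutive pair of readings."""
    rates = []
    for (t0, n0), (t1, n1) in zip(samples, samples[1:]):
        elapsed = to_minutes(t1) - to_minutes(t0)
        rates.append((t0, t1, (n1 - n0) / elapsed))
    return rates


if __name__ == "__main__":
    for start, end, rate in growth_rates(SAMPLES):
        # A negative rate means the thread count fell between the two readings.
        print(f"{start}-{end}: {rate:+.2f} threads/min")
```

During the steady climb between 0132 and 0540 this works out to roughly two extra threads per minute, which is slow enough that a periodic check has plenty of time to catch it.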
A normal thread count for CFMX appears to be somewhere in the 45-65 thread range, dependent upon load. Anything over 70 seems to be cause for concern, and over 80 means we get JRun errors off our servers and the service needs restarting. Does anyone out there know what "normal" for CFMX/jrun.exe thread count is?
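The thresholds above map naturally onto a Nagios-style check. The following is only a sketch: the 70 and 80 thresholds come from our own observations, the exit codes follow the standard Nagios plugin convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN), and the script simply takes the thread count as a command-line argument. How you actually read the jrun.exe thread count from the server is left out, and the names here are hypothetical.

```python
import sys

# Thresholds taken from the observations above; tune these for your own servers.
WARN_THREADS = 70  # above this is cause for concern
CRIT_THREADS = 80  # above this we see JRun errors and need a restart

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def classify(thread_count, warn=WARN_THREADS, crit=CRIT_THREADS):
    """Map a thread count to a (Nagios exit code, status message) pair."""
    if thread_count > crit:
        return CRITICAL, f"CRITICAL - jrun.exe has {thread_count} threads"
    if thread_count > warn:
        return WARNING, f"WARNING - jrun.exe has {thread_count} threads"
    return OK, f"OK - jrun.exe has {thread_count} threads"


def main(argv):
    """Read the thread count from the first argument and return an exit code."""
    try:
        count = int(argv[1])
    except (IndexError, ValueError):
        print("UNKNOWN - usage: check_threads.py <thread_count>")
        return UNKNOWN
    code, message = classify(count)
    print(message)
    return code


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

Taking the count as an argument keeps the threshold logic separate from whatever tool collects the number, so the same check works regardless of how the count is gathered.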
We’re making some more changes which we hope will see more improvements, but at the very least we appear to have bought ourselves some breathing space…