A few months back, we upgraded all our CFMX 6 servers to CFMX 6.1. Since that time, we’ve been fighting an ongoing battle with our custom CFMX/Spectra OS-based CMS over what we’ve termed "dodgy objects" in our CODB. These dodgy objects cause the CF web servers which deliver some of our sites to be amazingly unstable. Frankly, it’s driving us nucking futs…
We’ve had Robin Hilliard from RocketBoots in to assess our problems and work with us on them. We found several issues, which we have progressively addressed, including:
- a buggy Oracle driver in CFMX 6.1
- previously undiscovered bugs in the Spectra code base (not really news)
- several points of unfriendliness between the Spectra OS code and CFMX 6.1
- a few other, less interesting ones
We don’t have the option to stop using Spectra at this point, so we are having to wrestle with all these issues. The upshot of the whole thing is that management and staff confidence in our system is at a very low ebb and it’s very disheartening for me and the good (and very skilled) people who work for me. Frankly, we need a couple of wins.
What we’re seeing more often than not is a combination of increasing thread count and memory usage in the JRun executable on our servers. At some point there’s a sour spot (i.e. the opposite of a sweet spot) where the executable’s memory usage and its ability to get more threads to do work reach a critical point and JRun craps itself.
As far as we can tell, this critical point is approached progressively over time as the site tries to do work with dodgy data that has been entered by one of our (approximately 500) authors, usually by doing some sort of copy-paste from Word. When things such as Verity or our in-memory caching model try to access data with bad content, the memory usage and thread count creep up until the critical point is reached and JRun, as stated, goes off into Lalaland.
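To make the "creeping up" concrete, here is a minimal Java sketch of an in-JVM check on thread count and heap usage. It is an illustration only, not the monitoring actually in use: the class name `JvmVitals`, the method names and the thresholds are all made up, and it assumes the code runs inside the same JVM as JRun. It sticks to long-standing core Java calls (`Runtime` and `ThreadGroup`), and it flags trouble when either number crosses its threshold, which is a guess at a sensible early warning rather than a measured value.

```java
final class JvmVitals {

    /** Estimated number of live threads in the whole JVM (walks up to the root thread group). */
    static int liveThreadEstimate() {
        ThreadGroup group = Thread.currentThread().getThreadGroup();
        while (group.getParent() != null) {
            group = group.getParent();
        }
        return group.activeCount(); // documented as an estimate, good enough for trending
    }

    /** Heap currently in use, in bytes. */
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    /**
     * True when either the thread count or the heap usage crosses its threshold.
     * threadLimit and heapFraction are hypothetical tuning knobs, not known-good values.
     */
    static boolean nearLimit(long usedBytes, long maxBytes, int threads,
                             int threadLimit, double heapFraction) {
        return threads >= threadLimit || usedBytes >= maxBytes * heapFraction;
    }

    /** One log-friendly line, suitable for writing out from a scheduled task. */
    static String snapshot() {
        long mb = 1024L * 1024L;
        Runtime rt = Runtime.getRuntime();
        return "threads=" + liveThreadEstimate()
                + " usedHeapMB=" + (usedHeapBytes() / mb)
                + " maxHeapMB=" + (rt.maxMemory() / mb);
    }

    public static void main(String[] args) {
        System.out.println(snapshot());
    }
}
```

Logging `snapshot()` at a fixed interval would give a trend line to compare against the times JRun falls over, which is more useful than any single reading.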
We’re doing a bunch of things to try to cover our collective backsides until we can get all our potential solutions in, but it’s a progressive and slightly slow process, especially considering our already overworked team.
Here’s what we’re doing:
- writing a Java class to replace all the Spectra db transaction code (cfa_contentobject, cfa_contentobjectget, etc.)
- making sure our authoring system doesn’t allow dodgy content, especially high-ASCII characters, in the content object data/WDDX
- putting some serious logging in our scheduled tasks (indexing, caching) to see where problems might be occurring
- monitoring just about everything happening on our servers in real time so we can see problems as they occur
- nursing our servers along in the meantime
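As a rough idea of what the "no dodgy content" check in the authoring system could look like, here is a small Java sketch. Everything in it is an assumption for illustration: the class name `DodgyContentCheck`, the definition of "dodgy" (anything outside printable ASCII apart from tab and line breaks), and the choice to swap Word's curly punctuation for plain equivalents and turn other non-ASCII characters into numeric character references. Whether the Spectra/WDDX code is happy with those references would need testing.

```java
final class DodgyContentCheck {

    /** Tab, line feed and carriage return are fine; other control and non-ASCII characters are suspect. */
    static boolean isDodgy(int cp) {
        if (cp == '\t' || cp == '\n' || cp == '\r') {
            return false;
        }
        return cp < 32 || cp > 126;
    }

    /** Index of the first suspect character, or -1 if the text looks clean. */
    static int firstDodgyIndex(String text) {
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (isDodgy(cp)) {
                return i;
            }
            i += Character.charCount(cp);
        }
        return -1;
    }

    /** Replaces common Word punctuation, escapes other non-ASCII text, drops control characters. */
    static String clean(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (!isDodgy(cp)) {
                out.append((char) cp);
                continue;
            }
            switch (cp) {
                case 0x2018: case 0x2019: out.append('\''); break;  // curly single quotes
                case 0x201C: case 0x201D: out.append('"'); break;   // curly double quotes
                case 0x2013: case 0x2014: out.append('-'); break;   // en and em dashes
                case 0x2026: out.append("..."); break;              // ellipsis
                default:
                    if (cp > 159) {
                        out.append("&#").append(cp).append(';');    // numeric character reference
                    }
                    // anything else here is a control character and is dropped
            }
        }
        return out.toString();
    }
}
```

`firstDodgyIndex` is meant for rejecting a save with a pointer to the offending character, while `clean` is the more forgiving option of fixing the content quietly on the way in.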
We really need a break and some fresh ideas. Anything anyone might have, no matter how out there, is welcome.