Monitoring ColdFusion with Nagios


At work, we run a Nagios server to monitor the health of our web infrastructure. “What’s Nagios?” I hear you ask. Well, Nagios is the 800lb gorilla in the world of open source monitoring apps. It has a rich and useful feature set for monitoring the health of pretty much every aspect of your infrastructure, from the servers delivering your web apps (and the actual apps running on them) all the way out across your network to your gateway routers. And, for an open source project, the documentation is incredibly rich and detailed, covering every aspect of running the software. It pretty much has the ability to monitor everything you run!

I’ve had several emails and no small number of searches on this site looking for information about ColdFusion and Nagios, so I’ve decided to place a short article here on how to monitor the health of your ColdFusion boxes and services using Nagios. Initially, this will only cover installs of CF on Windows servers.

First off, I’m assuming you have a working Nagios installation in place. There are extensive guides on setting up Nagios already out on the web. Read them, get Nagios running and then get familiar with Nagios before trying what’s detailed here.

Install NRPE_NT

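First, you’ll need to install the NRPE_NT daemon on each of the Windows servers you have running CF. Follow the instructions within the zip to install. It just works. One thing to check: the check_nt commands defined later in this article query an NSClient-compatible agent on port 1248, so make sure the agent on each server answers check_nt requests on that port.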
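Before wiring a server into Nagios, it can save some head-scratching to confirm that the agent's port is reachable from the Nagios box. Here's a minimal sketch in Python, assuming the agent listens on TCP port 1248 (the port the check_nt commands in this article use). The function name `agent_port_open` is made up for illustration, and the check only proves that something accepts connections on that port, not that it speaks the right protocol.

```python
import socket


def agent_port_open(host: str, port: int = 1248, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
```

Run from the Nagios server, `agent_port_open("172.16.3.129")` returning False usually points at a firewall rule or a stopped agent service rather than at the Nagios configuration.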

Add CF hosts to hosts.cfg

If your CF servers aren’t already among the hosts you’re monitoring, add them to hosts.cfg. A typical host definition looks similar to the one below.

define host {
     use                   generic-host
     host_name             cfserver1
     alias                 main coldfusion server
     address               172.16.3.129
     parents               Internet_Zone
     check_command         check-host-alive
     max_check_attempts    3
     notification_interval 60
     notification_period   24x7
     notification_options  d,u,r
}

Add host groups for your CF servers

Assuming all your CF servers are running the same CF version, you can make a Nagios host group for those servers. This is much less work than adding individual servers to the service check (detailed below). If you have servers running disparate CF versions, set up a host group for each. In our setup, we have a cf5servers group and a cfmxservers group. Add your host group definition to hostgroups.cfg. A typical host group definition looks similar to the one below.

define hostgroup {
     hostgroup_name cfmxservers
     alias          MCT ColdFusion MX Servers
     contact_groups webdev,online
     members        cfserver1, cfserver2, cfserver3
}

Add CF commands to checkcommands.cfg

In a default Nagios installation, checkcommands.cfg contains all the definitions of the commands Nagios uses to inspect services running on the hosts it monitors. The following list of command definitions should cover ColdFusion 5 and CFMX 7 (and will likely also cover CFMX 6, but I don’t remember…). Add the following lines to checkcommands.cfg.

define command {
   command_name    check_coldfusion5
   command_line    $USER1$/check_nt -H $HOSTADDRESS$ -p 1248 -v SERVICESTATE -l "Cold Fusion Application Server"
}

define command {
   command_name    check_coldfusionjrun
   command_line    $USER1$/check_nt -H $HOSTADDRESS$ -p 1248 -v SERVICESTATE -l "Macromedia JRun CFusion Server"
}

define command {
   command_name    check_coldfusionmx
   command_line    $USER1$/check_nt -H $HOSTADDRESS$ -p 1248 -v SERVICESTATE -l "ColdFusion MX Application Server"
}

define command {
   command_name    check_coldfusionmx_process
   command_line    $USER1$/check_nt -H $HOSTADDRESS$ -p 1248 -v COUNTER -l "\Process(jrun)\Private Bytes","JRun is using %.f bytes" -w 891289600 -c 1073741824
#   comment    Warning 850 MB, Critical 1024 MB (1 GB), YMMV
}

define command {
   command_name    check_coldfusionmx_threads
   command_line    $USER1$/check_nt -H $HOSTADDRESS$ -p 1248 -v COUNTER -l "\Process(jrun)\Thread Count","JRun is using %.f Threads" -w 90 -c 110
#   comment    Warning 90 threads, Critical 110 threads YMMV
}

The last two commands in the above listing check the actual CF processes running on your server. Note the comments: the values we use may well be very different in your environment. The easiest way to pick your own is to look at what your servers are doing under normal load and make an educated guess at the levels needed in your setup. There’s no harm done if you use these values as they are, but you may end up getting copious notifications from Nagios that you don’t need (or, conversely, no notifications at all). Test and tweak.
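The memory thresholds in check_coldfusionmx_process are given in bytes, which makes them awkward to read and easy to mistype. Here's a small Python sketch for turning thresholds in megabytes into the -w and -c arguments, assuming 1 MB = 1024 × 1024 bytes (which is what the values above work out to). The helper names are hypothetical, not part of Nagios or check_nt.

```python
BYTES_PER_MB = 1024 * 1024


def mb_to_bytes(megabytes: int) -> int:
    """Convert megabytes to bytes using binary megabytes (1024 * 1024)."""
    return megabytes * BYTES_PER_MB


def check_nt_thresholds(warn_mb: int, crit_mb: int) -> str:
    """Build the '-w <bytes> -c <bytes>' fragment for a byte-valued counter."""
    if warn_mb >= crit_mb:
        raise ValueError("warning threshold must be below critical threshold")
    return f"-w {mb_to_bytes(warn_mb)} -c {mb_to_bytes(crit_mb)}"


# The thresholds used in this article: warn at 850 MB, critical at 1024 MB.
print(check_nt_thresholds(850, 1024))  # -w 891289600 -c 1073741824
```

If your JRun process normally sits around 400 MB, something like `check_nt_thresholds(600, 800)` may be a more useful starting point.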

Add CF-related services to services.cfg

Nagios’ monitoring model is centered on the concept of services, so this step is perhaps one of the most important. You need to add service definitions to services.cfg for each of the command definitions you built earlier. A couple of sample service definitions are shown below.

define service {
    use                   NM-HTTP
    hostgroup_name        cfmxservers
    service_description   Check ColdFusion MX
    contact_groups        online,webdev
    check_period          24x7
    notification_interval 60
    notification_options  w,u,c,r
    notification_period   24x7
    check_command         check_coldfusionmx
    max_check_attempts    3
    normal_check_interval 1
    retry_check_interval  1
#   comment    Check if ColdFusion MX Server is responsive
}

define service {
    use                   NM-HTTP
    hostgroup_name        cfmxservers
    service_description   Check ColdFusion MX Process Threads
    contact_groups        webdev,online
    check_period          24x7
    notification_interval 60
    notification_options  w,u,c,r
    notification_period   24x7
    check_command         check_coldfusionmx_threads
    max_check_attempts    3
    normal_check_interval 1
    retry_check_interval  1
}
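
It helps to know roughly how long a failure takes to turn into a notification with these settings. Nagios rechecks a failing service at retry_check_interval until max_check_attempts is reached, and only then treats the problem as a hard state and notifies. The Python sketch below does that arithmetic, assuming the default interval_length of 60 seconds in nagios.cfg; the function names are hypothetical.

```python
def seconds_until_hard_state(max_check_attempts: int,
                             retry_check_interval: int,
                             interval_length: int = 60) -> int:
    """Time from the first failed check to the hard (notifying) state."""
    # The first failure counts as attempt 1, so only the remaining
    # attempts are spaced out by the retry interval.
    return (max_check_attempts - 1) * retry_check_interval * interval_length


def worst_case_detection_seconds(max_check_attempts: int,
                                 normal_check_interval: int,
                                 retry_check_interval: int,
                                 interval_length: int = 60) -> int:
    """Upper bound: the service dies just after a successful check."""
    wait_for_next_check = normal_check_interval * interval_length
    return wait_for_next_check + seconds_until_hard_state(
        max_check_attempts, retry_check_interval, interval_length)


# Values from the service definitions above: 3 attempts, 1-minute intervals.
print(seconds_until_hard_state(3, 1))         # 120
print(worst_case_detection_seconds(3, 1, 1))  # 180
```

So with the sample definitions, a dead ColdFusion service should produce a notification within about three minutes, ignoring check latency and scheduling delays.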

Restart Nagios

At this point, Nagios should be ready, willing and able to monitor CF on your Windows-based CF servers. Restart Nagios by whichever method you use. This can be done directly on the command line, or using a tool such as NagMin (we use it, it’s excellent and makes the business of configuring Nagios significantly easier).

After restarting, if you log into the Nagios web interface, you should be able to see the CF services you set up being monitored.

I’d be happy to answer questions or elaborate on any of the detail here. Just post a comment or email me offline.

7 Replies to “Monitoring ColdFusion with Nagios”

  1. Thanks for writing this up! I’ve been considering setting up Nagios for a while. Now that I know I can also monitor my CF boxes – maybe I’ll finally break down and do it!

    Jim

  2. Do you have any experience monitoring CF on Linux with Nagios? How do you read the “\Process(jrun)\Private Bytes” and “\Process(jrun)\Thread Count” for example under RedHat Linux?

  3. Erki, I don’t have a Linux-based CF box running at the moment, so I can’t take a look at how this would be done. Well, not exactly. It should involve setting up the Linux NRPE daemon on the server and determining the processes being run which equate to the Windows processes on a Windows server. Have your NRPE daemon monitor those processes and have your Nagios server contact the remote NRPE daemon for information.

    That’s a really brief summary, and probably lacking in detail, but broadly right. I’d be happy to give you a hand when I get back from the course I’m on. Email me or call me on Skype when I get back (Saturday Australian time).

  4. Awesome! This is great!

    I might have found another way to monitor your ColdFusion apps with Nagios. You can use the check_http command, but instead of pointing it to one of your .html pages, point it at one of your .cfm pages.

    Here is what I did:

    • Add the necessary host (described in the howto)
    • Add the necessary host group (described in the howto)
    • Add the necessary service, but send in the .cfm page as a parameter:
      # Service definition
      define service{
              use                             generic-service         ; Name of service template to use
              host_name                       bookexchange.byu.edu
              service_description             COLDFUSION
              is_volatile                     0
              check_period                    24x7
              max_check_attempts              3
              normal_check_interval           5
              retry_check_interval            1
              contact_groups                  win2000-admins
              notification_interval           120
              notification_period             24x7
              notification_options            w,u,c,r
              check_command                   check_http!http://bookexchange.byu.edu/graphs/index.cfm
              }

      So, for my check command, I use:
      check_http!http://bookexchange.byu.edu/graphs/index.cfm
      (I use a .cfm page instead of a .html page)

    • Change the necessary dependencies. One thing I did was make my new ColdFusion service in Nagios DEPENDENT on the HTTP service for the same host. This means that if your ColdFusion service goes down, Nagios will look at the state of the HTTP service before sending any notifications:
      define servicedependency{
              host_name                       bookexchange.byu.edu
              service_description             HTTP
              dependent_host_name             bookexchange.byu.edu
              dependent_service_description   COLDFUSION
              execution_failure_criteria      n       ; always run the ColdFusion check
              notification_failure_criteria   w,u,c   ; suppress ColdFusion alerts while HTTP is failing
              }

    This seems to catch the case where ColdFusion isn’t working. (Note: this is just a work-around that seems to work for me… I am NO expert in Nagios OR ColdFusion… I just try to solve problems).
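
The idea behind the check_http approach above can be sketched as a tiny stand-alone Nagios plugin in Python: request a .cfm page and translate the result into the standard plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). This is an illustration only, not a replacement for check_http. The function names and the 5 and 10 second response-time thresholds are assumptions chosen for the example.

```python
import time
import urllib.error
import urllib.request

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def classify(status, elapsed, warn_s=5.0, crit_s=10.0):
    """Map an HTTP status (None = no response) and response time to a state."""
    if status is None or status >= 400:
        return CRITICAL
    if elapsed >= crit_s:
        return CRITICAL
    if elapsed >= warn_s:
        return WARNING
    return OK


def probe(url, timeout=15.0):
    """Fetch url and return (status_or_None, elapsed_seconds)."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            status = resp.status
    except urllib.error.HTTPError as exc:
        status = exc.code  # the server answered, but with an error status
    except (urllib.error.URLError, OSError):
        status = None      # no usable answer: refused, timed out, DNS failure
    return status, time.monotonic() - start


def main(argv):
    """Plugin entry point: prints one status line, returns the exit code."""
    if len(argv) != 2:
        print("UNKNOWN - usage: check_cfm.py <url>")
        return UNKNOWN
    status, elapsed = probe(argv[1])
    state = classify(status, elapsed)
    label = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL"}[state]
    print(f"{label} - HTTP status {status}, {elapsed:.2f}s")
    return state
```

To use it as a plugin, finish the script with `import sys` and `sys.exit(main(sys.argv))`. A page that actually exercises ColdFusion (for example one that runs a trivial query) tells you more than a static template would.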

  5. Q: Do you have any experience monitoring CF on Linux with Nagios? … “\Process(jrun)\Thread Count” …?

    A: We use the “pstree” command in a bash script:
    /usr/local/bin/coldfusion-threads-snmp.sh

    which prints a CFChildren variable like:

    CFChildren=`/usr/bin/pstree | grep -e cfmx7.*cfmx7.*cfmx | sed -e 's/*[cfmx7]$//' -e 's/ *..cfmx7.*.cfmx7...//'`

    and the results of the script can be exported within snmp; an entry in snmpd.conf like:
    exec .1.3.6.1.4.1.2021.500 coldfusion-threads /usr/local/bin/coldfusion-threads-snmp.sh
    will do it. I am confident that you can do it better than me, in a different way.

    Thanks,
    Steve.
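
As an alternative to parsing pstree output, the thread count on Linux can be read from the Threads: field of /proc/<pid>/status. Below is a minimal Python sketch of that idea. It assumes the ColdFusion processes are named cfmx7 (taken from the pstree pattern above; check the name on your own box) and reuses the 90/110 thresholds from the Windows thread check. All function names are hypothetical.

```python
import os


def name_from_status(status_text: str) -> str:
    """Extract the process name from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("Name:"):
            return line.partition(":")[2].strip()
    return ""


def threads_from_status(status_text: str) -> int:
    """Extract the thread count from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("Threads:"):
            return int(line.split()[1])
    raise ValueError("no Threads: field found")


def count_threads(process_name: str, proc_root: str = "/proc") -> int:
    """Sum the threads of every process whose name matches process_name."""
    total = 0
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric entries are process directories
        try:
            with open(os.path.join(proc_root, entry, "status")) as fh:
                text = fh.read()
        except OSError:
            continue  # the process exited while we were scanning
        if name_from_status(text) == process_name:
            total += threads_from_status(text)
    return total


def nagios_state(threads: int, warn: int = 90, crit: int = 110) -> int:
    """Return a Nagios plugin exit code: 0 OK, 1 WARNING, 2 CRITICAL."""
    if threads >= crit:
        return 2
    if threads >= warn:
        return 1
    return 0
```

A script built on this could be run on the CF box by the Linux NRPE daemon, with `nagios_state(count_threads("cfmx7"))` as its exit code.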

  6. Thanks for putting this up. Just a quick question for you: would it be possible for you to post your template service definition for NM-HTTP?

    I’m curious how you’ve implemented it and where it might differ from the generic one.
