If you have any responsibility for managing servers for your organisation, you should be using Nagios.
Nagi-what, I hear you ask. Nagios, I say. Nagios is a feature-rich, open-source network and server monitoring utility which runs on Linux and is saving our arse big-time.
Several months ago, we shifted all our servers to a new host. At the time, we were assured the host was equipped to monitor everything we wanted. Turns out, they aren’t.
So, I’ve grabbed an old box (Celeron 700, 128MB RAM, 20GB HDD) here at work, installed Fedora Core 2, Nagios, Webmin and NagMIN, slotted it into the rack on our floor, and now I’m within a couple of days of having full, 24×7 monitoring of our development and production environments.
We’re monitoring pretty much everything you’d care to name in our development environment (CPU, disk space, processes, memory use, HTTP, ColdFusion, blahblahblah) and we intend to do the same in production.
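For anyone wondering what one of those checks looks like under the hood, here is a minimal sketch of a disk-space check written in Python. It is not one of the plugins we actually run; it only follows the standard Nagios plugin convention of printing one status line and returning exit code 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN). The function names and the 80%/90% thresholds are my own assumptions for illustration.

```python
import shutil

# Exit codes defined by the Nagios plugin convention.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL"}


def classify(used_pct, warn=80.0, crit=90.0):
    """Map a usage percentage to a Nagios status (thresholds are assumed)."""
    if used_pct >= crit:
        return CRITICAL
    if used_pct >= warn:
        return WARNING
    return OK


def check_disk(path="/", warn=80.0, crit=90.0):
    """Return (exit_code, status_line) for the filesystem holding `path`."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        return UNKNOWN, f"DISK UNKNOWN - {path}: {exc}"
    if usage.total == 0:
        return UNKNOWN, f"DISK UNKNOWN - {path}: reported size is zero"
    used_pct = 100.0 * usage.used / usage.total
    status = classify(used_pct, warn, crit)
    return status, f"DISK {LABELS[status]} - {path} {used_pct:.1f}% used"


def main(argv):
    """Print the status line and return the exit code Nagios should see."""
    path = argv[1] if len(argv) > 1 else "/"
    code, message = check_disk(path)
    print(message)
    return code
```

To run it as a real plugin, you would finish the script with `import sys` and `sys.exit(main(sys.argv))`, so that Nagios receives the exit code and shows the printed line as the service status.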
Configuring Nagios is something of a pain, even with its excellent doco, unless you’re really happy using vi (which I loathe). Thus the install of NagMIN. It’s a Webmin module which lets you store your config in a MySQL db and update it on the fly. Very sweet (despite NagMIN’s ordinary doco).
I’m a little way off on the production machines. They’re physically in Sydney (I’m in Canberra), and sit behind several firewalls which block everything. I think I’m going to have to install NRPE-NT to get more meaningful data from the remote production boxen.
We have email and SMS notifications generated from alerts, so we can be informed of problems 24×7, and then escalate those issues to our Service Desk and management at appropriate times. Very sweet.
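To make the escalation idea concrete, here is a rough sketch in Python of how a notification can be routed by its sequence number. It mirrors the `first_notification` / `last_notification` idea from Nagios escalation definitions, where a last value of 0 means "no upper limit", but it is not Nagios code and not our actual config. The group names (`admins`, `service-desk`, `management`) and the cut-over points are made up, and the sketch assumes that matching escalations replace the default group instead of adding to it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Escalation:
    """One escalation rule, keyed on the notification sequence number."""
    first_notification: int
    last_notification: int  # 0 means open-ended
    contact_group: str      # hypothetical group name

    def covers(self, number):
        """True if notification `number` falls inside this rule's range."""
        if number < self.first_notification:
            return False
        return self.last_notification == 0 or number <= self.last_notification


def recipients(number, default_group, escalations):
    """Return the contact groups that should get notification `number`.

    Assumption: if any escalation matches, its groups are used instead of
    the default group; otherwise the default group is notified.
    """
    matched = [e.contact_group for e in escalations if e.covers(number)]
    return matched or [default_group]


# Hypothetical rules: third to fifth alert goes to the Service Desk,
# the sixth and everything after goes to management.
RULES = [
    Escalation(3, 5, "service-desk"),
    Escalation(6, 0, "management"),
]
```

With these rules, `recipients(1, "admins", RULES)` gives `["admins"]`, while `recipients(4, "admins", RULES)` gives `["service-desk"]`. If you want the admins to keep hearing about a problem after it escalates, they would need to be listed in the escalation rules as well.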
I’d appreciate any feedback from other Nagios users on the ways they’re using their installs, especially as far as things like host and service escalations are concerned.