Enhancing Nagios to monitor Drupal using Drush commands

So, you are thinking about using Nagios to monitor your server, or have already decided to. Nagios also provides a plugin, NRPE, that permits remote monitoring. But how can you use Nagios to monitor your Drupal installation?

Nagios provides examples for monitoring various system stats. The check_users command is provided by Nagios as a plugin (this post uses nagios-plugins-1.4.15 and nrpe-2.12; start your search in /usr/local/nagios/etc). If you are logged into the remote system, you can execute it directly from a shell command line:

> /usr/local/nagios/libexec/check_users  -w 5 -c 10

The NRPE plugin provides the command check_nrpe, which permits you to execute commands remotely. It can also be used from the command line on the remote system to facilitate debugging your installation and any new commands you define. The following shows how you would use it to execute the check_users command:

> /usr/local/nagios/libexec/check_nrpe -H localhost -c check_users

While you will find a number of commands listed in the plugins directory, not all are made available to remote systems. They are made available by placing them in the nrpe.cfg file (check in /usr/local/nagios/etc).  The command that provides NRPE access to the check_users command is:

command[check_users]=/usr/local/nagios/libexec/check_users -w 5 -c 10

Note that the parameters are fixed in this definition.

Now for Drupal. While you can execute the commands provided by Nagios, you may also extend it to provide your own commands. This is where drush once again provides great value to your installation. We need an example. Let's say you want to monitor the number of users using your site.

First we need to define what that means. For this example, we will go directly to the site's database (this example uses MySQL) for information. The sessions table will show us the number of users with active sessions. But what if you want to know how many authenticated users have been there in the past 15 minutes, or the past day? Within the users table, we have three UNIX (this is important to know) timestamps:

created (date/time the user was created)
login (date/time the user last logged in)
access (date/time the user last did something, even just a page load)

The query to get the number of users who had activity in the past 15 minutes would be:

SELECT COUNT(*) FROM `users` where TIMESTAMPDIFF(MINUTE, FROM_UNIXTIME(`access`), NOW()) < 15

If you wanted the count for the past 24 hours, you would just substitute 1440 (the number of minutes in a day) for the 15.
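To make the substitution concrete, here is a minimal sketch of a shell helper that builds the same query for any window. The function name active_users_query is made up for this illustration; it is not part of Drupal, drush, or Nagios.

```shell
#!/bin/sh
# Hypothetical helper (name invented for this sketch): print the
# active-users query for a window given in minutes.
active_users_query() {
    minutes="$1"
    # Accept digits only, so nothing but a number ends up in the SQL.
    case "$minutes" in
        ''|*[!0-9]*) echo "usage: active_users_query MINUTES" >&2; return 1 ;;
    esac
    printf 'SELECT COUNT(*) FROM `users` WHERE TIMESTAMPDIFF(MINUTE, FROM_UNIXTIME(`access`), NOW()) < %s\n' "$minutes"
}

# 24 hours expressed in minutes: 24 * 60 = 1440
active_users_query $((24 * 60))
```

Run as is, this prints the 24-hour version of the query; passing 15 gives the original 15-minute version.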

The next step is: how would you do this from drush? The drush (v3.3) command would be:

drush --root=/full/site/install/path/site.install.dir sql-query "SELECT COUNT(*) FROM {users} WHERE TIMESTAMPDIFF(MINUTE, FROM_UNIXTIME(access), NOW()) < 15"

The only issue here is that the output includes COUNT(*) as a column header, which would clutter the one-line result we hand to Nagios later on. Drush sql-query provides a way to supply extra mysql parameters using the --extra option. The final drush command is:

drush --root=/full/site/install/path/site.install.dir --extra=--skip-column-names sqlq "SELECT COUNT(*) FROM {users} WHERE TIMESTAMPDIFF(MINUTE, FROM_UNIXTIME(access), NOW()) < 15"

Now you need to add this as an NRPE command. Go to the nrpe.cfg file and add the following below the sample commands provided there:

command[check_drupal_active_users_15]=echo -n Number of users active in past 15 minutes:  ;drush --root=/var/www/html/prod.recruit2care.com --extra=--skip-column-names sqlq "SELECT COUNT(*) FROM {users} WHERE TIMESTAMPDIFF(MINUTE, FROM_UNIXTIME(access), NOW()) < 15"

The extra shell echo command provides a label to explain the result of the drush command; the -n flag tells echo not to put a newline after the echoed text. Note that you need to reload the xinetd service (service xinetd reload) for NRPE to pick up the new command.

You can test the new command on the remote server by entering the following at a command prompt:

> /usr/local/nagios/libexec/check_nrpe -H localhost -c check_drupal_active_users_15
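As defined above, the command only reports a number. Nagios derives a service's state from the plugin's exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN), so a check that should raise alerts has to map the count to one of those codes. The following is a minimal sketch of that mapping; the function name, the thresholds, and the message wording are all assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical helper: turn a user count into a Nagios plugin result.
# Exit codes follow the Nagios plugin convention:
#   0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN
nagios_status_for_count() {
    count="$1"; warn="$2"; crit="$3"
    # Anything that is not a plain number means the query did not work.
    case "$count" in
        ''|*[!0-9]*) echo "USERS UNKNOWN - no numeric count returned"; return 3 ;;
    esac
    if [ "$count" -ge "$crit" ]; then
        echo "USERS CRITICAL - $count users active"; return 2
    elif [ "$count" -ge "$warn" ]; then
        echo "USERS WARNING - $count users active"; return 1
    fi
    echo "USERS OK - $count users active"; return 0
}

# Example with a literal count and made-up thresholds; in a real plugin
# the count would come from the drush sqlq command shown above.
nagios_status_for_count 7 50 100
```

In a wrapper script, the count would be captured from the drush command (with any surrounding whitespace trimmed) and the script would exit with the function's return code.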

Now, on the local/monitoring system, you would create a config file that contains a definition for the remote host and services (remote commands to be monitored). The file will resemble:

#### start of file
define host {
        use                     generic_host
        host_name               mydomain.com
        alias                   mydomain.com
        address                 0.0.0.0
        notification_period     24x7
        }

define service {
        use                     generic-service
        host_name               mydomain.com
        service_description     Check Drupal Active Users - 15min
        is_volatile             0
        max_check_attempts      3
        normal_check_interval   5
        retry_check_interval    1
        notification_options    w,u,c,r
        notification_interval   10
        notification_period     24x7
        check_command           check_nrpe!check_drupal_active_users_15
        }
#### end of file

The configuration files should all be included from the nagios configuration file, nagios.cfg. After creating the host and service definitions, you should verify them, then reload the nagios service. The commands would look something like this, depending on your system configuration:

> /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
> service nagios reload

The 'nagios -v' permits you to validate your changes. If it comes up with errors, fix them before trying to reload the service.
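One way to make sure a broken configuration never gets reloaded is to chain the two steps. The sketch below assumes a helper named verify_then_reload (a name invented here) that receives both commands as strings, so the paths stay whatever your installation uses.

```shell
#!/bin/sh
# Hypothetical helper: run the reload command only if the verification
# command exits successfully.
verify_then_reload() {
    verify_cmd="$1"; reload_cmd="$2"
    if sh -c "$verify_cmd"; then
        sh -c "$reload_cmd"
    else
        echo "configuration check failed; not reloading" >&2
        return 1
    fi
}

# Demonstration with stand-in commands so the sketch runs anywhere.
verify_then_reload "true" "echo reload would run here"
```

On a real monitoring host the call would look something like: verify_then_reload "/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg" "service nagios reload"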

Now for some more advanced stuff. Out of the box, the remote side is set up not to accept parameters on NRPE commands. This is done as a security measure. If you configure your NRPE install to use SSL and restrict access to NRPE to requests from specific IP addresses, you can permit NRPE to accept parameters while mitigating the risk. Accepting parameters permits you to create custom commands that are a little more general. The remote server installation must be both built and configured to accept parameters (NRPE's --enable-command-args configure option and dont_blame_nrpe=1 in nrpe.cfg); otherwise your remote requests will fail.

With the above example, you can:
- Vary the time period used for looking at the number of users
- Support checking multiple sites on a server without having to define commands specific to each site

Taking the above example, you can generalize it to take three parameters:
$ARG1$ - the site directory
$ARG2$ - the time interval
$ARG3$ - the time interval message

The remote server command would be:
command[check_drupal_active_users]=echo -n Number of users active in past $ARG3$:  ;drush --root=/var/www/html/$ARG1$ --extra=--skip-column-names sqlq "SELECT COUNT(*) FROM {users} WHERE TIMESTAMPDIFF(MINUTE, FROM_UNIXTIME(access), NOW()) < $ARG2$"
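Because the arguments are substituted straight into a shell command line and a SQL statement, it may be worth checking them before they are used, for example in a small wrapper script that the NRPE command calls instead of invoking drush directly. The following is only a sketch under that assumption; the function name valid_check_args and the exact rules are made up for illustration.

```shell
#!/bin/sh
# Hypothetical validator for the site directory and time interval
# arguments described above.
valid_check_args() {
    site="$1"; minutes="$2"
    # Site directory: letters, digits, dot, dash, underscore only;
    # no slashes and no ".." so it cannot point outside /var/www/html.
    case "$site" in
        ''|*[!A-Za-z0-9._-]*|*..*) return 1 ;;
    esac
    # Interval: digits only, so nothing but a number reaches the SQL.
    case "$minutes" in
        ''|*[!0-9]*) return 1 ;;
    esac
    return 0
}

if valid_check_args "site.install.dir" "1440"; then
    echo "arguments accepted"
else
    echo "arguments rejected"
fi
```

A wrapper built this way would run the drush command only after the check passes, and otherwise exit with the Nagios UNKNOWN code (3).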

The local server's service definition for a 24 hour period would be:
####
define service {
        use                     generic-service
        host_name               recruit2care.com
        service_description     Check Drupal Active Users - Past 24 hours
        is_volatile             0
        max_check_attempts      3
        normal_check_interval   5
        retry_check_interval    1
        notification_options    w,u,c,r
        notification_interval   10
        notification_period     24x7
        check_command           check_nrpe!check_drupal_active_users -a site.install.dir 1440 'past 24 hours'
        }
####

The above was just one example, but in it we saw how to define a custom Nagios/NRPE command, how to use drush as part of that command, and how to parameterize the command if your site security requirements allow for it.

