The mail-database exchanger script (manitou-mdx) periodically updates a few entries in the runtime_info table to report its status. These entries can be checked by an external program; this article demonstrates how.
The relevant keys (runtime_info.rt_key field) are:
    mail=> select * from runtime_info where rt_key in ('last_alive', 'last_sent', 'last_import');
       rt_key    |  rt_value
    -------------+------------
     last_alive  | 1156772479
     last_sent   | 1156754648
     last_import | 1156771322
    (3 rows)

Each value is a Unix timestamp (seconds since the epoch). It can be converted to a human-readable date by a one-line script in Perl:
    $ perl -e 'print scalar(localtime(1156772479));'
    Mon Aug 28 15:41:19 2006
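The same conversion can be applied to all three entries at once. The following is a minimal sketch, not part of manitou-mdx: the helper name format_epoch is hypothetical, the values are the sample ones from the query output above, hard-coded in a hash rather than read from the database, and UTC is used so that the result does not depend on the local timezone.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(strftime);

# Hypothetical helper: format a Unix timestamp as a UTC date string.
sub format_epoch {
    my ($epoch) = @_;
    return strftime("%Y-%m-%d %H:%M:%S UTC", gmtime($epoch));
}

# Sample values copied from the query output above.
my %runtime_info = (
    last_alive  => 1156772479,
    last_sent   => 1156754648,
    last_import => 1156771322,
);

# Print one line per key, in alphabetical order.
for my $key (sort keys %runtime_info) {
    printf "%-12s %s\n", $key, format_epoch($runtime_info{$key});
}
```

Replacing gmtime with localtime would give dates in the timezone of the machine running the script, as in the one-liner above.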
The last_alive entry gets updated only if the 'alive_interval' configuration parameter is set in the manitou-mdx config file, which is not the case by default.
To check whether manitou-mdx is running, we can create a simple script that connects to the database, reads one or several of these entries, and compares them to an expected result. For last_alive, that result is the easiest to define. The configuration parameter 'alive_interval' specifies the number of seconds between two updates of 'last_alive'. If the difference between the current time and the value of 'last_alive' is significantly higher than 'alive_interval', then it can be assumed that manitou-mdx is no longer running, or that something prevents it from updating the entry (it could be stuck waiting for a database lock, for example).
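The comparison itself comes down to a single subtraction. Here is a minimal sketch of it, with assumptions stated: the helper name is_stale, the alive_interval value of 60 seconds and the tolerance factor of 3 are all illustrative choices, not values taken from manitou-mdx.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 when 'last_alive' is older than the
# allowed maximum age (in seconds), 0 otherwise.
sub is_stale {
    my ($last_alive, $now, $max_age) = @_;
    return ($now - $last_alive) > $max_age ? 1 : 0;
}

# Assumed settings: alive_interval is 60 seconds, and we tolerate up to
# three missed updates before declaring the entry stale.
my $alive_interval = 60;
my $max_age        = 3 * $alive_interval;

# Fixed timestamps, for illustration only: 45 seconds have elapsed.
my $last_alive = 1156772479;
my $now        = $last_alive + 45;

print is_stale($last_alive, $now, $max_age) ? "stale\n" : "alive\n";
```

Leaving a margin above 'alive_interval' avoids false alerts when an update is merely a little late.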
In addition, this script can run on a different machine from manitou-mdx and the database, so it can still report a problem when either of them is down.
Below is an example of such a script, in Perl. It assumes the existence of an environment variable named MANITOU_CONNECT_STRING that contains a valid DBI connect string, for example: Dbi:Pg:host=pgserver;dbname=mail;user=manitou
    #!/usr/bin/perl
    use strict;
    use warnings;
    use DBI;
    use POSIX qw(strftime);

    # The maximum number of seconds allowed between the
    # 'last_alive' value of the database and the current time.
    # When the difference between these two becomes higher
    # than ALIVE_INTERVAL_MAX, the alert is triggered.
    my $ALIVE_INTERVAL_MAX = 600;

    # Change these for real addresses
    my $ALERT_EMAIL = "alert\@domain.tld";
    my $FROM_EMAIL  = "alert-sender\@domain.tld";

    # A file created when an alert is sent.
    # The alert won't be sent again until this file is removed, either
    # by us when detecting that the mdx is up again, or
    # by another program, for instance the mdx start script.
    my $ALERT_LCK = "/var/tmp/manitou-alert.lck";

    sub alert {
        my $msg = shift;
        if (! -f $ALERT_LCK) {
            # If no lockfile, create one containing the current date
            if (open(my $lck, ">", $ALERT_LCK)) {
                print $lck scalar(localtime(time)), "\n";
                close($lck);
            } else {
                warn "cannot create $ALERT_LCK: $!";
            }
            # and send the alert
            alert_mail($msg);
        }
    }

    sub alert_mail {
        my $msg = shift;
        # List-form pipe open: sendmail is run directly, without a shell
        open(my $mail, "|-", "/usr/sbin/sendmail", "-t", "-f", $FROM_EMAIL)
            or die $!;
        print $mail "From: $FROM_EMAIL\n";
        print $mail "To: $ALERT_EMAIL\n";
        print $mail "Subject: alert about manitou-mdx\n";
        print $mail "\n";    # end of header
        print $mail "This is an automatically generated alert\n\n";
        print $mail "Error message:\n$msg\n";
        close($mail);
    }

    my $cnx_string = $ENV{'MANITOU_CONNECT_STRING'};
    if (!defined($cnx_string)) {
        die "Missing MANITOU_CONNECT_STRING environment variable";
    }

    my $dbh = DBI->connect($cnx_string);
    if (!$dbh) {
        alert("unable to connect to database: $DBI::errstr");
        exit 1;
    }

    my $sth = $dbh->prepare("SELECT rt_value FROM runtime_info WHERE rt_key='last_alive'");
    $sth->execute;
    my @r = $sth->fetchrow_array;
    # if there's no entry, we consider there's no error
    if (@r) {
        if (time - $r[0] > $ALIVE_INTERVAL_MAX) {
            my $d = strftime("%d/%m/%Y %H:%M:%S", localtime($r[0]));
            alert("manitou-mdx appears to be down since $d");
        }
        else {
            # if the mdx is running and there's an alert lockfile, then remove it
            # in order not to block further alerts
            if (-f $ALERT_LCK) {
                unlink($ALERT_LCK);
            }
        }
    }
    $sth->finish;
    $dbh->disconnect;
Similarly, the last_import entry could be used to detect a problem in the mail chain. For example, if a mail system that is generally busy hasn't processed a single incoming message for several hours, that could be considered suspicious enough to trigger an alert.
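Such a check on last_import might look like the sketch below. It is only an illustration: the helper name check_last_import and the four-hour threshold are assumptions to be adapted to the actual traffic of the site, and the timestamp is the sample value shown earlier rather than one read from the database.

```perl
use strict;
use warnings;

# Assumed threshold: alert after 4 hours without any imported message.
my $IMPORT_IDLE_MAX = 4 * 3600;

# Hypothetical helper: returns an alert message when 'last_import' is too
# old, or undef when everything looks normal. A missing entry (undef) is
# not treated as an error, as in the script above.
sub check_last_import {
    my ($last_import, $now, $idle_max) = @_;
    return unless defined $last_import;
    my $idle = $now - $last_import;
    return if $idle <= $idle_max;
    return sprintf("no message imported for %.1f hours", $idle / 3600);
}

# Sample 'last_import' value; in a real check it would come from runtime_info.
my $msg = check_last_import(1156771322, time, $IMPORT_IDLE_MAX);
print defined $msg ? "ALERT: $msg\n" : "OK\n";
```

The returned message could be passed to the alert function of the script above, so that both checks share the same lockfile and mail logic.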