Features Remote Sys Admin Lead image: Laurent Davoust, 123RF.com
Laurent Davoust, 123RF.com
 

Sys Admin On the Go

Remote Control

Keep track of your job when you're not in the office. By Juliet Kemp

As a sysadmin, you don't want to be chained to your desk – it's great to be able to work from home once in a while, or to be able to head off to conferences. But neither do you want the machines you're responsible for to be left unsupervised. Even if you don't mind always being physically in your office during the daytime, there's always the possibility of out-of-hours issues – or what if you're fixing something else at the other end of the building when one of your main servers starts having issues?

In this article, I look at ways to free you from your office by setting up mobile alerts to let you know when something goes wrong, the software you can use with a smartphone to fix it, and other options you have to control your machines from a distance.

Monitoring and Alerts

Several software options will let you keep an eye on your servers and send alerts when something goes wrong. If you set up your monitoring software to send email and you have a smartphone or a similar email-equipped device, you can be warned about problems even when you're not in the office.

Here, I'll do a quick rundown on how to set up email alerts on two of the most popular open source Linux system monitors: Nagios and Hobbit. (I'll assume you already have a working monitoring setup, in that covering the basics of either Nagios or Hobbit is outside the scope of this article.)

Nagios

First, set up an admins contact group in the contacts_nagios2.cfg file. Defining your admin contacts here means that you can simply refer to the admin group elsewhere in your config files. Then, if someone leaves the group, you only have to change the information in a single place. Make sure that a user with your email address is included in this group.

Second, decide which services you want to receive email from. The simplest option is to receive email from all services, by editing the generic service definition (which is used as the basis for all specific services) at conf.d/generic-service_nagios2.cfg. Add the code in Listing 1 to the service definition.

Listing 1: Notification Setup

01 notification_interval        1440
02 is_volatile                  0
03 check_period                 24x7
04 normal_check_interval        5
05 retry_check_interval         1
06 max_check_attempts           10
07 notification_period          24x7
08 notification_options         c,r
09 contact_groups               admins

The notification interval (in minutes) defines how often you are reminded (here, it's every 24 hours). The check_period defines when the service is expected to run (all the time). Time periods such as 24x7 are defined in conf.d/timeperiods_nagios2.cfg. The normal_check_interval and retry_check_interval are in minutes: The service here is set to be checked every five minutes, but if the check is unsuccessful and a retry is made, the retry should happen every minute. Ten retry attempts will be made before something is considered wrong with the service. Although you can reduce this number, if you reduce it too far, you might start getting false positives.

The notification_period sets when alerts should be sent – this setup provides for 24/7 notification – and notification_options sets when you should receive an alert. For hosts, use d to notify on DOWN states, u to notify on UNREACHABLE states, r to notify on host recoveries, and f to notify when the host starts and stops flapping. For services, use w to notify on WARNING states, u for UNKNOWN states, c for CRITICAL states, r for RECOVERY, and f for start/stop of flapping (when a service or host keeps rapidly changing state, perhaps indicating a problem). Finally, contact_groups defines who to contact when a notification is required (the admins group here).

Once you have all that in place, reload Nagios and try turning off SSH on one of the machines being monitored: You should receive a message to the address you set in the contacts file, telling you that the client is not SSH accessible. When you turn it back on, you'll get another alert telling you it's okay again.

One problem you might run into that will prevent an email being sent is a default From: line in the email alerts of nagios. If your mail server requires a registered address before it will send, mail from this user will be rejected (unless you have a nagios@mydomain.com email address set up). If you're using exim4, you need to set the untrusted user option and then add the line

-- -f arealaddress@mydomain.com

to the end of the host-notify-by-email and notify-by-email commands in commands.cfg.

If you want different people to be notified at different times (perhaps one person handles evenings and another handles weekends), the best way is to set it up in their contact definition in contacts_nagios2.cfg. Listing 2 shows a definition that uses my personal email for weekend emergencies when I don't check my work email.

Listing 2: Definition for Weekend Notification

01 define contact {
02    contact_name   juliet-personal
03    .....
04    host_notification_period   weekend
05    service_notification_period   weekend
06 }

Because this references the time period weekend, I also need to set that up in timeperiods_nagios2.cfg (Listing 3). The timeperiod definition can also handle references to other defined time periods and can exclude times, as well as include times. With this setup, I can now add juliet_personal to the admin group. However, that contact will only be used during the defined time periods, so I won't get mail to my personal account on weekdays.

Listing 3: Creating a "weekend" Time Period

01 define timeperiod {
02    name   weekend
03    timeperiod_name   weekend
04    friday         18:00-24:00
05    saturday      00:00-24:00
06    sunday         00:00-24:00
07    monday         00:00-09:00
08 }

Hobbit

To set up email alerts with Hobbit, edit /etc/hobbit/hobbit-alerts.cfg. Alerts are highly configurable and are set up to define, first, the condition in which an alert should be sent and, second, the action to take for that alert. The code in Listing 4 emails you if any machine (note that the regexp must start with %) is unavailable for 30 minutes (DURATION), repeats every 24 hours (1400 minutes) thereafter, and emails you on recovery (RECOVERED). Note that most of this is set up in the $MAILADMIN variable, which is used for settings that you're unlikely to change, making for less typing. The settings can be overridden in the line where the variable is used.

Listing 4: "Host Unavailable" Alert with Hobbit

01 $MAILADMIN=MAIL admin@example.com REPEAT=1440 RECOVERED
02
03 HOST=%.* SERVICE=conn
04         $MAILADMIN DURATION>30

As with Nagios, you can set more than one alert for slightly separate conditions. For example, you might want to email the on-call address, as in Listing 5, rather than your daytime work address if the service goes down out of hours.

Listing 5: Setting Multiple Alerts for a Single Service

01 $MAILADMIN=MAIL admin@example.com REPEAT=1440 RECOVERED
02 $ONCALLADMIN=MAIL on-call@example.com REPEAT=1440 RECOVERED TIME=*:1800:0800
03
04 HOST=%.* SERVICE=ssh
05         $MAILADMIN DURATION>30
06         $ONCALLADMIN DURATION>30

The important part there is the TIME setting: If this is unspecified, the default is "at any time." Here, the second alert is set to be sent only between 6pm (1800) and 8am (0800) on any day (the first *; use W for weekdays, or specify particular days with 0, Sunday, through 6, Saturday).

Instead, different people might need to be notified in different situations, in which case SERVICE can be specified in the MAIL line rather than in the first condition line, as in Listing 6.

Listing 6: Alerts for Multiple Services

01 $SSHADMIN=MAIL admin@example.com SERVICE=disk,ssh RECOVERED
02 $WEBADMIN=MAIL webmaster@example.com SERVICE=http RECOVERED
03
04 HOST=webserver.example.com
05         $SSHADMIN DURATION>30 REPEAT=1h
06         $WEBADMIN DURATION>30 REPEAT=1h

Here, the repeat would be every hour, and the admin user will be emailed for problems with the disk or SSH services, with the webmaster being emailed for HTTP problems.

Hobbit also supports the use of the SCRIPT keyword rather than MAIL, which will run any script you care to plug into it if straightforward email doesn't suit your purposes.

Many other pieces of monitoring software are available, and most, if not all of them, should have similar capabilities for sending out alerts (see the "Sending SMS Messages" box). Nagios and Hobbit are particularly good for larger networks, though.

Software for Mobile Devices

The next step is not just knowing what's going on, but being able to fix it even if you're not there at the right moment. Happily, these days, smartphones are increasingly full of features, and you can do a lot anywhere you have a data connection.

One of the most useful pieces of software the roving sys admin can have is SSH. A variety of SSH is available for most smartphones, including the following:

Although I have personally used pssh, ConnectBot, and iSSH on the iPhone, I have not used the others because I don't have access to the appropriate devices.

Additionally, you can use your smartphone to connect to a server via VNC, if your server supports that. In most cases, the SSH command line will be more responsive than a VNC connection, and for most server problems, that should be all you need.

However, for a small class of problems (and especially if you run any, usually Java-based, services that have a graphical admin interface), the VNC option can be useful. iSSH provides a pretty good VNC client, or for Android, you can use android-vnc-viewer [6], although this is reasonably new software.

Emailing Your Server

You can also set up your server to respond to specific email. Once, I had a couple of machines that would regularly (because of a specific, and sadly non-resolvable, closed source software problem) lock up their screens.

The solution was to restart GDM. The following line shows an email address, set up in /etc/aliases, that does this for a particular machine:

gdm-restart: "|/usr/sbin/restart-gdm"

This pipes the incoming email into the named script, which can do a quick-and-dirty security check for the sender of the email, as shown in Listing 7.

Listing 7: /usr/sbin/restart-gdm Script

01 #!/usr/bin/perl -w
02
03 use strict;
04
05 my $legit_sender = "juliet\@mydomain.com";
06
07 while(<>) {
08    if ( /^From:/ ) {
09       `sudo /etc/init.d/gdm restart` if ( /$legit_sender/ );
10        exit 0;
11    }
12 }

Two final authorization changes are needed to make this work. First, if you're running exim4, you need to authorize that scripts are run from email aliases by editing /etc/exim4/exim4.conf.template to include this line (this should be outside the SYSTEM_ALIASES_PIPE_TRANSPORT ifdef block),

pipe_transport = address_pipe

then reload it (/etc/init.d/exim4 reload) for the change to take effect. Other mailers might have to do something similar.

Second, you'll need to give the exim user (Debian_exim in Debian Lenny) sudo rights to run this particular command without a password. Run visudo and add this line at the bottom:

Debian-exim machinename = NOPASSWD: /etc/init.d/gdm

Now send mail (content doesn't matter here) to gdm-restart@mymachine.mydomain.com, and GDM will be restarted!

IMPORTANT NOTE! This script is NOT particularly secure! A straightforward spoof of the From line in an email is possible. Reloading GDM isn't, as a rule, a particularly dangerous thing to do (although it could seriously irritate a user who was in the middle of something), but it's important to be aware that restarting software does carry a risk of an attacker being able to run a replacement script or application if they've managed to insert one into your system. Be very careful what you allow this sort of script to do, and use this kind of automation with caution.

However, used carefully this technique can be really useful. Another possibility, for example, is to write a script that acknowledges a Nagios alert so that it doesn't keep bugging you if you've already decided that it's not urgent.

Conclusion

The scope for remote manipulation of your servers is extensive – as is their ability to manipulate you remotely by sending relevant information to you wherever you are! By trying out some of these techniques, you could free up your working life a little.

Just remember to mention it in your performance review if, as I did once, you end up fixing a web server from a tent in the middle of a field somewhere!