Management Network Monitoring Lead image: Lead Image © lightwise, 123RF.com

Monitor your network infrastructure with SNMP

Clear View

If you don't have the staff to monitor your network in real time, SNMP and a couple of scripts are all it takes to keep track of your device jungle. By Falko Benthin

Routers, switches, servers, printers – data networks contain many complex components, and if you want to manage them with the least possible organizational and financial overhead, all you need is a Raspberry Pi and the Simple Network Management Protocol (SNMP). SNMP can be used to query values such as the data throughput, CPU load, and temperature of a device, or even to reconfigure the system.

SNMP is the successor to the Simple Gateway Management Protocol (SGMP). Its specification was approved by the Internet Engineering Task Force (IETF) in 1990, and it has seen several revisions since then. The current version is version 3. In contrast to its predecessors, SNMPv3 supports encrypted communication and secure authentication; however, many devices you can purchase today still only support SNMPv1 or SNMPv2.

Protocol Brief

SNMP uses UDP port 161 by default. Communications rely on agents and managers; the agents run on the individual devices and wait for queries or instructions from the managers. There are also SNMP traps, which cause the device to push a message to a manager when specific events occur. The message typically reaches the manager on port 162.

SNMP reads values, known as managed objects, from various network components. A managed object can be the status of a network interface, CPU, or device memory. To establish a standard here, the SNMP Management Information Base (MIB) was developed. The properties of many managed objects are described in the MIB tree structure. The descriptions contain the name, or OID (object identifier), and the permissible data types for an object. The OID can be numeric or human-readable; for example, iso.org.dod.internet.mgmt and 1.3.6.1.2 refer to the same object, and they can be used as equivalents in queries.
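To make the equivalence of the two notations concrete, the following sketch translates the name from the example into its numeric form, node by node. The lookup table is hard-coded for illustration and only knows the five nodes named above; the function name oid_to_numeric is made up for this example. On a system with Net-SNMP and the MIB files installed, the snmptranslate tool performs this kind of lookup against the real MIB tree.

```shell
# Translate a dotted OID name into numeric form, one node at a time.
# Assumption: a flat table for the five nodes from the text. A real MIB
# resolves every name relative to its parent node.
oid_to_numeric() {
  result=""
  old_ifs=$IFS
  IFS=.
  for node in $1; do
    case $node in
      iso)      num=1 ;;
      org)      num=3 ;;
      dod)      num=6 ;;
      internet) num=1 ;;
      mgmt)     num=2 ;;
      *) IFS=$old_ifs; echo "unknown node: $node" >&2; return 1 ;;
    esac
    # append the number, with a dot unless this is the first node
    result="${result:+$result.}$num"
  done
  IFS=$old_ifs
  echo "$result"
}

oid_to_numeric iso.org.dod.internet.mgmt   # prints 1.3.6.1.2
```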

SNMPv1 and SNMPv2 use what are known as "communities" to establish connections that are trusted by managers and agents. To allow this to happen, community names exist for Read-only, Read-write, and Trap. The community names replace passwords; however, because they are transferred in the clear, an attacker can quite easily sniff them.

Various activities can be handled using community names. With the Read-only community string, which most manufacturers default to public, you can only query data from a device. The Read-write community string, which is typically set to private for most devices in the as-delivered state, also lets you make changes to the device. For example, you can set counters or change the router configuration. Finally, the Trap community string is only needed so that managers can accept trap messages from the agents.

The two latter community strings are not often seen on devices for end users. However, SNMPv3, which encrypts the entire communication and forces participants to authenticate at regular intervals, is even less common.

The Scenario

I will be referring in this article to the example of a complex of buildings whose users sporadically complain about slow Internet connections. It should be possible to query some device values with the Net-SNMP tools [1] and log anything suspicious. On this basis, I can collect information about what is happening on the network and make decisions for improving the infrastructure.

Hochlland [2] is an educational institution in Potsdam, Germany, comprising three multistory buildings regularly visited by school classes and education groups. The three buildings share an Internet connection and are connected by wireless and cable links of various quality. WLAN service for guests is currently provided by eight access points (APs), although there are plans to increase this number to 14 when the network is next expanded.

The educational institute pursues a policy of self-organization, which means that the groups use the buildings totally autonomously. Staff is not always on site, so problems with what is typically a stable Internet connection are difficult to understand in retrospect. Many groups arrive and leave on their own – and it can occasionally happen that an access point leaves with them. Additionally, some neighbors have discovered the access credentials of the semi-public house and like to make extensive use of the network. The idea is also to make this kind of access more difficult.

Preparations

To identify problems in good time, I installed a Raspberry Pi as a monitoring system in the building. Its job is to monitor the devices, query additional values when selected events occur, and, if needed, notify staff. The Rasp Pi runs the Darkbasic Raspbian minimal image [3]. I added the Raspberry Pi package sources (Listing 1, lines 1 through 3), updated the system (lines 4 and 5), installed the required applications (line 6), in particular the snmp and snmp-mibs-downloader packages, and performed a firmware update (lines 7 through 9).

Listing 1: Installing and Updating Packages

01 $ echo "deb http://archive.raspberrypi.org/debian wheezy main" | \
   sudo tee -a /etc/apt/sources.list
02 $ sudo wget http://archive.raspberrypi.org/debian/raspberrypi.gpg.key \
   -O raspberrypi.gpg.key
03 $ sudo apt-key add raspberrypi.gpg.key
04 $ sudo apt-get update
05 $ sudo apt-get upgrade
06 $ sudo apt-get install vim vim-runtime aria2 ntpdate anacron msmtp-mta \
   bsd-mailx raspi-config less screen snmp snmp-mibs-downloader
07 $ sudo curl -L --output /usr/bin/rpi-update \
   https://raw.githubusercontent.com/Hexxeh/rpi-update/master/rpi-update && \
   sudo chmod +x /usr/bin/rpi-update
08 $ sudo rpi-update
09 $ sudo reboot

I enabled SNMP agents on all the access points used here with DD-WRT [4] (Figure 1) or Ubiquiti airOS [5] (Figure 2). The existing WLAN zoo does not lend itself to a standardized solution, and I needed to set static routes on some routers and the Rasp Pi to allow the nanocomputer to reach all the devices. Listing 2 shows how to set routes statically with route. Once everything is working, you can add the corresponding entries below the matching network interface in the /etc/network/interfaces configuration file to avoid losing them when you reboot (Listing 3).

Listing 2: Set Routes

$ sudo route add -net 192.168.100.0/24 gw 192.168.2.2
$ sudo route add -net 192.168.13.0/24 gw 192.168.2.2

Listing 3: Additions to /etc/network/interfaces

# /etc/network/interfaces
up route add -net 192.168.100.0/24 gw 192.168.2.2 dev eth0
up route add -net 192.168.13.0/24 gw 192.168.2.2 dev eth0

Figure 1: The SNMP agent in the free router firmware DD-WRT is quickly enabled and accepts read and write access.
Figure 2: Many commercially available network devices come with an SNMP agent.

SNMP in Action

After completing the preparations, you can query all the SNMP information available for a device using the command:

$ sudo snmpwalk -v1 -c <RO-Community-String> <Host> <OID>

The -v1 parameter enforces the use of SNMPv1, and you can set the community string with -c. On top of this, you need the hostname or the IP address of the computer to query and its OID. If you simply enter a dot for the latter, snmpwalk queries all available OIDs (Figure 3). The more precise the OID, the less you are flooded with information.

Figure 3: Taking an snmpwalk from the root of the MIB tree (OID: .) returns many results.

Information about the meaning of individual OIDs is often found on the device manufacturer's website or on relevant Internet forums. Additionally, there are many standardized OIDs (e.g., for names and uptimes of devices and for the number of packets sent and received). Depending on the product, you can also query information about the number of clients connected with the access point or the number of available and assigned DHCP leases.

Other SNMP commands are available in addition to snmpwalk. Whereas snmpwalk returns the complete branch below the requested OID, snmpget restricts itself to the specified OID. By default, the commands return OIDs in numeric format on Debian/Raspbian. If you prefer to see the descriptive name instead, so you can assess the significance of an entry more easily, comment out the mibs : line in the /etc/snmp/snmp.conf file.

At Hochlland, I was mainly interested in a couple of things: Can all the devices still be reached? How many WLAN clients are connected to the individual APs? What do the CPU and memory usage look like, and how many packets are passing through each device? You could add any number of other points that interest you here.

To test the availability of the devices, it makes more sense to use ping than SNMP. If a system cannot be pinged, SNMP will also report a timeout, which you could evaluate instead; however, sending many SNMP queries takes far longer than firing off a couple of pings.

Keep It Short …

A MIB tree typically contains far more information than you actually need for your evaluation. You can do yourself a favor by restricting the output to less information.

The devices connected to individual routers or access points are provided by the ipNetToMediaEntry (OID 1.3.6.1.2.1.4.22.1) branch, for example. You can query available and used memory using OIDs 1.3.6.1.2.1.25.2.3.1.5.101 and 1.3.6.1.2.1.25.2.3.1.6.101; you can see the average CPU load of the last 15 minutes with the OID 1.3.6.1.4.1.2021.10.1.5.3; and OID 1.3.6.1.2.1.2.2.1 collects all the available information for the network interfaces. Again, it might be worthwhile to reduce the volume of data here: You will typically not want to log MTUs, interface designations, and so on every time.
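The raw numbers returned by these OIDs become easier to read after a little arithmetic. The two memory OIDs belong to the host resources storage table (size and used, both counted in allocation units), and the CPU OID is the integer load value from the UCD MIB, which is the load average multiplied by 100. A minimal sketch; the function names and sample values are invented for illustration:

```shell
# Percentage of storage in use. Both arguments are counted in the same
# allocation units (used first, then size), so the unit cancels out.
mem_percent() {
  if [ "$2" -le 0 ]; then
    echo "size must be positive" >&2
    return 1
  fi
  echo $(( $1 * 100 / $2 ))
}

# The integer load value is the load average times 100;
# turn 125 back into 1.25.
load_from_int() {
  printf '%d.%02d\n' $(( $1 / 100 )) $(( $1 % 100 ))
}

mem_percent 3072 8192   # prints 37
load_from_int 125       # prints 1.25
```

The percentage is rounded down because the shell only does integer arithmetic, which is usually precise enough for a logfile.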

Once you have established that snmpwalk or snmpget returns the results you need, you can bundle the commands into a script that is run later on as a cron job. The snmpwalk command offers another couple of options for truncating the output. In the sample script monitor.sh (Listing 4), the query uses -Oqs; that is, only the last element in the OID and the matching value are output.
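Output in this short form is easy to process further. The sketch below splits lines of the form "object.index value" into three fields. The sample lines are made up, and the sketch assumes that MIB loading is enabled so that names rather than numbers appear; split_oqs is a hypothetical helper, not part of Net-SNMP.

```shell
# Split "object.index value" lines into name, index, and value.
split_oqs() {
  while read -r oid value; do
    name=${oid%%.*}    # text before the first dot
    index=${oid#*.}    # everything after the first dot
    printf '%s %s %s\n' "$name" "$index" "$value"
  done
}

# Invented sample input for two interfaces
printf '%s\n' 'ifInOctets.1 48211' 'ifInOctets.2 1337' | split_oqs
```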

Listing 4: monitor.sh

#! /bin/bash
#: Title: monitor.sh
#: Date: 28.01.2015
#: Author: Falko Benthin
#: Version: 1.0
#: Description: Sends SNMP requests to individual APs/routers and logs
#:   the output with timestamps for evaluation later
#: Options: none
# sends snmp requests to individual hosts
function checkMachines() {
  # ipNetToMediaPhysAddress
  snmpwalk -v1 -Oqs -c $ROCOMMUNITY $HOST .1.3.6.1.2.1.4.22.1.3
  # memory_used
  snmpwalk -v1 -Oqs -c $ROCOMMUNITY $HOST .1.3.6.1.2.1.25.2.3.1.6.101
  # CPU-load-1
  # snmpwalk -v1 -Oqs -c $ROCOMMUNITY $HOST 1.3.6.1.4.1.2021.10.1.5.1
  # CPU-load-5
  # snmpwalk -v1 -Oqs -c $ROCOMMUNITY $HOST 1.3.6.1.4.1.2021.10.1.5.2
  # CPU-load-15
  snmpwalk -v1 -Oqs -c $ROCOMMUNITY $HOST .1.3.6.1.4.1.2021.10.1.5.3
  # wlan_clients
  snmpwalk -v1 -Oqs -c $ROCOMMUNITY $HOST .1.3.6.1.4.1.2021.255.3.54.1.3.32.1.4
  # ifInOctets
  snmpwalk -v1 -Oqs -c $ROCOMMUNITY $HOST .1.3.6.1.2.1.2.2.1.10
  # ifInUcastPkts
  snmpwalk -v1 -Oqs -c $ROCOMMUNITY $HOST .1.3.6.1.2.1.2.2.1.11
  # ifInDiscards
  snmpwalk -v1 -Oqs -c $ROCOMMUNITY $HOST .1.3.6.1.2.1.2.2.1.13
  # ifInErrors
  snmpwalk -v1 -Oqs -c $ROCOMMUNITY $HOST .1.3.6.1.2.1.2.2.1.14
  # ifOutOctets
  snmpwalk -v1 -Oqs -c $ROCOMMUNITY $HOST .1.3.6.1.2.1.2.2.1.16
  # ifOutUcastPkts
  snmpwalk -v1 -Oqs -c $ROCOMMUNITY $HOST .1.3.6.1.2.1.2.2.1.17
  # ifOutDiscards
  snmpwalk -v1 -Oqs -c $ROCOMMUNITY $HOST .1.3.6.1.2.1.2.2.1.19
  # ifOutErrors
  snmpwalk -v1 -Oqs -c $ROCOMMUNITY $HOST .1.3.6.1.2.1.2.2.1.20
  }
# Directory for logfiles
LOGDIR="/home/falko/monitorlog"
# community string
ROCOMMUNITY="community"
# date
YEAR=$( date +%Y )
MONTH=$( date +%m )
DAY=$( date +%d )
while read HOST DESC
do
  DATEDIR=$LOGDIR/$YEAR/$MONTH/$DAY
  # Directory for date
  if [ ! -d $DATEDIR ]; then
    mkdir -p $DATEDIR
  fi
  # check if host is reachable
  if ! ping -c3 $HOST > /dev/null; then
    if [ ! -e $LOGDIR/$HOST.lastmail.log ] || [ ! $( date -d @$( cat \
       $LOGDIR/$HOST.lastmail.log ) +%d ) = $DAY ]
    then
      printf "The AP/Router %s, %s is not reachable. Please check." \
             $HOST "$DESC" | mail -s "Check AP/Router" recp1@samplemail.org \
             recp2@samplemail.org recp3@samplemail.org
      echo $( date +%s ) > $LOGDIR/$HOST.lastmail.log
    fi
  else
    # SNMP checks and logging
    checkMachines | \
    while read OUTPUT
    do
      printf "%s %s\n" $( date +%T ) "$OUTPUT" >> $DATEDIR/$HOST.log
    done
  fi
done < machines.txt
exit 0

The script bundles the individual queries into a function so that you only need to modify one part if the requirements change. You can save the hosts you want to monitor with their IP addresses and a description in a text file (Listing 5). The description contains the device type and location so that a member of staff who is not familiar with the setup still knows where to look. Finally, monitor.sh is called regularly as a cron job.

Listing 5: IP Addresses of Hosts

192.168.2.1   Modem in the office
192.168.2.2   Picostation roof
192.168.10.1  AP seminar building
192.168.10.4  AP Hochlland canteen
192.168.13.1  Picostation new building
192.168.13.2  AP new building first floor
192.168.13.3  AP new building second floor
192.168.13.4  AP new building top
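
The host list works with a plain read loop because read assigns the first word of each line to the first variable and the rest of the line, spaces included, to the last one. The following sketch shows this split in isolation. Skipping empty lines and comment lines is an addition for illustration that monitor.sh does not have, and list_hosts is a made-up name.

```shell
# Print one "host -> description" line per entry read from standard input.
list_hosts() {
  while read -r host desc; do
    # skip empty lines and lines starting with '#'
    case $host in ''|'#'*) continue ;; esac
    printf '%s -> %s\n' "$host" "$desc"
  done
}

list_hosts <<'EOF'
192.168.2.1   Modem in the office
# spare entry
192.168.2.2   Picostation roof
EOF
```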

Logfiles

If guests complain that the Internet is slow, a quick check of the logfiles for the day in question helps identify potential issues or reveals whether you need additional information.
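
When reading the logs, keep in mind that interface counters such as ifInOctets are cumulative, so throughput results from the difference between two samples. In SNMPv1 these counters are 32 bits wide and wrap around to zero when they overflow. The following sketch computes the difference under the assumption of at most one wrap between samples; a device reboot also resets the counters and is not detected here. The function names are hypothetical.

```shell
# Difference between two readings of a 32-bit counter (old, new),
# allowing for a single wrap-around back to zero.
counter_delta() {
  if [ "$2" -ge "$1" ]; then
    echo $(( $2 - $1 ))
  else
    echo $(( $2 + 4294967296 - $1 ))
  fi
}

# Average bytes per second between two samples taken $3 seconds apart.
bytes_per_second() {
  echo $(( $(counter_delta "$1" "$2") / $3 ))
}

counter_delta 100 250              # prints 150
counter_delta 4294967290 10        # prints 16 (counter wrapped)
bytes_per_second 1000 301000 300   # prints 1000
```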

Moreover, you can check whether there are WLAN clients that connect suspiciously frequently to your APs and potentially need special treatment. You might want to query more information from the WLAN AP in question and define a firewall rule for the client on that basis.
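
One way to spot such clients is to count how often each hardware address appears in a day's logs. The sketch below assumes log lines in the format written by monitor.sh (time, object name, value) and treats everything after the object name as the address, because its exact formatting depends on the agent and the output options. The addresses in the example are invented, and count_clients is a made-up helper.

```shell
# Count how often each hardware address shows up in log lines read from
# standard input; the most frequent address is listed first.
count_clients() {
  awk '$2 ~ /^ipNetToMediaPhysAddress\./ {
         mac = $0
         sub(/^[^ ]+ [^ ]+ /, "", mac)   # drop time and object name
         seen[mac]++
       }
       END { for (mac in seen) print seen[mac], mac }' | sort -rn
}

printf '%s\n' \
  '12:00:01 ipNetToMediaPhysAddress.2.192.168.13.57 0:1a:2b:3c:4d:5e' \
  '12:00:01 ifInOctets.1 4711' \
  '12:05:01 ipNetToMediaPhysAddress.2.192.168.13.57 0:1a:2b:3c:4d:5e' \
  '12:05:01 ipNetToMediaPhysAddress.2.192.168.13.60 0:aa:bb:cc:dd:ee' |
  count_clients
```

To evaluate real data, you would feed the function the logfiles of one day instead of the sample lines.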

In the course of time, you will collect a large volume of log data. A script that runs once a day, compress_and_delete.sh (Listing 6), helps save storage by compressing the previous day's logs with gzip and deleting the logs after 30 days.

Listing 6: compress_and_delete.sh

#! /bin/bash
#: Title: compress_and_delete.sh
#: Date: 28.01.2015
#: Author: Falko Benthin
#: Version: 1.0
#: Description: Compresses old logs and deletes very old logs
#: Options: none
# Directory for logfiles
LOGDIR="/home/falko/monitorlog"
# yesterday
YEAR=$( date -d "yesterday" +%Y )
MONTH=$( date -d "yesterday" +%m )
DAY=$( date -d "yesterday" +%d )
# Compress yesterday's logs
gzip $LOGDIR/$YEAR/$MONTH/$DAY/*log
# Delete logfiles older than 30 days (files only, directories are kept)
find "$LOGDIR" -type f -mtime +30 -delete
exit 0

Conclusions

SNMP lets you collect a large volume of data that can help you with your evaluation and thus support troubleshooting. The protocol and the queries are easy on resources, so you could assign other tasks to your Raspberry Pi, if necessary.

That said, plain vanilla SNMP queries are not to the liking of administrators who prefer monitoring in real time with visualization. If this sounds like you, you might prefer to go for MRTG, Cacti, or classic network monitoring tools such as Nagios, Icinga, Zabbix, or Munin.