Nuts & Bolts OpenNMS Lead image: © Dmitry Rukhlenko, 123RF.com

The Swiss Army knife of fault management

Flexible Monitoring

The OpenNMS monitoring tool is a kind of all-purpose weapon that can provide administrators with a versatile range of information.

By Jens Michelsons

OpenNMS is known for its excellent scalability, but it is also characterized by its open interfaces and a variety of standard tools that can be used to monitor the status of services like HTTP, SMTP, POP3, or DNS immediately after installation. The system is extensible thanks to configurable monitors, so administrators can quickly cook up a large collection of monitoring options that don't require any programming skills from their users.

The SNMP monitor, together with the Windows service and process monitors that also use SNMP to retrieve information, can be very useful. In this article, I'll explain how to use these monitors, because many administrators are unaware of the scope of options available.

SNMP Overview

The Simple Network Management Protocol (SNMP) was developed and standardized for monitoring and controlling network components. A Management Information Base (MIB) reveals the details of information that can be queried. This tree-like structure (Figure 1) stores all of the retrievable properties, and each property is accessible via an object identifier (OID).

Figure 1: Object identifiers guide you through a tree-like structure of properties, some parts of which are standardized and some of which are defined by the device manufacturer.

In production, two parts of the tree are relevant: the standard MIB II range (1.3.6.1.2.1.*) and the vendor-specific area below the enterprises branch (1.3.6.1.4.1.*). MIB II is specified by the IETF, and all vendors are supposed to implement it in a standardized manner. It contains system information such as the SNMP device name, network interface parameters, and routing information. The vendor-specific part is where you will find data such as hardware information, performance counters, or status information. Because vendor-independent standards exist for querying and setting a variety of values, SNMP has become a powerful network management protocol.
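To make the two branches concrete, here is a minimal Python sketch that classifies a dotted OID by its prefix. The function names are hypothetical and chosen only for illustration; the prefixes are the ones named above.

```python
# Prefixes taken from the text: MIB II and the enterprises branch.
MIB2_PREFIX = (1, 3, 6, 1, 2, 1)
ENTERPRISES_PREFIX = (1, 3, 6, 1, 4, 1)


def parse_oid(oid):
    """Turn a dotted OID string (leading dot optional) into a tuple of ints."""
    return tuple(int(part) for part in oid.strip(".").split("."))


def classify_oid(oid):
    """Report which of the two branches an OID belongs to (hypothetical helper)."""
    parts = parse_oid(oid)
    if parts[:len(MIB2_PREFIX)] == MIB2_PREFIX:
        return "mib-2"
    if parts[:len(ENTERPRISES_PREFIX)] == ENTERPRISES_PREFIX:
        if len(parts) > len(ENTERPRISES_PREFIX):
            # The sub-identifier right after the prefix is the vendor's enterprise number.
            return "enterprise %d" % parts[len(ENTERPRISES_PREFIX)]
        return "enterprises"
    return "other"
```

Called with the OID used later in Listing 1, `classify_oid(".1.3.6.1.4.1.3854.2.3.2.1.6")` returns `"enterprise 3854"`, whereas an OID such as `1.3.6.1.2.1.1.5.0` is reported as `"mib-2"`.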

What SNMP Can Give You

In many cases, administrators want to monitor the state of hardware components such as hard disks, processors, memory chips, and fans, or environmental conditions such as temperature. Often, administrators mistakenly think that only the vendor's own tools are capable of supplying this kind of hardware monitoring, but much hardware information can also be retrieved using SNMP. OpenNMS relies on its SNMP monitor to provide vendor-independent hardware monitoring. Popular management agents include:

Administrators often need to monitor Windows services and processes and, again, might not realize how useful SNMP can be in this context. The options really seem to be unlimited – from querying the status of the UPS, the air conditioning system, and sensor values, to monitoring and controlling complete systems.

Identifying Services

OpenNMS can automatically identify the services you need to monitor. In versions up to 1.8.x, this task is handled by the capabilities daemon (capsd) assisted by plugins. Version 1.8 introduces a new mechanism, which is integrated into the provisioning daemon (provisiond) as a foreign source. OpenNMS currently supports both mechanisms (Figure 2).

Figure 2: An OpenNMS architectural overview shows various components, such as the provisioning and poller daemons.

Because capabilities daemon-based detection is still defined as the default in version 1.8.x, the examples in this article rely on this mechanism. After identifying a service on a device, you are free to monitor this service. The poller daemon (pollerd) checks the availability of the service every five minutes by default. To allow the poller daemon to go about its work, you need to define the monitor configurations.

The SNMP Monitor

Because of the many possibilities SNMP offers, the SNMP monitor can be used as a kind of all-purpose weapon. Configurable parameters mean any value provided by the vendor can be queried and evaluated by a comparison function. Listing 1 shows a sample configuration for environmental monitoring with sensors provided by the vendor AKCP.

Listing 1: Environmental Monitoring Example

Service identification is controlled by /etc/opennms/capsd-configuration.xml:

<protocol-plugin protocol="AKCP-Temperature"
 class-name="org.opennms.netmgt.capsd.plugins.SnmpPlugin" scan="on">
        <property key="vbname" value=".1.3.6.1.4.1.3854.2.3.2.1.6" />
        <property key="table" value="true" />
        <property key="vbvalue" value="2" />
        <property key="timeout" value="1000" />
        <property key="retry" value="1" />
</protocol-plugin>

The monitor is set up by /etc/opennms/poller-configuration.xml:

...
<service name="AKCP-Temperature" interval="300000" user-defined="false"
 status="on">
      <parameter key="retry" value="3"/>
      <parameter key="timeout" value="3000"/>
      <parameter key="port" value="161"/>
      <parameter key="oid" value=".1.3.6.1.4.1.3854.2.3.2.1.6"/>
      <parameter key="walk" value="true"/>
      <parameter key="operator" value="="/>
      <parameter key="operand" value="2"/>
      <parameter key="match-all" value="true"/>
      <parameter key="reason-template" value="A problem with AKCP Temperature
       Environment detected. The state should be normal(${operand}) but actual
       value is ${observedValue}. Syntax: noStatus(1), normal(2), highWarning(3),
       highCritical(4), lowWarning(5), lowCritical(6), sensorError(7)"/>
</service>
...
<monitor service="AKCP-Temperature"
 class-name="org.opennms.netmgt.poller.monitors.SnmpMonitor"/>
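To see how the ${operand} and ${observedValue} placeholders in the reason-template resolve, the following Python sketch fills in a shortened version of the template from Listing 1. It is only an illustration: Python's string.Template happens to use the same ${name} syntax, but this is not how OpenNMS implements the substitution, and the helper name and the bracketed state suffix are my own additions. The state names come from the Syntax list in the listing.

```python
from string import Template

# State codes as listed in the reason-template of Listing 1.
AKCP_STATES = {
    1: "noStatus", 2: "normal", 3: "highWarning", 4: "highCritical",
    5: "lowWarning", 6: "lowCritical", 7: "sensorError",
}

# Shortened copy of the template text from Listing 1.
REASON_TEMPLATE = Template(
    "The state should be normal(${operand}) but actual value is ${observedValue}."
)


def render_reason(operand, observed_value):
    """Fill in the placeholders and append the readable state name (hypothetical helper)."""
    reason = REASON_TEMPLATE.substitute(operand=operand, observedValue=observed_value)
    state = AKCP_STATES.get(observed_value, "unknown")
    return "%s [%s]" % (reason, state)
```

For a sensor that reports 4 while 2 is expected, `render_reason(2, 4)` yields "The state should be normal(2) but actual value is 4. [highCritical]".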

The monitor can be configured using the parameters oid, operator, operand, walk, match-all, minimum, and maximum. The oid parameter defines which value, or values, the monitor should retrieve. The walk parameter controls whether one or multiple values are retrieved: a value of true means that the OID refers to a table.

The operator parameter selects the comparison (=, >=, <=, <, >, !=) that is applied between each retrieved value and the operand. You can set match-all to true, false, or count to determine whether all values, just one value, or a set number of values need to match in the comparison.

If you set match-all to count, you can specify how often a match has to occur by defining the minimum and maximum parameters. After configuring the monitors, restart OpenNMS by issuing the /etc/init.d/opennms restart command.
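The interplay of these parameters can be sketched in a few lines of Python. This is an illustration of the semantics as described above, not the code OpenNMS runs; the function name, the treatment of an empty walk as a failure, and the default bounds for count are assumptions of mine.

```python
import operator

# Comparison operators named in the text, mapped to Python functions.
OPERATORS = {
    "=": operator.eq, "!=": operator.ne,
    "<": operator.lt, "<=": operator.le,
    ">": operator.gt, ">=": operator.ge,
}


def service_is_up(values, op, operand, match_all="true", minimum=0, maximum=None):
    """Decide up/down from walked values (hypothetical helper).

    values    -- the values returned for the OID (one or many, see walk)
    match_all -- "true", "false", or "count", mirroring the match-all options
    """
    if not values:
        return False  # assumption: nothing retrieved counts as a failure
    compare = OPERATORS[op]
    matches = sum(1 for value in values if compare(value, operand))
    if match_all == "true":
        return matches == len(values)   # every value must match
    if match_all == "false":
        return matches >= 1             # one matching value is enough
    if match_all == "count":
        upper = len(values) if maximum is None else maximum
        return minimum <= matches <= upper
    raise ValueError("unknown match-all option: %r" % match_all)
```

With three temperature sensors reporting the states `[2, 4, 2]` and the settings from Listing 1 (operator =, operand 2, match-all true), the sketch reports the service as down because one sensor deviates. Switching to count with minimum and maximum both set to 2 would accept exactly this situation.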

Windows Services Monitor

You can query the print queue status with the SNMP monitor using the OID .1.3.6.1.4.1.77.1.2.3.1.3.18.44.72.75.63.6b.77.61.72.74.65.73.63.68.6c.61.6e.67.65; however, it's easier to enter the service name print_queue, as shown in Listing 2. You need to define the service name as the service-name parameter in both the plugin and the monitor.

Listing 2: Print Queue Example

Service identification (/etc/opennms/capsd-configuration.xml):

<protocol-plugin protocol="MS-print_queue" class-name="org.opennms.netmgt.capsd.plugins.Win32ServicePlugin" scan="on">
       <property key="timeout" value="2000" />
       <property key="retry" value="1" />
       <property key="service-name" value="print_queue" />
</protocol-plugin>

Setting up the monitor (/etc/opennms/poller-configuration.xml):

...
<service name="MS-print_queue" interval="300000" user-defined="false"
 status="on">
      <parameter key="retry" value="2" />
      <parameter key="timeout" value="3000" />
      <parameter key="port" value="161" />
      <parameter key="service-name" value="print_queue" />
</service>
...
<monitor service="MS-print_queue"
 class-name="org.opennms.netmgt.poller.monitors.Win32ServiceMonitor" />
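The long OID quoted before Listing 2 shows why the service-name shortcut is welcome: the table index is the service name itself, written as its length followed by one sub-identifier per character. The Python sketch below builds such an OID under that assumption. Note that the tail of the OID printed in the text (44.72.75 ...) appears to give the characters as hexadecimal bytes, whereas a numeric OID would carry them in decimal; the helper names and the ASCII-only encoding are my own simplifications.

```python
# Table column taken from the OID quoted in the text (without the index part).
SERVICE_STATE_COLUMN = ".1.3.6.1.4.1.77.1.2.3.1.3"


def service_oid(service_name):
    """Append the length-prefixed, decimal-encoded name to the column OID."""
    encoded = service_name.encode("ascii")  # assumption: plain ASCII names only
    index = [len(encoded)] + list(encoded)
    return SERVICE_STATE_COLUMN + "." + ".".join(str(number) for number in index)


def hex_tail_to_name(tail):
    """Decode a dotted run of hex bytes, such as the tail printed in the text."""
    return bytes.fromhex(tail.replace(".", " ")).decode("ascii")
```

Decoding the tail from the text with `hex_tail_to_name("44.72.75.63.6b.77.61.72.74.65.73.63.68.6c.61.6e.67.65")` gives "Druckwarteschlange", an 18-character name that matches the length component 18 in the OID. For the name used in Listing 2, `service_oid("print_queue")` produces `.1.3.6.1.4.1.77.1.2.3.1.3.11.112.114.105.110.116.95.113.117.101.117.101`.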

The Process Monitor

Like Windows services, system processes can be monitored using SNMP, regardless of whether the process runs on a Windows or a Linux/Unix system. The process monitor is just as easy to configure as the Windows services monitor. Again, you need the process name as a parameter (Listing 3).

Listing 3: OpenLDAP Process Example

Service identification (/etc/opennms/capsd-configuration.xml):

<protocol-plugin protocol="Proc_OpenLDAP" class-name="org.opennms.netmgt.capsd.plugins.HostResourceSwRunPlugin" scan="on">
    <property key="timeout" value="2000" />
    <property key="retry" value="1" />
    <property key="service-name" value="slapd" />
</protocol-plugin>

Setting up the monitor (/etc/opennms/poller-configuration.xml):

...
<service name="Proc_OpenLDAP" interval="300000" user-defined="false"
 status="on">
    <parameter key="retry" value="1"/>
    <parameter key="timeout" value="3000"/>
    <parameter key="service-name" value="slapd"/>
</service>
...
<monitor service="Proc_OpenLDAP"
 class-name="org.opennms.netmgt.poller.monitors.HostResourceSwRunMonitor"/>

The approach to setting up the plugin and the monitor is identical to that for the Windows service monitor. Again, all you need is the process name, which you define as the service-name parameter in both the plugin and the monitor.
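The article does not spell out what the process monitor queries, but the class name suggests the software-run table (hrSwRunTable) of the standard HOST-RESOURCES-MIB. Under that assumption, the check can be sketched in Python as a lookup of the process name followed by a status test. The function name, the input format, and the rule that running and runnable both count as up are choices made for this illustration.

```python
# Columns of hrSwRunTable in the HOST-RESOURCES-MIB.
HR_SW_RUN_NAME = ".1.3.6.1.2.1.25.4.2.1.2"
HR_SW_RUN_STATUS = ".1.3.6.1.2.1.25.4.2.1.7"

# hrSwRunStatus values defined by the MIB.
RUNNING, RUNNABLE, NOT_RUNNABLE, INVALID = 1, 2, 3, 4


def process_is_up(names, statuses, service_name):
    """Check whether a named process is present and in a healthy state.

    names    -- table index -> hrSwRunName, as returned by walking HR_SW_RUN_NAME
    statuses -- table index -> hrSwRunStatus, as returned by walking HR_SW_RUN_STATUS
    """
    for index, name in names.items():
        # Assumption: running and runnable both count as "up".
        if name == service_name and statuses.get(index) in (RUNNING, RUNNABLE):
            return True
    return False
```

With made-up walk results such as `names = {812: "sshd", 1044: "slapd"}` and `statuses = {812: RUNNING, 1044: RUNNABLE}`, the call `process_is_up(names, statuses, "slapd")` returns True; if slapd is missing from the table or marked notRunnable, the result is False.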

Debugging

Debug logging for the individual daemons is configured in the /etc/opennms/log4j.properties file. If the service is not identified, enable debugging for the capsd daemon. If the service is identified but the monitor doesn't behave as you expect, enable debugging for the pollerd daemon. Note that no restart is necessary when you enable debugging for either daemon.

If you perform a standard installation using your distribution's package manager, the log files will be stored below /var/log/opennms/daemon. The log files for capsd and pollerd are called capsd.log and pollerd.log, respectively. You can issue tail -f logfilename to debug the corresponding daemon.

Conclusions

Thanks to its excellent scalability and flexibility, OpenNMS can be deployed by corporations of any size. The software comes with many monitoring options, and it is extensible through mechanisms such as configurable monitors that require no programming skills. Also, the rule-based configuration minimizes administrative overhead because newly created services are identified, added, and monitored automatically by the system.