The Swiss Army knife of fault management
Flexible Monitoring
OpenNMS is known for its excellent scalability, but it is also characterized by its open interfaces and a variety of standard tools that can be used to monitor the status of services like HTTP, SMTP, POP3, or DNS immediately after installation. The system is extensible thanks to configurable monitors, so administrators can quickly cook up a large collection of monitoring options that don't require any programming skills from their users.
Particularly useful are the SNMP monitor and the monitors for Windows services and processes, which also use SNMP to retrieve their information. In this article, I'll explain how to use these monitors, because many administrators are unaware of the scope of options available.
SNMP Overview
The Simple Network Management Protocol (SNMP) was developed and standardized for monitoring and controlling network components. A Management Information Base (MIB) reveals the details of information that can be queried. This tree-like structure (Figure 1) stores all of the retrievable properties, and each property is accessible via an object identifier (OID).
In production, the Standard MIB II range (1.3.6.1.2.1.*) and the vendor-specific area below the enterprises branch (1.3.6.1.4.1.*) are relevant. The Standard MIB II is specified by the IETF, and all vendors are supposed to implement it in a standardized manner. MIB II contains system information such as the SNMP device name, network interface parameters, and routing information. The vendor-specific part is where you will find data such as hardware information, performance counters, or status information. Because vendor-independent standards exist for querying and setting a variety of values, SNMP has become a powerful network management protocol.
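To make the two branches concrete, here is a minimal sketch that sorts an OID into the MIB II range or the enterprises branch by comparing prefixes. The function names are hypothetical and not part of OpenNMS; the first sample OID is sysName.0 from MIB II, and the second is the AKCP OID used later in this article.

```python
# Prefixes of the two branches discussed above.
MIB2 = (1, 3, 6, 1, 2, 1)
ENTERPRISES = (1, 3, 6, 1, 4, 1)


def parse_oid(text):
    """Turn a dotted OID such as '.1.3.6.1.2.1.1.5.0' into a tuple of integers."""
    return tuple(int(part) for part in text.strip(".").split("."))


def classify(text):
    """Return 'mib-2', 'enterprises', or 'other' depending on the OID prefix."""
    oid = parse_oid(text)
    if oid[:len(MIB2)] == MIB2:
        return "mib-2"
    if oid[:len(ENTERPRISES)] == ENTERPRISES:
        return "enterprises"
    return "other"


print(classify(".1.3.6.1.2.1.1.5.0"))           # sysName.0
print(classify(".1.3.6.1.4.1.3854.2.3.2.1.6"))  # AKCP sensor status
```

The number that follows the enterprises prefix (3854 in the second example) identifies the vendor, which is why everything below it is vendor-specific.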
What SNMP Can Give You
In many cases, administrators want to monitor the state of hardware components such as hard disks, processors, memory chips, fans, or temperature conditions. Often, administrators mistakenly think that only the vendor's own tools are capable of supplying this kind of hardware monitoring, but much hardware information can also be retrieved using SNMP. OpenNMS relies on its SNMP monitor to provide vendor-independent hardware monitoring, querying the management agents that vendors supply. Popular management agents include:
- HP Insight Manager Agent
- Dell OpenManage
- ServerView Agent (Fujitsu Siemens)
- IBM Director Agent
Administrators often need to monitor Windows services and processes and, again, might not realize how useful SNMP can be in this context. The options really seem to be unlimited – from querying the status of the UPS, the air conditioning system, and sensor values, to monitoring and controlling complete systems.
Identifying Services
OpenNMS can automatically identify the services you need to monitor. In versions up to 1.8.x, this task is handled by the capabilities daemon (capsd) assisted by plugins. Version 1.8 introduces a new mechanism, which is integrated into the provisioning daemon (provisiond) as a foreign source. OpenNMS currently supports both mechanisms (Figure 2).
Because capabilities daemon-based detection is still defined as the default in version 1.8.x, the examples in this article rely on this mechanism. After identifying a service on a device, you are free to monitor this service. The poller daemon (pollerd) checks the availability of the service every five minutes by default. To allow the poller daemon to go about its work, you need to define the monitor configurations.
The SNMP Monitor
Because of the many possibilities SNMP offers, the SNMP monitor can be used as a kind of all-purpose weapon. Configurable parameters mean any value provided by the vendor can be queried and evaluated by a comparison function. Listing 1 shows a sample configuration for environmental monitoring with sensors provided by the vendor AKCP.
Listing 1: Environmental Monitoring Example
Service identification is controlled by /etc/opennms/capsd-configuration.xml:

<protocol-plugin protocol="AKCP-Temperature"
    class-name="org.opennms.netmgt.capsd.plugins.SnmpPlugin" scan="on">
  <property key="vbname" value=".1.3.6.1.4.1.3854.2.3.2.1.6" />
  <property key="table" value="true" />
  <property key="vbvalue" value="2" />
  <property key="timeout" value="1000" />
  <property key="retry" value="1" />
</protocol-plugin>

The monitor is set up by /etc/opennms/poller-configuration.xml:

...
<service name="AKCP-Temperature" interval="300000" user-defined="false"
    status="on">
  <parameter key="retry" value="3"/>
  <parameter key="timeout" value="3000"/>
  <parameter key="port" value="161"/>
  <parameter key="oid" value=".1.3.6.1.4.1.3854.2.3.2.1.6"/>
  <parameter key="walk" value="true"/>
  <parameter key="operator" value="="/>
  <parameter key="operand" value="2"/>
  <parameter key="match-all" value="true"/>
  <parameter key="reason-template" value="A problem with AKCP Temperature
    Environment detected. The state should be normal(${operand}) but actual
    value is ${observedValue}. Syntax: noStatus(1), normal(2), highWarning(3),
    highCritical(4), lowWarning(5), lowCritical(6), sensorError(7)"/>
</service>
...
<monitor service="AKCP-Temperature"
    class-name="org.opennms.netmgt.poller.monitors.SnmpMonitor"/>
The monitor can be configured using the parameters oid, operator, operand, walk, match-all, minimum, and maximum. The oid parameter defines which value, or values, the monitor should retrieve. The walk parameter controls whether one or multiple values are required: a value of true means that the OID refers to a table.
Each retrieved value is compared with the operand using the comparison operator given in the operator parameter (=, >=, <=, <, >, !=). You can set match-all to true, false, or count to determine whether all values, just one value, or a set number of values need to match in the comparison.
If you set match-all to count, you can specify how often a match has to occur by defining the minimum and maximum parameters. After configuring the monitors, restart OpenNMS by issuing the /etc/init.d/opennms restart command.
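The comparison rules described above can be sketched in a few lines of Python. This is an illustration of the semantics as the text describes them, not the OpenNMS implementation: the function name is hypothetical, values are assumed to be integers, and treating an empty result as "down" and the count bounds as inclusive are assumptions of the sketch.

```python
import operator

# The comparison operators the monitor accepts, mapped to Python functions.
OPERATORS = {
    "=": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}


def service_is_up(values, op, operand, match_all="true", minimum=None, maximum=None):
    """Decide up/down from the polled values (hypothetical helper)."""
    compare = OPERATORS[op]
    matches = sum(1 for value in values if compare(value, operand))
    if match_all == "true":
        # Every retrieved value must match; no values at all counts as down.
        return len(values) > 0 and matches == len(values)
    if match_all == "false":
        # A single matching value is enough.
        return matches >= 1
    if match_all == "count":
        # Assumption: both bounds are required and inclusive.
        if minimum is None or maximum is None:
            raise ValueError("count mode needs minimum and maximum")
        return minimum <= matches <= maximum
    raise ValueError(f"unknown match-all value: {match_all!r}")


# Three temperature sensors, one of them reporting highCritical(4).
readings = [2, 4, 2]
print(service_is_up(readings, "=", 2))                     # all must be normal(2)
print(service_is_up(readings, "=", 2, match_all="false"))  # one normal sensor suffices
print(service_is_up(readings, "=", 2, match_all="count", minimum=1, maximum=2))
```

With the settings from Listing 1 (operator =, operand 2, match-all true), a single sensor that leaves the normal state is enough to mark the service as down.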
Windows Services Monitor
You can query the print queue status with the SNMP monitor using the OID .1.3.6.1.4.1.77.1.2.3.1.3.18.44.72.75.63.6b.77.61.72.74.65.73.63.68.6c.61.6e.67.65; however, it's easier to enter the service name print_queue, as shown in Listing 2. You need to define the service name as service-name in both the plugin and the monitor.
Listing 2: Print Queue Example
Service identification (/etc/opennms/capsd-configuration.xml):

<protocol-plugin protocol="MS-print_queue"
    class-name="org.opennms.netmgt.capsd.plugins.Win32ServicePlugin" scan="on">
  <property key="timeout" value="2000" />
  <property key="retry" value="1" />
  <property key="service-name" value="print_queue" />
</protocol-plugin>

Setting up the monitor (/etc/opennms/poller-configuration.xml):

...
<service name="MS-print_queue" interval="300000" user-defined="false"
    status="on">
  <parameter key="retry" value="2" />
  <parameter key="timeout" value="3000" />
  <parameter key="port" value="161" />
  <parameter key="service-name" value="print_queue" />
</service>
...
<monitor service="MS-print_queue"
    class-name="org.opennms.netmgt.poller.monitors.Win32ServiceMonitor" />
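The long tail of an OID like the one quoted at the start of this section is typically just the service name in encoded form. The following sketch assumes the common SNMP convention for string-valued table indexes: the length of the string comes first, followed by one sub-identifier per character, written in decimal. The helper name is hypothetical, and the base OID is the prefix taken from the OID in the text.

```python
# Prefix of the OID quoted in the text, without the encoded service name.
BASE = ".1.3.6.1.4.1.77.1.2.3.1.3"


def string_index(name):
    """Encode a name as '<length>.<code>.<code>...' (assumed index convention)."""
    data = name.encode("ascii")
    return ".".join(str(number) for number in (len(data), *data))


# Build the full OID for a service called print_queue.
print(BASE + "." + string_index("print_queue"))
```

Seeing how unwieldy the result is makes it clear why the Windows services monitor lets you supply the plain service name instead.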
The Process Monitor
Like Windows services, system processes can be monitored using SNMP, regardless of whether the process runs on a Windows or a Linux/Unix system. The process monitor is just as easy to configure as the Windows services monitor: again, you need the process name as a parameter (Listing 3).
Listing 3: OpenLDAP Process Example
Service identification (/etc/opennms/capsd-configuration.xml):

<protocol-plugin protocol="Proc_OpenLDAP"
    class-name="org.opennms.netmgt.capsd.plugins.HostResourceSwRunPlugin" scan="on">
  <property key="timeout" value="2000" />
  <property key="retry" value="1" />
  <property key="service-name" value="slapd" />
</protocol-plugin>

Setting up the monitor (/etc/opennms/poller-configuration.xml):

...
<service name="Proc_OpenLDAP" interval="300000" user-defined="false"
    status="on">
  <parameter key="retry" value="1"/>
  <parameter key="timeout" value="3000"/>
  <parameter key="service-name" value="slapd"/>
</service>
...
<monitor service="Proc_OpenLDAP"
    class-name="org.opennms.netmgt.poller.monitors.HostResourceSwRunMonitor"/>
The approach to setting up the plugin and the monitor is identical to that for the Windows services monitor. Again, all you need is the process name, which you must define as service-name in both the plugin and the monitor.
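Because the same names have to appear in two files, a small consistency check can save some debugging time. The sketch below parses fragments shaped like Listing 3 and reports services whose name or service-name does not line up between plugin, service, and monitor. The wrapper elements `<capsd>` and `<poller>` and the helper names are hypothetical, and the real configuration files may declare XML namespaces that this sketch does not handle.

```python
import xml.etree.ElementTree as ET

# Fragments modeled on Listing 3; the root elements are hypothetical wrappers.
CAPSD = """
<capsd>
  <protocol-plugin protocol="Proc_OpenLDAP"
      class-name="org.opennms.netmgt.capsd.plugins.HostResourceSwRunPlugin" scan="on">
    <property key="service-name" value="slapd" />
  </protocol-plugin>
</capsd>
"""

POLLER = """
<poller>
  <service name="Proc_OpenLDAP" interval="300000" user-defined="false" status="on">
    <parameter key="service-name" value="slapd"/>
  </service>
  <monitor service="Proc_OpenLDAP"
      class-name="org.opennms.netmgt.poller.monitors.HostResourceSwRunMonitor"/>
</poller>
"""


def setting(element, tag, key):
    """Return the value of the child <tag key=... value=...>, or None."""
    for child in element.findall(tag):
        if child.get("key") == key:
            return child.get("value")
    return None


def find_mismatches(capsd_xml, poller_xml):
    """List human-readable problems between the two configuration fragments."""
    capsd = ET.fromstring(capsd_xml)
    poller = ET.fromstring(poller_xml)
    plugins = {p.get("protocol"): p for p in capsd.iter("protocol-plugin")}
    monitors = {m.get("service") for m in poller.iter("monitor")}
    problems = []
    for service in poller.iter("service"):
        name = service.get("name")
        if name not in monitors:
            problems.append(f"service {name!r} has no <monitor> entry")
        plugin = plugins.get(name)
        if plugin is None:
            problems.append(f"service {name!r} has no matching protocol-plugin")
        elif setting(plugin, "property", "service-name") != setting(
            service, "parameter", "service-name"
        ):
            problems.append(f"service {name!r}: service-name differs")
    return problems


print(find_mismatches(CAPSD, POLLER))

# A deliberately broken variant: a trailing space in the service name.
broken = POLLER.replace('name="Proc_OpenLDAP"', 'name="Proc_OpenLDAP "')
for problem in find_mismatches(CAPSD, broken):
    print(problem)
```

The broken variant shows the kind of mistake that is easy to overlook by eye: a stray space makes the service name differ from both the plugin's protocol name and the monitor's service attribute.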
Debugging
Debugging for the individual daemons is controlled by the /etc/opennms/log4j.properties file. If the service is not identified, you need to enable debugging for the capsd daemon. If the service is identified, but the monitor doesn't act as you expect it to, you need to enable debugging for the pollerd daemon. Note that you do not need to restart OpenNMS when you enable debugging for either daemon.
If you perform a standard installation using your distribution's package manager, the log files will be stored below /var/log/opennms/daemon. The log files for capsd and pollerd are called capsd.log and pollerd.log, respectively. You can issue tail -f logfilename to follow the log of the corresponding daemon.
Conclusions
Thanks to its excellent scalability and flexibility, OpenNMS can be deployed by corporations of any size. The software comes with many monitoring options, and it is extensible through mechanisms such as configurable monitors that require no programming skills. Also, the rule-based configuration minimizes administrative overhead because newly created services are identified, added, and monitored automatically by the system.