Nuts and Bolts Monitoring Comparison 
 

Monitoring for small and medium-sized companies compared

Under Surveillance

Monitoring is similar to backup: It is not a question of whether or not to monitor, but how. The solution to this problem is to have a good strategy with the right priorities and the right tools. By Jens-Christoph Brendel

The thing about any kind of tabular software comparison in the field of monitoring and elsewhere is that all of the more-or-less mature solutions support the most important features. It's unthinkable to compare every single feature, but if you just look at the critical features, there will be virtually no differences. A check mark would appear in 97 percent of the lines in the table. Leaving these results as is would be misleading; you might think that Icinga was basically the same as Tivoli on paper, for example – although the two programs use totally different approaches.

To find a solution to this dilemma, you have to leave the safety of simple tables with Yes/No boxes and run the risk of losing track. That is why this article attempts to find a compromise: We did a survey, but we asked for descriptive responses in free text format – some of which we decided to publish in unabbreviated form.

Additionally, we have tried to avoid comparing apples and oranges. For example, Tivoli, IBM's integrated solution for system and service management, includes monitoring functions, among other things. The more modules you need from the Tivoli portfolio – such as asset, storage, security and agreement management, stock management, procurement, work management – the greater the potential benefit of integrating all of these areas under a single umbrella. On the other hand, if you compare the relatively limited subset of monitoring options with a product that exclusively offers similar features in just this one area, Tivoli's integration benefits disappear, and you are left with a construction that is both too complex and too expensive, and always two sizes too big. Thus, we ignored software of this caliber here.

We would have loved to cover some other products but were foiled by the vendors' lack of interest. For example, the German office of Quest Software, the manufacturer of Big Brother, purportedly forwarded our questionnaire to the marketing department, but the marketing department just headed for cover. When we asked about this, our questions remained unanswered – maybe because the vendor had lost faith in its own product after reading our questions.

Unique Selling Point

Besides asking about individual features, we also asked about unique selling points: Why should the customer choose one solution rather than any other? The following are the answers we received.

OpenNMS emphasizes its scalability in particular: From environments with 200 systems in small to medium-sized enterprises through 70,000 interfaces in an enterprise environment, OpenNMS [1] scales without any problems, says the vendor. Preconfigured elements and open interfaces are another major benefit because standard monitors for critical services are thus available immediately after installing – for example, for HTTP, SMTP, POP3, or DNS. Thanks to its integrated SNMP trap and Syslog receiver, OpenNMS is capable of implementing centralized logging. Performance data can be collected via SNMP, WMI, HTTP, JMX, JDBC, or NSClient++.

The OpenNMS experts at NETHINKS, a certified OpenNMS partner and the host of the European OpenNMS User Conference, underline this: "Another benefit is that OpenNMS can monitor the IT infrastructure superbly without using agents. Despite this, OpenNMS can, of course, integrate any programmed application via a scripting monitor. There is even a special monitor for the Nagios NRPE and NSClient++ agents in OpenNMS. All of this makes OpenNMS a professional, convenient, and secure network management system from a single source – from small to medium-sized businesses, through to the enterprise field – that covers the complete FCAPS (Fault, Configuration, Accounting, Performance, and Security Management) model without using plugins." OpenNMS is a high-performing, open source competitor to commercial enterprise products such as HP Openview or IBM Tivoli.

Zabbix also points to its high level of scalability: Up to 100,000 monitored devices with up to one million different metrics are no problem, according to the manufacturer [2]. Additionally, the software can process thousands of checks per second.

Another benefit that Zabbix offers is its ease of configuration thanks to a centralized database (Figure 1). Advanced features such as performance diagrams or maps in other solutions often mean a steeper learning curve before users can start producing meaningful results. Compared with Nagios, Zabbix also scores points with auto-discovery (like OpenNMS).

Figure 1: RRD charts like this one for CPU load are easy to configure in Zabbix.

Icinga, the Nagios fork [3], is not shy about its features either. In comparison to the commercial Nagios variant, Icinga is 100 percent open source, say the makers. Thanks to its own database layer, it doesn't just support MySQL and PostgreSQL, but also the widely used Oracle. Icinga contains dozens of bug fixes that are missing in Nagios. The new, modern web interface is an attractive addition that goes well beyond the fairly staid-looking, legacy Nagios view. Icinga also offers a monitoring app for smartphones. Icinga is under ongoing development with fixed release cycles and a public roadmap; despite this, it remains backward compatible with Nagios and thus benefits from Nagios's best feature: its wealth of plugins.

Icinga integrates powerful reporting features (based on Jasper Reports). In contrast to Nagios Core, it supports LDAP or Active Directory-based authentication. Icinga can handle IPv6 – which legacy Nagios cannot. Icinga supports more than 20 languages and has many other benefits to offer.

Nagios proudly points to its long tradition and widespread user base [4]. Inventor and main developer Ethan Galstad says: "Organizations trust Nagios because of its flexibility, long history, world-wide community, and wealth of free add-ons. This explains why there have been more than 3 million new Nagios installations worldwide in the past 12 months. This is why Nagios is the industrial standard for monitoring today. … Nagios has a much larger community than other projects. This guarantees companies more support, better documentation, and more add-ons."

Of course, in the case of Nagios, you have to distinguish between the free (in both senses of the word) Community Edition (Core) – which we will be mainly looking at in this article – and the commercial Nagios XI, which is released under a proprietary license. The latter gives the paying customer a variety of additional features, such as multiclient capability, various APIs for application integration, trend computations, and other capacity management data, as well as a GUI that adapts more easily to suit the customer's needs.

Table 1: Basic Data

| Name | OpenNMS | Zabbix | Icinga | Nagios |
|---|---|---|---|---|
| URL | http://www.opennms.org | http://www.zabbix.com | https://www.icinga.org | http://www.nagios.org |
| Version | 1.8.x | 1.8.6 | 1.6 | 3.3.1 |
| License | GPLv2 | GPLv2 | GPLv2 | Core: GPLv2; Nagios XI: commercial license |
| Supported platforms | Linux, Windows, Solaris, Mac OS X | Zabbix server and proxy: Linux, Solaris, AIX, HP-UX, FreeBSD, OpenBSD, NetBSD, other Unix-like platforms; Zabbix agent: additionally, Windows | Linux, Solaris, HP-UX, AIX, Gentoo, BSD, Mac OS X | Various Unix derivatives, Linux, *BSD |
| Availability of packages/installers | Windows, Linux (DEB or RPM) | In the repositories of Debian, Ubuntu, Fedora; additionally, packages for openSUSE/SLES, RHEL, CentOS, Slackware | Ubuntu, Debian, Red Hat, CentOS, SLES, Solaris, Mac | Included in virtually any Linux distribution |
| First public release | 29 March 2000 | 7 April 2001 | 27 September 2009 | 1999 (as NetSaint) |
| Price | Free license | Free license | Free license | Core: free; commercial: 50 nodes at US$ 1,300 |
| Hardware Requirements | | | | |
| CPU | 1GHz | From 200MHz | ARM, Intel | The only requirement of running Nagios Core is a machine running Linux (or a Unix variant) that has network access and a C compiler installed (if installing from source code) |
| RAM | 512MB | From 16MB for the application | From 32MB (ARM CPU) | - |
| Disk space | 8GB | From 32MB, depending on the volume of data logged | From 50MB | - |
| Software Requirements | | | | |
| Databases | PostgreSQL | MySQL, PostgreSQL, Oracle, SQLite, DB2 | MySQL, PostgreSQL, Oracle | Optionally MySQL |
| Architecture | Basically agentless, but with open interfaces that support the implementation of agents. SNMP can be queried in versions V1, V2c, and V3. The Nagios agents NRPE and NSClient++ can be fully utilized. | Agentless monitoring via various methods (TCP checks, ICMP ping, SNMP, IPMI, others); additionally, native agent for all platforms | Icinga supports both agentless monitoring via TCP, SNMP, and WMI and the use of agents such as NRPE or NSClient++ for Windows systems | Basically agentless, if checks over the network and via SNMP are considered sufficient; additional checks via agents (NSClient++ on Windows, Nagios NRPE on Linux/Unix/*BSD); check_mk as an approach to collating individual checks (one request to the host; the agent collects status information for all services and returns it to the Nagios server) |
| Scalability | Scalability is not restricted by the software architecture. Installations with 70,000 IP interfaces and 800,000 performance data items that are collected every five minutes exist. Individual daemons (pollers, data collection, etc.) can be offloaded onto separate hardware. | Thanks to various performance-boosting technologies (e.g., caching), thousands of hosts per Zabbix server | To achieve good scalability, the various systems can be configured in distributed environments and controlled centrally via an interface. Additionally, the individual components such as the core, database, and web interface can be distributed to various systems. A centralized configuration and remote control for remote systems is supported by add-ons and Icinga Web. | Distributed installations of Nagios are possible for performance reasons but also to aggregate data from various Nagios installations centrally |
| Support | Commercial support available | Support by vendor and partners available | Many support variants available from service providers | Community support and various service providers from Nagios Inc. for the commercial version |

Installation

As an administrator, you need to tell the monitoring software what to monitor, who to notify, and possibly when to do these things. Ease, speed, and confidence in fulfilling these tasks are major quality criteria, and the differences become apparent in large-scale environments, in particular. With some solutions, you need to enter each host manually. If you have thousands of hosts, the effort involved is huge. Other solutions have built-in auto-discovery features that scan your networks and automatically add the computers they find to your monitoring scenario.
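The idea behind auto-discovery can be illustrated with a minimal Python sketch. It is not how any of the four products implements discovery; the function names, the probed port, and the injectable probe function are assumptions made for illustration only.

```python
import ipaddress
import socket


def candidate_addresses(cidr):
    """Return all usable host addresses of a network as strings."""
    return [str(ip) for ip in ipaddress.ip_network(cidr).hosts()]


def tcp_probe(address, port=22, timeout=0.5):
    """Return True if a TCP connection to address:port succeeds."""
    try:
        with socket.create_connection((address, port), timeout=timeout):
            return True
    except OSError:
        return False


def discover(cidr, known_hosts, probe=tcp_probe):
    """Return responding addresses that are not yet in the inventory.

    'probe' is injectable so that other checks (or a test stub)
    can be swapped in without touching the scan loop.
    """
    return [
        address
        for address in candidate_addresses(cidr)
        if address not in known_hosts and probe(address)
    ]
```

A real discovery run would probably combine several probes (ICMP, SNMP, multiple TCP ports) and scan in parallel; the sequential loop here only shows the principle of comparing scan results against the existing inventory.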

OpenNMS comes with auto-discovery (Figure 2), but it takes things one step further – for example, with a semi-automatic gathering system: When OpenNMS receives an SNMP trap or a Syslog message, it checks whether the sending device exists in the database. If this is not the case, the device is added and a check is performed to determine which of the predefined services it provides.

Figure 2: Auto-discovery in OpenNMS has found a new node.

The provisioning daemon also gives administrators the option of adding devices manually or via interfaces (HTTP, XML, DNS, and REST). The ability to add devices to the system via a web service (REST) or in XML format offers an excellent option for integrating existing CMDBs with OpenNMS. OpenNMS can also use SNMP to group IP interfaces to nodes. The IP interfaces are not identified as individual devices but as a device with multiple IP interfaces.

Zabbix has an auto-discovery feature that adds devices detected on the basis of their IP addresses or SNMP to the monitoring scope. Manual entry is also possible.

Like Nagios, Icinga basically requires manual entry of the systems to be monitored, but it does let you install a Nagios plugin for network searches.

Nagios includes an auto-discovery component in the commercial version. The community edition can be upgraded just like Icinga, thanks to a plugin. Additionally, a program exists that lets you import results from the Nmap network scanner into Nagios.

Configuration

After adding the objects you need to monitor, it is important to define where and how they are stored. A centralized configuration is a must-have here, but it is also interesting to know whether the solution uses an easily searchable database or just plain-text files.

The web interface is also decisive for the usability of the configuration, although an objective comparison is difficult here.

OpenNMS stores its configuration in the form of XML files. Configuration changes are mainly made in the web interface. Direct editing of the XML files is possible, which is important for mass deployment.

Zabbix only stores the basic settings for the daemon in text files and stores any and all information relating to the monitored systems in a database. The candidates here are the usual suspects: MySQL, PostgreSQL, Oracle, SQLite2, and DB2.

Icinga stores configuration settings in classic ASCII files with a proprietary syntax. However, a workaround is available: A variety of add-ons, including NagiosQL, LConf, or Nconf, can create Nagios configuration files and these add-ons then access databases or an LDAP server.

Nagios also stores configuration settings in classic ASCII files with a proprietary syntax. The statement for Icinga holds true for Nagios.
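What the configuration add-ons do in principle (read an inventory and write object definitions) can be sketched in a few lines of Python. This is an illustration, not the code of NagiosQL, LConf, or NConf: the in-memory list stands in for a database or LDAP query, the function names are hypothetical, and the `generic-host` template is assumed to exist in the local configuration.

```python
def render_host(host_name, address, template="generic-host"):
    """Render one host definition in Nagios/Icinga object syntax."""
    lines = [
        "define host {",
        f"    use        {template}",   # template name is an assumption
        f"    host_name  {host_name}",
        f"    address    {address}",
        "}",
    ]
    return "\n".join(lines)


def render_hosts(inventory):
    """Render a config fragment from (host_name, address) pairs."""
    blocks = [render_host(name, address) for name, address in inventory]
    return "\n\n".join(blocks) + "\n"
```

Calling `render_hosts([("web01", "192.0.2.10"), ("db01", "192.0.2.20")])` returns a text fragment with two `define host` blocks that could be written to a file in the configuration directory.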

Checks

The next question we asked was: What can you monitor? Again, the differences are not huge; all of the solutions monitor basic services out of the box.

Special requirements can be covered by installing a plugin in most cases. When it comes to detail, some solutions do a better job of this than others. OpenNMS, for example, detects the constant state changes caused by some errors (flap detection) out of the box, whereas Zabbix requires you to program the check yourself, and Nagios and Icinga need a plugin.
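To give an idea of what such a plugin involves, the following Python sketch follows the widely used Nagios plugin convention: the exit code signals the state (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN), and the output line may carry performance data after a pipe character. The check itself (1-minute load average) and the threshold values are assumptions chosen for illustration.

```python
import os

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def evaluate_load(load1, warn, crit):
    """Map a 1-minute load average to a plugin exit code and status line."""
    if load1 >= crit:
        code, label = CRITICAL, "CRITICAL"
    elif load1 >= warn:
        code, label = WARNING, "WARNING"
    else:
        code, label = OK, "OK"
    # Text after the pipe is performance data: label=value;warn;crit
    perfdata = f"load1={load1:.2f};{warn};{crit}"
    return code, f"LOAD {label} - load1={load1:.2f} | {perfdata}"


def run_check(warn=4.0, crit=8.0):
    """Read the local load average and evaluate it (Unix-like systems only)."""
    try:
        load1 = os.getloadavg()[0]
    except (AttributeError, OSError):
        return UNKNOWN, "LOAD UNKNOWN - load average not available"
    return evaluate_load(load1, warn, crit)
```

A wrapper script would print the returned message and pass the returned code to `sys.exit()`, so that the monitoring core can read both.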

Table 2: Checks

| Check | OpenNMS | Zabbix | Icinga | Nagios |
|---|---|---|---|---|
| Availability Checks | | | | |
| Services | Because OpenNMS uses SNMP to group IP interfaces, special differentiation options are available here: If a service is unreachable, OpenNMS signals a nodeLostService; if a critical service is unreachable, it distinguishes between interfaceDown and nodeDown. The latter is only tripped if the system is not reachable on any of its interfaces. | Yes, special feature: Instead of fixed limits, users can set up triggers that process various values, for example, by evaluating the relationship between the load average on the NFS server and the availability of the web server. | Yes, using the check_ping plugin | Yes |
| Hosts | Another of OpenNMS's benefits is the ability to configure adaptive polling. A service is checked every five minutes by default. In case of error, the interval is reduced to 30 seconds. | Yes | Yes, using the check_ping plugin | Yes |
| Performance Checks | | | | |
| CPU utilization | Preconfigured | Built-in agent | Yes, using the check_cpu plugin | Yes |
| I/O | Preconfigured | Built-in agent | Yes, using the check_io plugin | Yes |
| Memory | Preconfigured | Built-in agent | Yes, using the check_mem plugin | Yes |
| Network | Preconfigured | Built-in agent | Yes, using the check_interfaces plugin | Yes |
| Automatic baselining and drift detection | Data collection has the ability to collect long-term values via SNMP, WMI, NSClient++, JMX, HTTP, and JDBC. | Can be implemented using triggers. | Provided by add-ons | No |
| SNMP checks | SNMP checks can be executed via the SNMP monitor. Whole tables can be processed. Users can differentiate as to whether one value in the table, every value, or a certain minimum or maximum number matches the reference value. | Yes (versions 1, 2c, 3) | Yes, using the check_snmp plugin and various SNMP plugins for the various manufacturers, such as Cisco, HP, Juniper. | Yes |
| Checks via Hardware Interfaces | | | | |
| SMART | Yes | Yes | Yes | Yes |
| IPMI / LOM / ILO | Yes | Yes, built-in | Yes, using the check_ipmi plugin | Yes, using plugins. |
| End-to-end checks | Currently there are two monitors for end-to-end measurement, the Mail Transport Monitor and the Page Sequence Monitor. The Mail Transport Monitor records how long it takes to transfer mail. The Page Sequence Monitor measures flows in web applications. Additionally, users can integrate Hyperic to ascertain and process values from an agent-based network management system. | User behavior can be emulated to a degree. | Various extensions support this. A tangible example is the AutoIt automation tool and its integration with Icinga using check_autoit. This supports testing of complex client workflows. Selenium and Cucumber are supported for web workflow automation. | Can be implemented via distributed Nagios installations if needed. |
| Flap detection (permanent state changes) | Flapping is detected by the Vacuum Daemon. It generates an event that OpenNMS logs. | Yes, can be emulated using triggers. | Yes | Yes |
| Anomaly detection | OpenNMS offers integration options for anomaly detection systems. Additionally, NETHINKS GmbH, in cooperation with Fulda University, is developing secMONET, a research project designed to fulfill precisely this requirement. | Yes, can be emulated using triggers. | Relies on baselining. | None that go beyond flap detection. |
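The flap detection listed in Table 2 can be sketched as follows: Count how often the state changed across the most recent check results and compare the percentage against two thresholds. This is a simplified, unweighted illustration and not the exact algorithm of any of the products; the class name, the window of 21 results, and the thresholds of 25 and 50 percent are assumptions.

```python
from collections import deque


class FlapDetector:
    """Flag a service as flapping when its state changes too often."""

    def __init__(self, window=21, low=25.0, high=50.0):
        self.history = deque(maxlen=window)  # most recent check results
        self.low = low      # percent; flapping stops below this value
        self.high = high    # percent; flapping starts above this value
        self.flapping = False

    def percent_change(self):
        """State transitions relative to the maximum possible in the window."""
        states = list(self.history)
        changes = sum(1 for a, b in zip(states, states[1:]) if a != b)
        return 100.0 * changes / (self.history.maxlen - 1)

    def add_result(self, state):
        """Record a check result and return the current flapping flag."""
        self.history.append(state)
        pct = self.percent_change()
        if not self.flapping and pct > self.high:
            self.flapping = True
        elif self.flapping and pct < self.low:
            self.flapping = False
        return self.flapping
```

The two thresholds provide hysteresis: A service only counts as flapping once the change rate exceeds the upper value, and it only leaves that state again when the rate drops below the lower value, which keeps the flapping flag itself from flapping.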

Notifications

When an issue is detected, the user needs to know about it. Although this sounds simple, it is actually far more complex. After all, there are numerous exceptions and special cases. For example, you don't want to receive notifications during planned downtime, and notifications at night should only be for genuine emergencies above a certain level of severity. The system shouldn't tell the boss until the matter escalates or a member of staff fails to confirm that they are working on the problem. Different rules should apply on Sundays and public holidays. In some cases, you need to call a single person and, in other cases, a whole group. One person is best reached via email, a second person by text message, and a third person by phone or pager. Once the root cause of the problem has been identified, any subsequent errors shouldn't lead to more alerts. And so on.

OpenNMS can send notifications to users and groups and to predefined roles. Whole chains of messages can be established for different persons, groups, and roles using a variety of media such as text, instant messaging, or email.

The chains can be interrupted by a confirmation, as needed. Within planned downtime, polling, notifications, thresholding, and data collection can be disabled. To suppress subsequent errors, path outages can be configured to suppress alerts for dependent nodes.
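The interplay of escalation chains, confirmations, and planned downtime can be sketched in Python. The sketch is a generic illustration, not OpenNMS code; the function names, the data layout, and the contact names in the usage example are assumptions.

```python
from datetime import datetime, timedelta


def in_downtime(now, downtimes):
    """True if 'now' falls inside any planned (start, end) window."""
    return any(start <= now < end for start, end in downtimes)


def recipients_due(problem_start, now, chain, acknowledged=False, downtimes=()):
    """Return the contacts whose escalation level has been reached.

    'chain' is a list of (delay, contact) pairs: A contact is included
    once the problem has lasted at least 'delay'. A confirmation or a
    planned downtime interrupts the chain.
    """
    if acknowledged or in_downtime(now, downtimes):
        return []
    elapsed = now - problem_start
    return [contact for delay, contact in chain if elapsed >= delay]
```

With a chain such as `[(timedelta(0), "oncall"), (timedelta(minutes=15), "team"), (timedelta(hours=1), "boss")]`, the boss only appears in the result if nobody has confirmed the problem within an hour.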

Zabbix has a whole arsenal of notification capabilities. It provides repeated notifications, any number of escalation levels, and automatic recovery notifications once a problem has been resolved. Thanks to its triggers, for which you can also define dependencies, Zabbix also lets you suppress subsequent errors – even in multiple stages if needed. A local host might depend on a switch at its own site and, at the same time, on a remote router and a central router; it would not be reported as down if one of the superordinate components was the actual culprit.
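The dependency logic described here can be reduced to a small Python function: Of all the nodes that are currently down, only those without a failed dependency are reported. This is a sketch of the principle and not Zabbix's trigger implementation; the function name and the node names in the example are hypothetical.

```python
def root_causes(down, depends_on):
    """Return the down nodes that have no failed dependency themselves.

    down:        set of node names currently failing their checks
    depends_on:  dict mapping a node to the nodes it depends on
    """
    return {
        node
        for node in down
        if not any(dep in down for dep in depends_on.get(node, ()))
    }


# Hypothetical topology following the example in the text
DEPENDS_ON = {
    "host": ["switch", "remote-router", "central-router"],
    "switch": ["central-router"],
}
```

With this topology, `root_causes({"host", "switch"}, DEPENDS_ON)` reports only the switch, and if the central router fails as well, it becomes the only reported node, which corresponds to suppression across multiple stages.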

Icinga has nothing to hide in the notification types comparison; it also supports multiple stage escalation and it can suppress derived errors, as long as they have been confirmed. One of Icinga's special features is voice mail; the messages can even contain menus that let the receiver repeat the message or confirm receipt remotely via the keypad. Icinga can also distinguish between non-availability and an actual failure thanks to predefined parent/child relationships.

Nagios, as a general rule, offers the same notification features as its competitors, with flexible notification and escalation services via a variety of methods to individuals or groups.

User Interfaces

The user interface – typically a web interface – is decisive for usability. Nagios is the classic role model: Many successors and emulators later adopted its traffic-light display, but the design looks fairly antiquated today, although you can add a couple of plugins to improve things. Administrators now expect a dashboard with freely positionable elements, visualization of timescales in charts, embedded maps, and floor plans or charts that illustrate effects on business processes.

In this field, Icinga in particular sets itself apart with an innovative concept (Figure 3). Icinga uses "cronks," or bits of code (PHP, HTML, or JavaScript), that extend the GUI. As a result, the Icinga web interface is innovative and full of practical solutions. For example, users can group all of the columns in the status overviews to meet their needs and sort by any column; there are also many filter and search options. Additionally, you can submit individual commands to a selection of hosts in one fell swoop. Administrators can preconfigure views or pass them on to others – also as read-only views, thanks to the flexible authorization concept.

Figure 3: Icinga, shown here with the Business Process View add-on, offers a state-of-the-art and innovative web user interface.

OpenNMS is much closer to Icinga than Nagios, thanks to its support for user-specific views and a granular user authorization concept. Another similarity between OpenNMS and Icinga relates to the preconfigured reports, which will definitely impress your management.

Things don't look quite this good in the Zabbix camp, but at least it includes RRD charts and maps out of the box.

Performance and Scalability

Performance is discussed in a separate article in this issue, and I will not be looking at it in any great depth here. As a general rule, performance is not an issue in small to medium-sized environments but can be in larger environments. In some cases, time spent reading checks and processing return values is multiplied by big numbers, which can lead to the system failing to complete a check before the next cycle is due. This, in turn, creates a backlog that the system will never be able to cope with and which will bring the monitoring system to its knees.

The magic formula for performance problems is distributing the load. If a single server is overloaded, administrators can solve the problem by spreading the load across multiple machines. Although Nagios doesn't support load balancing out of the box, a number of add-ons let you add this capability. OpenNMS can run individual daemons on different servers. Icinga also does a good job of this with its Icinga Web, which lets you offload the core, the database, and the web interface to various systems. Zabbix also supports distributed monitoring if you have a large number of nodes to keep an eye on.
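The backlog argument and the effect of distributing the load can be put into numbers with a back-of-the-envelope calculation. The following Python sketch assumes that checks run strictly one after another on each worker and that every check takes the same time; the function names, the 80 percent headroom, and the example figures are assumptions.

```python
import math


def poller_utilization(checks, seconds_per_check, interval, workers=1):
    """Fraction of each polling interval needed to run all checks once.

    A value above 1.0 means the cycle cannot finish before the next one
    is due, so a backlog builds up.
    """
    return checks * seconds_per_check / (interval * workers)


def workers_needed(checks, seconds_per_check, interval, headroom=0.8):
    """Smallest number of workers that keeps utilization within the headroom."""
    return math.ceil(checks * seconds_per_check / (interval * headroom))
```

For example, 10,000 checks at 0.5 seconds each amount to 5,000 seconds of work per 300-second interval, so a single sequential poller falls further behind with every cycle; under the assumptions above, `workers_needed(10000, 0.5, 300)` yields 21 parallel workers.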

Conclusions

There is no hard-and-fast rule for finding the best monitoring solution in any given scenario, but one thing is definitely advisable: Experiment! All of these solutions are relatively easy to install; you will even find prebuilt virtual machines for some of them, such as the Icinga Starter Kit [5] or the Zabbix appliance in VMware [6].

Otherwise, installing from the packages or source code isn't likely to be a problem. You might experience some difficulty with Zabbix, because it requires PHP with a built-in bcmath module. If your distribution doesn't include this (as is the case with SLES), you would need to uninstall PHP, rebuild from the sources with the required modules, and then start the Zabbix install from scratch – a somewhat convoluted process.
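Before starting the Zabbix installation, you can check whether the local PHP includes bcmath. The following Python sketch evaluates the module list printed by `php -m`; the function names are hypothetical, and the sketch assumes that the PHP command-line binary uses the same build as the PHP used by the web server, which is not necessarily the case.

```python
import shutil
import subprocess


def has_module(php_m_output, module):
    """Check the output of 'php -m' for a module name (case-insensitive)."""
    names = {line.strip().lower() for line in php_m_output.splitlines()}
    return module.lower() in names


def php_has_bcmath(php_binary="php"):
    """Return True or False, or None if no PHP CLI binary is found."""
    if shutil.which(php_binary) is None:
        return None
    result = subprocess.run([php_binary, "-m"], capture_output=True, text=True)
    return has_module(result.stdout, "bcmath")
```

If `php_has_bcmath()` returns `False`, the rebuild described above (or a separate bcmath package, where the distribution offers one) is needed before the Zabbix front end will work.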