Features SMB Traffic Analyzer Lead image: Qi Feng, 123RF.com

Visualize data throughput with SMBTA

Traffic Report

The SMB Traffic Analyzer is a VFS module that supports real-time analysis of data throughput on the SMB network. It includes analysis tools but also can use the RRDtool for visualization. By Thomas Drilling

The SMB Traffic Analyzer (SMBTA) tool is implemented as a VFS module that allows a Samba/CIFS server to record traffic statistics on a Samba network. The SMBTA daemon stores this information in a database and makes it available via SQL, for example.

The architecture is clear cut and comprises a module in the Samba VFS (Virtual File System), a daemon, and a set of client tools (smbtatools) for evaluation and visualization. SMBTA uses an existing SQL database (sqlite3) to store the traffic data.

The developer and inventor of SMBTA, Novell's Holger Hetterich, has been working on his Samba Traffic Analyzer since the SambaXP Conference in 2007. His work has attracted attention in expert circles – including the Samba team – which has led to Hetterich's holding well-attended keynotes at Samba Xperience and other events.

In the meantime, Novell has supported this work by allowing Hetterich to dedicate some of his working hours to the development of SMBTA. Unfortunately, the work hasn't aroused much general interest, mainly because feedback of all kinds has been handled directly by Hetterich and not via mailing lists or the usual channels; however, this is just a matter of public relations. Although the number of users is fairly low right now, that's likely to change when version 3.6 of Samba, which will include SMBTA as an official component, is released [1].

Experienced admins would have no trouble sniffing typical NetBIOS traffic by pointing a port sniffer at the usual suspects, but SMBTA focuses entirely on Samba traffic. This approach allows administrators to create comprehensive and meaningful statistics because the tool performs genuine data mining centering on the SQLite database. For example, SMBTA will let you target an individual user, share, or file for statistical analysis.

The client side has basically three evaluation tools, all of which are part of the smbtatools package. An RRD (Round-Robin Database) driver and RRDtool [2] can be linked to the SMBTA daemon in real time with the use of IP or Unix domain sockets, thus supporting visualization of the traffic data or ongoing processing by means of, say, Perl scripts.

The smbtaquery tool can use XML to read the database. If you prefer a more visual approach, smbtamonitor also supports access to the stored data without the need to speak SQL.

Getting SMBTA

To try out SMBTA now, you can either download the source code for the current version, 1.2.2 [3], or go for Holger Hetterich's blog binaries for openSUSE [4]. General information on SMBTA is also available [5].

Because installing SMBTA means installing backports for Samba 3.6, users are advised to go for the RPM binary or the One-Click Installer on openSUSE 11.3.

Of course, you can also use Git to check out SMBTA [6]. And, last but not least, SUSE users can add the package source [7] in YaST and then use the package manager to install SMBTA (Figures 1 and 2). An even easier approach to trying out SMBTA is to use Hetterich's "Stresstest" appliance, which is currently available as version 0.0.2 [8] (see the "Stresstest" box).

Figure 2: The SMBTA daemon is relatively easy to install from the source code, from the repository, or by using the binary with YaST.

Stresstest

Stresstest (currently version 0.0.2) is a prebuilt SUSE appliance with the Samba server including the latest SMBTA VFS module (Figure 1). Although Stresstest is based on SMBTA 1.2.2, it also contains a large number of patches that are not included in 1.2.2.

The Stresstest appliance is primarily designed for testing SMBTA and is used intensively for this purpose by developers. If you want a sneak preview of SMBTA's analysis qualities without setting up a complex scenario, Stresstest is the best solution. The Open Virtualization Format (OVF) appliance also includes smbtatorturesrv, which is a small server application that uses multiple process instances to distribute filenames and paths across the test environment.

In Stresstest 0.0.2, six active user accounts all use the smbtatorture application to keep the server process busy. The smbtatorture tool itself is a small test suite for SMBTA and has thus mainly been used by the developers themselves for long-term testing. The tool simulates the typical load behavior of office applications. Between individual traffic production cycles, the tool takes breaks of several seconds. It also measures its own running time and can record and reproduce its own activity. Multiple smbtatorture processes can run in parallel without any problems.

Note that SMBTA Stresstest uses port 3491, which the smbtatools also use to handle requests; you'll need to take this into consideration when configuring your firewall or packet filter. Otherwise, the appliance is configured as follows:

Network: DHCP
Timezone: Europe/Berlin
Language: de_DE.UTF-8
Firewall: disabled

The root password and the password for the users, holger, nelson, john, bjoern, and btram, is linux.

Installing SMBTA and smbtatools

If you would like to test SMBTA as quickly as possible but prefer not to install a dedicated Samba server in the process, you'll need to build SMBTA and the smbtatools from the source code (Figure 2). To allow this to happen, make sure that you have cmake, libsmbclient-devel, libtalloc-devel, and ncurses-devel in place on your computer.

Additionally, you will need the SQLite3 database environment and the corresponding developer packages. Also make sure that libxslt is installed; it should be on openSUSE by default.

Now, you can unpack the source code, smbtatools-1.2.2.tar.bz2, and change to the resulting directory. Give the cmake command in the build directory to configure the package for the build process:

cmake ../smbtatools-1.2.2

Next, make and make install compile the package and copy the programs to the correct location.
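
The build steps can be collected into a small shell function. This is only a sketch: the helper names missing_tools and build_smbtatools, as well as the layout with a separate build directory, are assumptions for illustration and not part of SMBTA. The development libraries mentioned earlier still have to be checked with your package manager.

```shell
#!/bin/sh
# Sketch of an out-of-source build; function and directory names are
# illustrative assumptions, not part of SMBTA.

# Print the commands from the argument list that are missing from PATH.
missing_tools() {
    missing=""
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || missing="$missing $tool"
    done
    # Strip the leading space; the result is empty if nothing is missing.
    printf '%s\n' "${missing# }"
}

# Configure, compile, and install from a separate build directory.
build_smbtatools() {
    src_dir=$1      # absolute path to the unpacked sources
    build_dir=$2    # scratch directory for the build
    mkdir -p "$build_dir" || return 1
    (
        cd "$build_dir" &&
        cmake "$src_dir" &&
        make &&
        make install    # usually requires root privileges
    )
}

# Example call, commented out so the sketch has no side effects:
# [ -z "$(missing_tools cmake make)" ] &&
#     build_smbtatools "$PWD/smbtatools-1.2.2" "$PWD/build"
```

Running the build in a subshell keeps the caller's working directory unchanged, and the chain stops at the first step that fails.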

To start the daemon, type

smbtad -u -n

which tells the daemon (u) and the client (n) to communicate via Unix domain sockets. The SMBTA daemon's main task is to feed data to the SQL database; it receives data from the VFS module.

At the same time, the daemon is also responsible for handling client requests to the database. Suppose you want to record all the traffic that occurs on a selected share; in this case, you would simply load the required VFS into the share definition:

vfs objects = smb_traffic_analyzer
smb_traffic_analyzer:protocol_version = v2
smb_traffic_analyzer:mode = unix_domain_socket

You can configure the final parameter here to suit your own needs (Figure 3). For example, if you prefer to use TCP/IP communications, the matching share definition would look like:

vfs objects = smb_traffic_analyzer
smb_traffic_analyzer:protocol_version = v2
smb_traffic_analyzer:host = localhost
smb_traffic_analyzer:port = 3490

Figure 3: SUSE users can set the required Samba variables in YaST.

In this case, you would launch the SMBTAD daemon as follows:

smbtad -i 3490 -p 3491

The daemon waits for data from the VFS module on port 3490 and handles client requests on port 3491 (default). By default, smbtad creates its SQLite database in $HOME/.smbtad/staddb, unless this database already exists.
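
For the TCP variant, the share definition and the daemon call have to agree on the port. The following sketch writes a share stanza to a scratch file and prints the matching smbtad call instead of touching the live smb.conf or starting anything. The share name [data], its path, and the scratch file name are assumptions; the smb_traffic_analyzer parameters and the smbtad options are the ones shown above.

```shell
#!/bin/sh
# Sketch: keep the VFS module port and the smbtad call in sync.
# Share name, path, and output file are illustrative assumptions.

vfs_port=3490       # port the VFS module sends its data to
client_port=3491    # port on which smbtad answers client requests
snippet=${TMPDIR:-/tmp}/smbta-share.conf

cat > "$snippet" <<EOF
[data]
    path = /srv/samba/data
    vfs objects = smb_traffic_analyzer
    smb_traffic_analyzer:protocol_version = v2
    smb_traffic_analyzer:host = localhost
    smb_traffic_analyzer:port = $vfs_port
EOF

# Build the daemon call that matches the stanza; nothing is started here.
smbtad_cmd="smbtad -i $vfs_port -p $client_port"
printf 'Share stanza written to %s\n' "$snippet"
printf 'Matching daemon call: %s\n' "$smbtad_cmd"
```

After reviewing the stanza, you could merge it into smb.conf and check the result with Samba's testparm before restarting the server.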

For an overview of the many other parameters, you can type:

smbtad -help

For an exhaustive explanation of all parameters, I recommend reading the excellent documentation. You can also store all the configuration parameters you need in an /etc/smbtad.conf file, which uses a typical INI file format and # as a comment character.

RRDtool

RRDtool, which was originally developed by Tobias Oetiker, has become the quasi-standard tool for storing network monitoring data. It allows administrators to store and visualize acquired data over time. RRDtool is now released under the GNU General Public License, and many developers contribute to it. RRDtool is available in source code or binary format for several operating systems.

RRD stands for Round-Robin Database and is thus indicative of how the tool stores its data. When it creates a database, the system allocates only enough disk space for a certain period of time and does not expand the database after this time; instead, it overwrites the oldest data. This method is commonly referred to as round robin in computer science.

The idea behind this approach is that the permanent stream of acquired data arriving at fixed intervals would otherwise fill up the hard disk if left to its own devices. In most cases, a coarser, higher level view is sufficient for older data, whereas current events typically are revealed by inspecting the important details.
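
The round-robin principle can be illustrated with a few lines of shell. The sketch below is not how RRDtool stores its data internally; it only shows a fixed number of slots in which new samples overwrite the oldest ones. The function name rr_store and the one-file-per-slot layout are assumptions for illustration.

```shell
#!/bin/sh
# Sketch of round-robin storage with a fixed number of slots.
# Slot files stand in for a preallocated database; function name and
# file layout are illustrative assumptions.

# Store sample number $3 with value $4 in one of $2 slots below $1.
rr_store() {
    store_dir=$1
    slot_count=$2
    sample_no=$3
    sample_value=$4
    slot=$(( sample_no % slot_count ))   # wrap around after the last slot
    printf '%s\n' "$sample_value" > "$store_dir/slot_$slot"
}

rr_dir=$(mktemp -d)
n=0
for value in 10 20 30 40 50; do          # five samples, but only three slots
    rr_store "$rr_dir" 3 "$n" "$value"
    n=$(( n + 1 ))
done

# Samples 10 and 20 have been overwritten by 40 and 50.
cat "$rr_dir/slot_0" "$rr_dir/slot_1" "$rr_dir/slot_2"
```

The storage never grows beyond three slot files, no matter how many samples arrive.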

The RRDtool user interface comprises a set of command-line tools whose workings are explained in great detail on the project website. Additionally, APIs exist for many programming languages, especially C and Perl, which means developers can call RRDtool in their own programs to store data. RRDtool is not typically executed at the command line; it serves other programs as a data source, data store, or both, examples being Cacti [9] or MRTG [10]. The comprehensive list is available from the RRDtool website.

Using smbtaquery

As mentioned earlier, the available client tools include an RRD driver, the smbtaquery command-line tool, and smbtamonitor. Experienced administrators can use the SQL tools of their choice to inspect the internals of the traffic database. Having said this, smbtaquery does provide a more convenient approach – being specially designed for the SMBTA database setup and including a number of preconfigured queries. The smbtaquery tool creates the required XML output, which you can then convert to your favorite format, assuming you have an XSLT processor installed.

The XSLT processor draws on stylesheet information from smbtaquery. To facilitate queries to the database, smbtaquery includes a simple interpreter that is customized for cooperation with SMBTA. You have two options for using the internal interpreter. One option is to pass in a file that contains all of the query instructions. The filename is specified by the -f (file) parameter:

smbtaquery -h Host -i 3491 -f commandfile.txt

In profile and interpreter mode, each command must be separated by a comma and parameters by space characters. Each line ends with a semicolon. In the usual style, the configuration file uses # for comments.
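
A command file following these rules might be created as shown in the sketch below. The queries are the ones used elsewhere in this article; the file location is an assumption, and the smbtaquery call is commented out because Host is only a placeholder.

```shell
#!/bin/sh
# Sketch: write a command file for smbtaquery -f.
# The file location and the chosen queries are illustrative.

cmdfile=${TMPDIR:-/tmp}/commandfile.txt

cat > "$cmdfile" <<'EOF'
# Read/write usage for the whole Samba network
global, usage rw;
# Total bytes written by user drilling
user drilling, total w;
EOF

# Count the query lines (everything that is not a comment).
query_count=$(grep -vc '^#' "$cmdfile")
printf '%s queries in %s\n' "$query_count" "$cmdfile"

# With smbtaquery installed and a reachable daemon:
# smbtaquery -h Host -i 3491 -f "$cmdfile"
```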

Alternatively, you can use the internal interpreter directly. To do so, pass the -q (query) parameter to smbtaquery:

smbtaquery -h Host -i 3491 -q 'Query'

The -h and -i parameters here have the same meaning as previously. The -q (query) parameter is followed by the query string, as in

smbtaquery -h Host -i 3491 -q 'global, usage rw;'

where smbtaquery counts the global traffic on the complete Samba network. Of course, you could just as easily restrict the traffic to be investigated to a single user or share:

smbtaquery -h Host -i 3491 -q 'user drilling, total w;'

Besides using hostnames and TCP ports, smbtaquery can also use Unix domain sockets for its bindings:

smbtaquery -u -q 'global, usage rw;'

Incidentally, smbtaquery sends all its output to the terminal on which it was started by default. Standard Unix operators redirect the output, as in > output.txt, or the -o parameter creates HTML-formatted output:

smbtaquery -u -q 'global, usage rw;' -o html > output.html

The excellent documentation has many examples of queries, such as,

'user drilling, total r;'

which evaluates the total number of bytes read by user drilling on the Samba network (r). Or,

'share USB-Fritzbox, usage rw;'

uses the usage function to show a timeline of read/write access to the USB-Fritzbox share (an external hard disk connected to a router) over a virtual day divided into 24 hours (Figure 4). Alternatively,

'share USB-Fritzbox, total w;'

discovers the absolute number of bytes written on the USB-Fritzbox share. The total function determines and shows the number of bytes read from/written to the specified object (share, user, total network).

Figure 4: The usage parameter supports weighted evaluation of an activity, for example, at a particular share over time.

Other powerful and interesting parameters, such as top, list, or last_activity, are detailed in the documentation.
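
If you issue many such queries from scripts, a small helper can assemble the query strings. The following sketch assumes the pattern seen in the examples above (object, then function and mode, terminated by a semicolon); the helper name smbta_query is hypothetical, and functions such as top or list may take different arguments.

```shell
#!/bin/sh
# Sketch: assemble smbtaquery query strings from their parts.
# The helper name smbta_query is an assumption; the pattern follows the
# usage and total examples in the text only.

smbta_query() {
    object=$1   # global, user, or share
    name=$2     # user or share name; empty for global
    func=$3     # usage or total
    mode=$4     # r, w, or rw
    if [ "$object" = "global" ]; then
        printf '%s, %s %s;\n' "$object" "$func" "$mode"
    else
        printf '%s %s, %s %s;\n' "$object" "$name" "$func" "$mode"
    fi
}

smbta_query global "" usage rw
smbta_query user drilling total r
smbta_query share USB-Fritzbox total w

# With smbtaquery installed, a query could then be run like this:
# smbtaquery -u -q "$(smbta_query share USB-Fritzbox usage rw)" -o html > usage.html
```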

Using smbtamonitor

The smbtamonitor tool lets administrators monitor all Samba traffic in real time. To do so, the client opens a direct connection to the smbtad daemon instead of picking up stored traffic information from the database. The daemon sends all the data packets received by the VFS module to smbtamonitor, which in turn listens permanently at the SMBTA socket and visualizes all received packets in a Curses graph.

To allow this to happen, each smbtamonitor instance binds to an object (user, share, or file – Figure 5). The admin can start as many smbtamonitor instances as needed.

Figure 5: Smbtamonitor binds specifically to an object – a share, user, or file name.

With this approach, you can, for example, use smbtamonitor to visualize the absolute number of bytes transmitted and/or data throughput per second for the object in question. The smbtamonitor tool needs you either to specify the host (-h) and port number (-i) or to initiate a connection by means of Unix domain sockets with the -n parameter:

smbtamonitor -h Host -i 3491 --share Release

The smbtamonitor tool can bind explicitly to a file. For example, you can visualize in real time if and to what extent users on the network have noticed a file with the name RELEASENOTE.TXT:

smbtamonitor -h Host -i 3491 --file RELEASENOTE.TXT

Incidentally, smbtamonitor can also use a $HOME/.smbtatools/monitor-config configuration file, which specifies the hostname and port number:

[network]
Hostname = SMBTA-Host
Port = 3491
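
A configuration file like this can be created from a script, as in the sketch below. The SMBTA_HOME variable is a hypothetical helper that defaults to a scratch directory so the sketch does not touch a real home directory; set it to $HOME to write the actual file.

```shell
#!/bin/sh
# Sketch: create the smbtamonitor configuration file.
# SMBTA_HOME is a hypothetical variable for this example only; it
# defaults to a scratch directory instead of the real $HOME.

SMBTA_HOME=${SMBTA_HOME:-$(mktemp -d)}
config_dir=$SMBTA_HOME/.smbtatools
config_file=$config_dir/monitor-config

mkdir -p "$config_dir"
cat > "$config_file" <<'EOF'
[network]
Hostname = SMBTA-Host
Port = 3491
EOF

printf 'Wrote %s\n' "$config_file"
```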

Many other useful parameters are detailed in the documentation.

rrddriver

The RRDtool is renowned for its robustness and ease of use; it supports, for example, visualization of the throughput on a Samba share. RRDtool is available from the project website [2] and from the openSUSE repositories. It is easily installed with YaST (Figure 6).

Figure 6: The RRDtool can be installed using YaST on openSUSE.

SMBTA itself contains the driver as an interface to RRDtool; the driver is called by the rrddriver keyword followed by one or more arguments (Figure 7). The arguments include the well-known -h (host), -i (TCP Port), -n (domain sockets), -s (share), and -u (user), as well as -r, used to define an RRDtool setup string. The default is:

DS:readwrite:GAUGE:10:U:U
DS:read:GAUGE:10:U:U
DS:write:GAUGE:10:U:U
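
Each line of the setup string follows RRDtool's data source syntax, DS:name:type:heartbeat:min:max, where U stands for "unknown" and means that no lower or upper limit is set. The sketch below simply splits the default definitions into their fields; the helper name describe_ds is an assumption for illustration.

```shell
#!/bin/sh
# Sketch: split an RRDtool data source definition into its fields.
# The helper name describe_ds is an illustrative assumption.

describe_ds() {
    # Field order in RRDtool: DS:name:type:heartbeat:min:max
    old_ifs=$IFS
    IFS=:
    set -- $1           # split the definition at the colons
    IFS=$old_ifs
    printf 'name=%s type=%s heartbeat=%ss min=%s max=%s\n' \
        "$2" "$3" "$4" "$5" "$6"
}

for ds in DS:readwrite:GAUGE:10:U:U DS:read:GAUGE:10:U:U DS:write:GAUGE:10:U:U; do
    describe_ds "$ds"
done
```

The heartbeat of 10 seconds in the default string is the maximum time RRDtool tolerates between two updates before it treats the value as unknown.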

Figure 7: The RRD driver implements an interface between SMBTA and RRDtool, which can then be used to create comprehensive graphs of time series data.

An example of the use of rrddriver would look something like:

rrddriver -b meinrrd -h Host -i 3491 -u drilling

This command launches the RRD driver for all traffic information produced by user drilling. By default, RRDtool updates its database every 10 seconds, or you can reduce the interval to, say, two seconds with -S 2. You can then use the rich feature set of RRDtool. For more information, see the "RRDtool" box. The project website also provides comprehensive documentation. An example of a manual call to RRDtool is provided by Listing 1.

Listing 1: Manual Call to RRDtool

rrdtool graph fig-smb-throughput.png -s 1290772099 -S 1 --title "Data throughput on share 'johnsfiles'" DEF:read_in=testdb:read:AVERAGE DEF:write_in=testdb:write:AVERAGE "AREA:write_in#AA0000:Write" "STACK:read_in#AA9999:Read"

This call creates a PNG image, fig-smb-throughput.png, with the title Data throughput on share 'johnsfiles', which shows the read and write throughput on the specified share. The graph starts at the point in time given in Unix time format by the -s parameter.
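
Instead of hard-coding the start time, a script can derive it from the current time. The following sketch computes the start of a one-hour window and only calls rrdtool if both the program and the testdb file from Listing 1 are present. The helper name window_start and the window length are assumptions; the graph arguments are taken from Listing 1.

```shell
#!/bin/sh
# Sketch: derive the -s start time for rrdtool graph from a time window.
# Helper name and window length are assumptions; the graph arguments
# are taken from Listing 1.

# Start of a window that ends at $1 (Unix time) and spans $2 seconds.
window_start() {
    echo $(( $1 - $2 ))
}

now=$(date +%s)
start=$(window_start "$now" 3600)       # the last hour

if command -v rrdtool >/dev/null 2>&1 && [ -f testdb ]; then
    rrdtool graph fig-smb-throughput.png -s "$start" -S 1 \
        --title "Data throughput on share 'johnsfiles'" \
        DEF:read_in=testdb:read:AVERAGE \
        DEF:write_in=testdb:write:AVERAGE \
        "AREA:write_in#AA0000:Write" \
        "STACK:read_in#AA9999:Read"
else
    echo "rrdtool or testdb not found; start time would be $start"
fi
```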

Conclusions

Measuring and visualizing data throughput on a network is not a difficult task. But, if you are specifically interested in data traffic created by the Samba CIFS server, a bona fide data-mining tool like SMBTA might be just what you need.

In any case, the Samba team has decided to incorporate SMBTA as a component of Samba 3.6. However, taking the current Samba architecture and, more specifically, the SMBTA architecture into account, the danger exists that the database used by SMBTA will grow very quickly.

As of this writing, I've not been able to discover how enterprise environments are planning to cope with this problem.