Nuts and Bolts Performance Tuning Dojo 
 

Exploring the filesystem that knows everything

One /proc to Rule Them All

Nearly everything you need to know about your system is stored somewhere in the /proc filesystem. By Federico Lucifredi

The /proc filesystem [1] is one of the most original results of the Unix world's bias toward seeing everything as a filesystem. At its most essential, procfs is a mechanism for exposing the state and configuration of the computer through a virtual filesystem. Files in /proc provide access to most of the interesting details about a system's operational state, and when those files are writable, they even allow you to change its configuration.
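To make the read/write duality concrete, here is a minimal sketch in Python 3.9+, assuming a Linux host. It maps a dotted sysctl(8)-style name such as vm.swappiness onto its file under /proc/sys, reads the value as plain text, and shows (commented out) how a writable tunable would be changed. The helper names sysctl_path, read_sysctl, and write_sysctl are hypothetical, and the write path requires root.

```python
from pathlib import Path

# Root of the kernel tunables exposed through procfs (the sysctl(8) namespace).
PROC_SYS = Path("/proc/sys")


def sysctl_path(name: str) -> Path:
    """Map a dotted sysctl name such as 'vm.swappiness' to its /proc/sys file.

    This naive split breaks for components that themselves contain dots
    (for example, VLAN interfaces such as eth0.100 under net.*).
    """
    return PROC_SYS / name.replace(".", "/")


def read_sysctl(name: str) -> str:
    """Read a kernel tunable as plain text, minus the trailing newline."""
    return sysctl_path(name).read_text().strip()


def write_sysctl(name: str, value: str) -> None:
    """Set a writable tunable; requires root and takes effect system-wide."""
    sysctl_path(name).write_text(f"{value}\n")


if __name__ == "__main__":
    for name in ("kernel.hostname", "kernel.ostype", "vm.swappiness"):
        try:
            print(f"{name} = {read_sysctl(name)}")
        except OSError as err:  # tunable missing on this kernel or sandbox
            print(f"{name}: unavailable ({err.strerror})")
    # As root, the equivalent of `sysctl -w vm.swappiness=10` would be:
    # write_sysctl("vm.swappiness", "10")
```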

What's In /proc?

The name proc is short for process filesystem, and indeed the original implementation, in Eighth Edition Research Unix, is documented in Tom J. Killian's 1984 Usenix paper [2] entitled "Processes as Files." Cross-pollinated through the later Bell Labs Plan 9 implementation, Linux's version stands out for exposing not just process information but a wealth of system details as well [3]. The files in the Linux /proc directory are also pleasingly hackable: most are directly readable as plain text, unlike more binary-centric proc implementations that rely on tools to present the raw data to end users.

Table 1 lists the main highlights of the Linux version of proc. Each process has its own subdirectory, named after its PID, containing files that expose this information and more. A wealth of detail about your processes is available, although permissions prevent non-root users from reading the more sensitive files of other users' processes, such as environ, mem, or fd. The Linux kernel also provides lots of additional system information through proc, something that makes Unix purists cringe, but users have come to love these extra details. For example, the cpuinfo file contains a lot of CPU details (see Listing 1).

Table 1: Per-Process Data in /proc/pid

File          Description
cmdline       Complete command line for the process, null-separated.
cwd           Symbolic link to the process's current working directory.
environ       The process's environment, null-separated.
exe           Symbolic link containing the pathname of the executed program.
fd            Subdirectory containing one entry for each file currently opened by the process.
fdinfo        Subdirectory with one entry per open file descriptor, exposing its state (such as file offset and flags).
limits        The process's resource limits; see getrlimit(2) and ulimit(1).
maps          Currently mapped memory regions and their access permissions.
mem           The process's memory, accessible through open(2), read(2), and lseek(2).
mountinfo     Detailed information about mount points in the process's mount namespace.
mounts        Filesystems currently mounted in the process's mount namespace.
mountstats    Statistics and configuration information about the mount points.
oom_score     Current score the OOM killer assigns to the process.
oom_adj       Used to adjust oom_score (deprecated in favor of oom_score_adj).
root          Root of the filesystem from the process's point of view; see chroot(2).
smaps         Memory consumption for each of the process's mappings.
stat          Status information about the process.
statm         Memory usage, measured in pages.
status        Much of the information in stat and statm, in a more readable format.
task          Directory with a subdirectory for each thread in the process.
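The null-separated and symbolic-link entries in Table 1 are easy to consume programmatically. The following sketch, assuming Linux and Python 3.9+, parses status into a dictionary, splits cmdline on NUL bytes, and resolves the links in fd; the function names are illustrative only. The final lines exercise the permission restriction on other users' processes: listing another user's fd directory normally fails unless you are root.

```python
import os
from pathlib import Path


def read_status(pid="self") -> dict[str, str]:
    """Parse the 'Key:<tab>value' lines of /proc/<pid>/status into a dict."""
    fields = {}
    for line in Path(f"/proc/{pid}/status").read_text().splitlines():
        key, _, value = line.partition(":")
        fields[key] = value.strip()
    return fields


def read_cmdline(pid="self") -> list[str]:
    """Return a process's argument vector from NUL-separated /proc/<pid>/cmdline."""
    raw = Path(f"/proc/{pid}/cmdline").read_bytes()
    if not raw:  # kernel threads and zombies have an empty cmdline
        return []
    args = raw.split(b"\0")
    if args[-1] == b"":  # the last argument is normally NUL-terminated too
        args.pop()
    return [arg.decode(errors="replace") for arg in args]


def open_files(pid="self") -> dict[int, str]:
    """Map each open descriptor number to the target of its /proc/<pid>/fd link."""
    targets = {}
    for entry in Path(f"/proc/{pid}/fd").iterdir():
        try:
            targets[int(entry.name)] = os.readlink(entry)
        except FileNotFoundError:
            # Descriptor closed after the listing, e.g. the one used to read
            # the directory itself.
            continue
    return targets


if __name__ == "__main__":
    me = read_status()
    print("name:", me["Name"], "pid:", me["Pid"], "rss:", me.get("VmRSS"))
    print("argv:", read_cmdline())
    print("fds:", open_files())
    try:
        print("fds of PID 1:", open_files(1))
    except PermissionError:
        # fd/ of another user's process is normally off-limits unless root.
        print("fds of PID 1: permission denied")
```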

Listing 1: Viewing CPU Details with cpuinfo

federico@Skyplex:~/Desktop$ cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 23
model name      : Intel(R) Core(TM)2 Duo CPU   L9400  @ 1.86GHz
stepping        : 6
microcode       : 0x60c
cpu MHz         : 800.000
cache size      : 6144 KB
physical id     : 0
siblings        : 2
core id         : 0
cpu cores       : 2
apicid          : 0
initial apicid  : 0
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 10
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep
                  mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2
                  ss ht tm pbe nx lm constant_tsc arch_perfmon pebs bts aperfmperf
                  pni dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm
                  sse4_1 lahf_lm ida dtherm tpr_shadow vnmi flexpriority
bogomips        : 3724.23
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:
...

Listing 1 shows only the output for the first of the two cores on my system; the file continues with a similar entry for the second core. The system is an Intel Core 2 Duo CPU rated at 1.86GHz, but it is apparently running at just 800MHz to save power. The flags tell me that this computer has hardware virtualization support (the vmx flag), and also that this system instance is not itself a virtual machine (the hypervisor flag is absent). See the cpufeature.h file [4] for the list of available CPU features. Nearly all non-tracing system performance tools rely on metrics exposed by proc. For example:

cat /proc/cpuinfo

lscpu

sudo lshw -c cpu

All three are fine ways to determine your processor's clock speed (Figure 1). However, using strace(1) to trace each program's system calls [5] quickly reveals that, in all these cases, the information actually comes from the proc filesystem (Figure 2).

Figure 1: Checking the processor's clock speed.
Figure 2: Standard system tools rely on /proc for system information.
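The same CPU details that lscpu and lshw report can be read straight from /proc/cpuinfo. As a rough sketch, assuming Python 3.9+ and x86 field names (cpuinfo is highly architecture-dependent, so other platforms may lack keys such as flags or cpu MHz), the code below splits the file into per-processor entries and extracts the clock speed and the vmx and hypervisor flags discussed for Listing 1. It also checks svm, AMD's counterpart to vmx; the helper names are hypothetical.

```python
from pathlib import Path


def parse_cpuinfo(text: str) -> list[dict[str, str]]:
    """Split /proc/cpuinfo text into one dict per logical processor.

    Processor entries are separated by blank lines; each line reads
    'key<whitespace>: value'. Field names vary by architecture.
    """
    cpus = []
    for block in text.strip().split("\n\n"):
        entry = {}
        for line in block.splitlines():
            key, sep, value = line.partition(":")
            if sep:
                entry[key.strip()] = value.strip()
        if "processor" in entry:  # skip machine-wide trailer blocks (e.g. on ARM)
            cpus.append(entry)
    return cpus


def summarize(cpus: list[dict[str, str]]) -> dict[str, object]:
    """Pull out the details discussed for Listing 1 (x86 field names assumed)."""
    flags = set(cpus[0].get("flags", "").split())
    return {
        "model": cpus[0].get("model name", "unknown"),
        "logical_cpus": len(cpus),
        # Frequency reported per CPU in MHz; may reflect power-saving states.
        "mhz": [float(c["cpu MHz"]) for c in cpus if "cpu MHz" in c],
        # vmx = Intel VT-x, svm = AMD-V hardware virtualization support.
        "hw_virt": bool(flags & {"vmx", "svm"}),
        # The hypervisor flag is set when running inside a virtual machine.
        "is_guest": "hypervisor" in flags,
    }


if __name__ == "__main__":
    info = summarize(parse_cpuinfo(Path("/proc/cpuinfo").read_text()))
    for key, value in info.items():
        print(f"{key:>12}: {value}")
```

Because the parser takes text rather than a path, it can be exercised against saved cpuinfo output from other machines as well as the live file.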

Conclusion

The proc filesystem is an extensive subject, and one that could be better documented in the kernel's Documentation/ subdirectory; when the documentation falls short, the LXR project's cross-referenced kernel source remains a handy reference. Table 2 gives an overview of the major system-centric areas of the proc filesystem. Knowing where the data consumed by your Linux tools originates is a powerful way to understand why expectations and reality diverge, and it enables you to write your own custom tools when the existing ones fall short of your needs. It is the Unix way!
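In that spirit, a custom tool can be only a few lines long. The sketch below, a rough analogue of free(1) assuming Python 3.9+ and a Linux host, parses /proc/meminfo and reports total, available, and used memory. Defining "used" as total minus MemAvailable is a simplifying assumption that may not match every version of free(1); the function names are hypothetical.

```python
from pathlib import Path


def parse_meminfo(text: str) -> dict[str, int]:
    """Parse lines like 'MemTotal:   16314744 kB' into {'MemTotal': 16314744}.

    Sizes are in kB; the HugePages_* entries are plain counts with no unit.
    """
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            values[key] = int(fields[0])
    return values


def memory_report(info: dict[str, int]) -> dict[str, int]:
    """A rough free(1)-style summary; 'used' here is simply total - available."""
    total = info["MemTotal"]
    # MemAvailable (Linux 3.14+) estimates memory usable without swapping;
    # fall back to MemFree on older kernels.
    available = info.get("MemAvailable", info["MemFree"])
    return {
        "total_kb": total,
        "available_kb": available,
        "used_kb": total - available,
        "swap_used_kb": info.get("SwapTotal", 0) - info.get("SwapFree", 0),
    }


if __name__ == "__main__":
    report = memory_report(parse_meminfo(Path("/proc/meminfo").read_text()))
    for key, kb in report.items():
        print(f"{key.removesuffix('_kb'):>10}: {kb // 1024} MiB")
```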

Table 2: /proc and Its Subdirectories

Area               Description
apm                Advanced Power Management version and battery information (requires the CONFIG_APM option at build time).
bus                Subdirectories for installed buses.
bus/pci            PCI buses, devices, and device drivers. Parts of the tree are binary rather than ASCII.
bus/pci/devices    Information about PCI devices; see lspci(8) for details.
cmdline            Parameters passed to the kernel at boot time.
config.gz          Configuration used to build the running kernel (requires CONFIG_IKCONFIG_PROC); read or search it with zcat(1).
cpuinfo            CPU and architecture information. Highly dependent on system architecture.
devices            Major device numbers and device groups.
diskstats          Disk I/O statistics for each disk device.
dma                Registered DMA channels in use.
filesystems        List of filesystems supported by the running kernel.
ide                Subdirectories for each IDE channel and attached device.
interrupts         Number of interrupts per CPU for each I/O device.
ioports            Currently registered I/O port regions in use.
kcore              The system's physical memory presented as an ELF core file.
loadavg            Load average figures.
locks              Current file locks.
meminfo            Statistics about the system's memory usage.
modules            Information on currently loaded kernel modules.
net                Status of the networking subsystem, better interpreted through netstat(8).
net/arp            The kernel's ARP mapping table.
net/dev            Network device status information.
net/tcp            Dump of the TCP socket table. Similar entries exist for UDP.
slabinfo           Information about kernel slab caches.
stat               Kernel and system statistics.
swaps              Swap areas in use; see swapon(8).
uptime             The system's uptime and the time spent idle (in seconds).
version            Running kernel version, similar to uname(1) but also showing the compiler used, which helps identify a specific build.
vmstat             Virtual memory statistics.
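Several entries in Table 2 only become meaningful when sampled over time. The following sketch, assuming Python 3.9+ on Linux, computes overall CPU utilization from two readings of the aggregate cpu line in /proc/stat, similar in spirit to what top(1) displays, and reads the load averages and uptime from loadavg and uptime. Treating iowait as idle time and skipping the guest columns (already included in user and nice) are modeling choices, not the only valid ones; the helper names are hypothetical.

```python
import time
from pathlib import Path


def cpu_times(text: str) -> tuple[int, int]:
    """Return (idle, total) ticks from the aggregate 'cpu' line of /proc/stat.

    Values are in USER_HZ ticks (usually 1/100 s).
    """
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            # user nice system idle iowait irq softirq steal; guest time is
            # already counted in user/nice, so later columns are skipped.
            ticks = [int(v) for v in fields[1:9]]
            idle = ticks[3] + (ticks[4] if len(ticks) > 4 else 0)  # idle + iowait
            return idle, sum(ticks)
    raise ValueError("no aggregate cpu line found")


def cpu_utilization(interval: float = 1.0) -> float:
    """Percentage of non-idle CPU time across all CPUs over `interval` seconds."""
    idle1, total1 = cpu_times(Path("/proc/stat").read_text())
    time.sleep(interval)
    idle2, total2 = cpu_times(Path("/proc/stat").read_text())
    delta_total = total2 - total1
    if delta_total == 0:
        return 0.0
    busy = 100.0 * (1 - (idle2 - idle1) / delta_total)
    # iowait is documented as unreliable and can occasionally decrease,
    # so clamp the result to 0..100.
    return max(0.0, min(100.0, busy))


def load_and_uptime() -> tuple[list[float], float]:
    """First three /proc/loadavg fields and the first /proc/uptime field."""
    load = [float(v) for v in Path("/proc/loadavg").read_text().split()[:3]]
    uptime = float(Path("/proc/uptime").read_text().split()[0])
    return load, uptime


if __name__ == "__main__":
    load, up = load_and_uptime()
    print(f"load average: {load[0]:.2f} {load[1]:.2f} {load[2]:.2f}")
    print(f"uptime: {up / 3600:.1f} hours")
    print(f"CPU busy over 1s: {cpu_utilization():.1f}%")
```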