Lead Image © jaroonrat vitoosuwan, 123RF.com

Verifying packages with Debian's ReproducibleBuilds

Identical Build

Debian's ReproducibleBuilds project helps you determine whether a binary package was actually built from the associated source code. By Daniel Stender

Open source software offers a big security benefit: Unlike proprietary software, anyone can view the source code, so in theory you know what you are installing. However, the overwhelming majority of users install prebuilt software packages provided by their Linux distributors. These users rely on system developers and package maintainers to ensure that the binary packages do not contain malicious code that deviates from the official source code.

The Debian ReproducibleBuilds project helps you verify that a binary package matches its source code and that nothing was tampered with along the way (Figure 1) [1].

Figure 1: If the build system is compromised, the binary package produced by it in the ReproducibleBuilds system has a different hash value (red entry).

Attack Scenarios

As a popular Linux distribution, Debian distributes its software to a large number of users worldwide. These users include not only private individuals, but also organizations, research institutions, and companies. This complex and decentralized software distribution system creates opportunities for attackers to foist malicious code onto unsuspecting users.

One obvious attack scenario is targeted manipulation of a DEB binary package. Past exploits like the OpenSSH bug CVE-2002-0083 [2] show that sometimes changing just one bit is sufficient to install a backdoor [3]. In a more sophisticated attack, an attacker could plant a kernel rootkit on a package maintainer's computer, which would then secretly change the code at build time.

Secure Binary Packages

The idea of making binary packages for Debian reproducible has existed since 2007 [4]. At first, the idea was met with little response until decisive impetus came from projects with high security requirements. For example, the Tor Project has pushed the development of reproducible package building [5].

Since the Snowden revelations, significantly more users have become interested in the security gains offered by this approach. Bitcoin developers, for example, have a vested interest in safeguarding the software that delivers the virtual currency to users.

Matching Builds

If you want to identify manipulations by comparing the results of different package builds, the first step is to ensure that the build process always produces identical packages. In general, this is not the case: two binaries built from the same source code often differ for several reasons [6].

For example, the generated packages change when developers build on different machines. The build process also writes the current timestamp into the header of the gzip archives it generates and into the man pages created with the docbook-to-man tool. Program documentation contains timestamps too, for example, HTML pages made by Doxygen [7] or PDF documents built with LaTeX [8].
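The gzip timestamp problem is easy to demonstrate. The following is a minimal sketch that uses Python's standard gzip module as a stand-in for whatever tool a real build invokes; the payload and the two timestamps are made up for illustration.

```python
import gzip

payload = b"identical source contents\n"

# Two "builds" of the same data at different times: the gzip header
# stores the modification time in bytes 4-7, so the outputs differ.
build_a = gzip.compress(payload, mtime=1423785600)
build_b = gzip.compress(payload, mtime=1423872000)

# Pinning the timestamp to a fixed value removes the difference.
fixed_a = gzip.compress(payload, mtime=0)
fixed_b = gzip.compress(payload, mtime=0)


def header_mtime(blob):
    """Return the MTIME field of a gzip member (little-endian, offset 4)."""
    return int.from_bytes(blob[4:8], "little")


print("timestamped builds identical:", build_a == build_b)
print("pinned builds identical:", fixed_a == fixed_b)
```

The compressed data is the same in every case; only the four timestamp bytes in the header decide whether the checksums of the two archives match.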

File lists cause problems, too, because the POSIX readdir() function returns directory entries in no guaranteed order, so the same directory can be listed differently on different systems. A different build path results in changes to the build ID of the binaries. The locale used for the build also makes a difference to binary packages, as does the hostname of the build system, and many other factors.
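The usual remedy for the file-ordering problem is to sort explicitly before the list is used. The sketch below assumes a build step written in Python; the function names and the demo tree are hypothetical and only illustrate the idea.

```python
import os
import tempfile


def stable_file_list(root):
    """Return relative file paths under root in a fixed order."""
    paths = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # sorting in place fixes the order os.walk descends in
        for name in sorted(filenames):  # code-point sort, independent of locale
            paths.append(os.path.relpath(os.path.join(dirpath, name), root))
    return paths


def make_demo_tree():
    """Create a small throwaway tree (hypothetical file names) and return its path."""
    root = tempfile.mkdtemp()
    os.mkdir(os.path.join(root, "sub"))
    for rel in ("b.txt", "a.txt", os.path.join("sub", "c.txt")):
        with open(os.path.join(root, rel), "w") as handle:
            handle.write("demo\n")
    return root


if __name__ == "__main__":
    print(stable_file_list(make_demo_tree()))
```

Because Python's sorted() compares strings by code point, the result also does not change with the locale of the build machine.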

Toolchain

Numerous triggers of these unpredictable deviations can be eliminated in the toolchain used for package building. Additionally, however, the developers have to edit the individual source packages with a view to improving reproducibility.

In certain cases, the package maintainer even needs to patch the software source code. This is the case, for example, if the developers use the __TIME__ or __DATE__ preprocessor macros [9] in the source code to additionally output the build date if the --version flag is set when calling the program.
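The same pattern appears in any language that stamps the build date into its version output. As a rough Python analogy to the C macros, the sketch below takes the date from the SOURCE_DATE_EPOCH environment variable when it is set. This variable is a convention from the wider reproducible builds effort that this article does not cover, so treat its use here as an assumption; the function names are hypothetical.

```python
import os
import time


def build_date(environ=os.environ):
    """Return the build date, preferring a pinned epoch over the wall clock."""
    epoch = environ.get("SOURCE_DATE_EPOCH")  # assumed convention, see lead-in
    seconds = int(epoch) if epoch is not None else int(time.time())
    return time.strftime("%Y-%m-%d", time.gmtime(seconds))


def version_banner(name, version, environ=os.environ):
    """Build the string a program might print for --version."""
    return "{} {} (built {})".format(name, version, build_date(environ))


if __name__ == "__main__":
    pinned = {"SOURCE_DATE_EPOCH": "1423785600"}
    print(version_banner("demo", "1.0", pinned))
```

With the epoch pinned, the banner is the same no matter when the build runs; without it, the output changes every day.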

Small things can play a decisive role when it comes to reproducibility. If you underestimate their importance, you can only build an identical package on another day by using a customized virtual environment. Having to rely on such virtual build environments is not the answer; it must be possible to reconstruct identical checksums for the binary packages under any conditions.

The ReproducibleBuilds working group therefore maintains an enhanced toolchain for deterministic package building [10]. This provides, for example, alternative packages for Doxygen and docbook-to-man, as well as the dh_strip_nondeterminism debhelper module, which, among other things, removes the timestamps from a series of objects. The toolchain is thus far purely experimental, but it is likely to be incorporated gradually into the official branch after the Debian 8 release.

Another innovation is the new .buildinfo check file. After a package is built, this file records the build dependencies with their version numbers, the build path, and the checksums of the source and of the generated binary package [11]. On the basis of this information, you can reconstruct the build path when rebuilding the package and retrieve the dependencies that were used from the snapshot archive [12] if they are no longer current.
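To make the idea concrete, the sketch below collects the kind of information described above into a record. It does not reproduce the real .buildinfo format: the JSON layout, the field names, and the sample package versions are all invented for illustration.

```python
import hashlib
import json


def sha256_hex(data):
    """Return the SHA-256 digest of a byte string as hex."""
    return hashlib.sha256(data).hexdigest()


def make_build_record(source, binary, build_path, build_deps):
    """Serialize build metadata deterministically (hypothetical layout)."""
    record = {
        "build_path": build_path,
        "build_depends": build_deps,  # package name -> exact version
        "source_sha256": sha256_hex(source),
        "binary_sha256": sha256_hex(binary),
    }
    # sort_keys keeps the output stable regardless of dict insertion order
    return json.dumps(record, indent=2, sort_keys=True)


if __name__ == "__main__":
    deps = {"gcc": "4.9.2-10", "debhelper": "9.20150101"}  # made-up versions
    print(make_build_record(b"source tarball", b"binary package", "/build/demo-1.0", deps))
```

Note that the record itself has to be written deterministically, which is why the keys are sorted before serialization.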

The experimental toolchain already works in a chrooted build environment, for example, with pbuilder [13]. Package maintainers can thus use it for initial testing. The Debian Project also provides debbindiff [14], which can compare two packages built one after another.

Infrastructure

The project is now also integrated into Debian's infrastructure. The continuous integration platform [15] regularly checks the whole package archive to discover whether individual source packages can be reproducibly generated [16]. To do so, it builds each package twice in succession with intentionally modified parameters and then compares the binary packages to discover differences.
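The principle behind this test can be sketched in a few lines. The two toy "builds" below stand in for real package builds, and the varied parameters (hostname and build path) are just two of the factors mentioned earlier; none of the names reflect the actual test setup.

```python
import hashlib


def clean_build(source, env):
    """Toy build whose output depends only on the source."""
    return source.upper().encode()


def leaky_build(source, env):
    """Toy build that embeds the hostname and build path in its output."""
    stamp = "built on {hostname} in {build_path}".format(**env)
    return (source.upper() + "\n" + stamp).encode()


def is_reproducible(build, source):
    """Build twice under deliberately different conditions and compare digests."""
    first = {"hostname": "builder-a", "build_path": "/build/first"}
    second = {"hostname": "builder-b", "build_path": "/build/second"}
    digests = {hashlib.sha256(build(source, env)).hexdigest() for env in (first, second)}
    return len(digests) == 1


if __name__ == "__main__":
    print("clean build reproducible:", is_reproducible(clean_build, "int main;"))
    print("leaky build reproducible:", is_reproducible(leaky_build, "int main;"))
```

Varying the environment on purpose is the important design choice: two builds under identical conditions would hide exactly the leaks the test is meant to find.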

If differences occur, the developers file bug reports against the package; these reports currently still have a severity of wishlist. Additionally, a set of predefined user tags is available to categorize the problem behind each bug, for example, timestamps, fileordering, randomness, and so on. Quite often, the project developers also provide patches to the package maintainer.

Looking Forward

Fixed checksums for binary packages are just the start of optimizing security in Debian. In the future, package maintainers would no longer upload binary packages prebuilt on their computers but would instead send source packages along with the .buildinfo file.

If packages were always built on a Buildd machine [17], the checksums could be compared directly, and the project could reject packages when discrepancies occur [18]. Some time in the future, it might then be impossible to add non-reproducible packages to the official package archive. At the same time, checksum deviations in the Buildd network would be a reliable indicator of a manipulated build system. Sophisticated scenarios, such as Trusting Trust attacks [19], in which a compromised build environment secretly propagates itself, could be effectively thwarted.
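Such an acceptance check boils down to comparing a recorded checksum with the checksum of an independent rebuild. The sketch below shows only that comparison; the function name and the sample data are hypothetical, and the real archive software is not modeled.

```python
import hashlib


def check_rebuild(recorded_sha256, rebuilt_package):
    """Return (accepted, actual_digest) for a package rebuilt from source."""
    actual = hashlib.sha256(rebuilt_package).hexdigest()
    return actual == recorded_sha256.lower(), actual


# Checksum the maintainer recorded for the original build (sample data)
maintainer_build = b"package contents"
recorded = hashlib.sha256(maintainer_build).hexdigest()

ok, _ = check_rebuild(recorded, b"package contents")
tampered, _ = check_rebuild(recorded, b"package contents with a backdoor")

print("identical rebuild accepted:", ok)
print("modified rebuild accepted:", tampered)
```

The check is only meaningful once the build itself is deterministic; otherwise every harmless timestamp would trigger a rejection.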

To monitor the integrity of the entire Debian system in this way, package reproducibility would need to work for all supported processor architectures. This is the next step of the project, but one in which unknown pitfalls might still be lurking. The goal of eventually building the entire package archive in a reproducible way thus may be too ambitious.

Some very subtle problems stand in the way, including build processes that return different results depending on the time, CPU usage, and memory configuration. For example, GCC chooses hash functions to reflect the RAM size. Nevertheless, the project would like to get as close as possible to its target and is therefore also covering packages without program code, such as documentation.

If the associated bugs were given a severity of important or serious, maintainers would have to explain why they are not pursuing the long-term objective of assuring software integrity right down to the hardware level with test systems. However, some building blocks are still missing. For example, the project would be forced to accept only signed upstream tarballs from developers.

Conclusions

Thus far Debian's ReproducibleBuilds project is a success story: As of February 13, 2015, reproducible builds worked for 83.5 percent of all packages (Figure 2) [20]. The new build type will probably also be a release target for Debian 9 – all designed to make Debian that little bit more secure.

Figure 2: In February 2015, the number of packages that could be reproducibly built reached an interim high.