Nuts and Bolts: File Encryption
Lead Image © Yoichi Shimizu, 123RF.com

Encrypting files

Safe Files

Encrypting your data is becoming increasingly important, but you don't always have to use an encrypted filesystem. Sometimes just encrypting files is enough. By Jeff Layton

The revelations of Edward Snowden caused a big upsurge in the use of encryption for protecting data from inappropriate access. People are now using encrypted filesystems as well as self-encrypting devices (SEDs). However, not everyone is using encryption.

Recent revelations about accessing the data of individuals include the story about how the NSA and Britain's Government Communications Headquarters (GCHQ) supposedly gained access to SIM cards [1] from Gemalto, allowing them to access any cell phone communications that used these cards. Another story describes how Lenovo installed malware [2] on its laptops that allows the software to steal web traffic using man-in-the-middle attacks.

When you use an encrypted filesystem or SEDs [3], all of the data is encrypted. However, if you forget the password, you lose all of the data on the filesystem or drive. It may be easier to encrypt files individually so that if you forget the password, you only lose a single file and not the entire filesystem or drive. Moreover, you might be casually copying your files to the cloud or other backup systems from your desktop, laptop, or cellphone. If you do not encrypt these files yourself, more likely than not, these files are not encrypted.

Using simple tools to encrypt files individually and then copy them to your backup is an easy process. As previously mentioned, by encrypting the files individually, if you forget the password, then theoretically you will lose only a single file (unless you use the same passphrase for all files, in which case you might lose access to all data).

Before you read the rest of this article, note that I'm not a security or cryptography expert, nor do I play one on TV. Please do your own research. That said, in the sections below, I review a few file encryption/decryption tools and finish with some personal recommendations on using them.

GPG

To start, I'll look at probably the most popular encryption tool, GNU Privacy Guard (GPG) [4]. The tool has become popular because it's fast, the encryption is very good if used correctly, the code is open source, and it follows the OpenPGP specification [5], which is also an IETF standard [6]. GPG was really designed as a command-line encryption tool for files but has been incorporated into email tools for encrypting email.

GPG uses a hybrid encryption approach [7] that combines two methods: symmetric-key encryption and public-key cryptography. Symmetric-key encryption/decryption means that both the sender and the receiver share the same key, whereas public-key cryptography uses a key pair, in which the public key encrypts and only the matching private key decrypts. Typically, symmetric-key encryption is used for the data itself because it is fast, and public-key cryptography is used to exchange the symmetric key easily and securely.
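To make the hybrid idea concrete, the following is a minimal sketch that uses the openssl command (covered later in this article) instead of GPG. It does not reproduce GPG's internal format; it only illustrates the pattern of encrypting the data with a random symmetric session key and then wrapping that small key with the receiver's public key. The file names, the scratch directory, and the 2048-bit RSA key are assumptions made for the example, and the -pbkdf2 option assumes OpenSSL 1.1.1 or later.

```shell
set -eu
work="$(mktemp -d)"          # hypothetical scratch directory
cd "$work"
printf 'attack at dawn\n' > message.txt

# Receiver: create an RSA key pair and hand out only the public half.
openssl genpkey -algorithm RSA -pkeyopt rsa_keygen_bits:2048 \
    -out receiver.key 2>/dev/null
openssl pkey -in receiver.key -pubout -out receiver.pub

# Sender: pick a random session key, encrypt the file with fast symmetric AES,
# then wrap the small session key with the receiver's public key.
openssl rand -hex 32 > session.key
openssl enc -aes-256-cbc -salt -pbkdf2 -pass file:session.key \
    -in message.txt -out message.enc
openssl pkeyutl -encrypt -pubin -inkey receiver.pub \
    -in session.key -out session.key.enc
rm session.key

# Receiver: unwrap the session key with the private key, then decrypt the file.
openssl pkeyutl -decrypt -inkey receiver.key \
    -in session.key.enc -out session.key
openssl enc -d -aes-256-cbc -pbkdf2 -pass file:session.key \
    -in message.enc -out message.out
cmp message.txt message.out && echo "round trip OK"
```

Only message.enc and session.key.enc would travel to the receiver; the slow public-key operation touches a few dozen bytes, whereas the bulk data goes through the fast symmetric cipher.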

As mentioned, GPG can be used for encrypting messages such as email. To do this, GPG uses asymmetric key pairs that are individually generated for each user. You can exchange the public key of your key pair with other users through Internet key servers or something similar, allowing them to encrypt email that only you can decrypt with your private key and to verify messages you have signed.

A variety of encryption options are available with GPG (Table 1). By default, the version used here selects the symmetric encryption algorithm CAST5 [8], which is a symmetric-key block cipher with a 64-bit block size and a key size between 40 and 128 bits. Newer GnuPG releases default to AES instead.

Table 1: GPG Encryption Options

Type          Algorithms
Public key    RSA, ElGamal, DSA
Cipher        IDEA, 3DES, CAST5, Blowfish, AES-128/-192/-256, Twofish, Camellia-128/-192/-256
Hash          MD5, SHA-1, RIPEMD-160, SHA-256/-384/-512/-224
Compression   Zip, ZLIB, BZIP2

For AES, GPG always uses a block size of 128 bits and a key length of 128, 192, or 256 bits, whereas Blowfish uses a block size of 64 bits and a key length from 32 to 448 bits. For cipher names such as AES-256, the number indicates the length in bits of the cipher key used by the algorithm.

A general rule of thumb is that the longer the key, the more "protected" your data will be (if your passphrase is sufficiently long). However, a longer key also means that it takes more resources, such as CPU, memory, and time, to encrypt the file. If you want to encrypt the file and very rarely decrypt it, you might want to use an algorithm with a very long key.

If you're going to be decrypting the file fairly often, then you might want to try a shorter key to improve encryption/decryption time at the expense of somewhat "weaker" encryption. Ultimately, the choice is yours, but personally I like to encrypt my data with a very long cipher key (almost as large as I can get).

Key length is a separate matter from key IDs, the short identifiers used to refer to public keys. According to the Evil 32 website [9], 32-bit key IDs can be forged with modern GPUs: It takes only four seconds to generate a colliding 32-bit key ID on a GPU. In fact, the researchers claim that they found collisions for every 32-bit key ID in the Web of Trust (WOT) [10] strong set. Forging a 32-bit key ID doesn't compromise GPG's encryption according to the site, but "… it further erodes the usability of GPG and increases the chance of human error."

Key IDs are not typically used in encrypting data, but you should definitely be aware of them, particularly if you use GPG every day. For this reason, the researchers highly recommend using 64-bit key IDs.
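One way to act on that recommendation is to make GPG display long key IDs and full fingerprints by default. The sketch below writes the two relevant options into a gpg.conf file; the scratch GNUPGHOME directory is an assumption made so that the example does not touch a real keyring. For everyday use, the same two option lines would go into your own ~/.gnupg/gpg.conf.

```shell
set -eu
# Hypothetical scratch GnuPG home, so the sketch leaves real keyrings alone.
GNUPGHOME="$(mktemp -d)"
export GNUPGHOME
chmod 700 "$GNUPGHOME"

cat >> "$GNUPGHOME/gpg.conf" <<'EOF'
# Show 64-bit key IDs with a 0x prefix instead of the 32-bit short form
keyid-format 0xlong
# Print the full fingerprint in key listings
with-fingerprint
EOF

# With gpg installed, key listings now use the long format
# (the scratch keyring is empty, so nothing is listed here).
if command -v gpg >/dev/null 2>&1; then
    gpg --list-keys || true
fi
```

When comparing keys with someone else, the full fingerprint is the safest thing to check; the long key ID is only a shorter convenience.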

Using GPG is very easy. You begin with a file and use gpg to encrypt it with the -c option, which selects symmetric (passphrase-based) encryption with the default cipher. The example in Listing 1 encrypts the text file hpc_001.html. Notice that the gpg command leaves the original file in place and creates a new file with a .gpg extension. Also notice that encrypting a simple text file produced a much smaller encrypted file than the plain text original, because GPG compresses the data by default before encrypting it.

Listing 1: Encrypt a File

$ ls -s
total 11228
11032 Flying_Beyond_the_Stall.pdf    196 hpc_001.html
$ gpg -c hpc_001.html
$ ls -s
total 11256
11032 Flying_Beyond_the_Stall.pdf    196 hpc_001.html
   28 hpc_001.html.gpg

During encryption, I had to enter my passphrase twice. You must remember this passphrase, because without it you cannot decrypt the file. Please remember this: The data cannot be recovered without expending a massive amount of CPU time to crack the encryption. This is no joke – cracking the file could potentially take years (many years). Therefore, do not forget the passphrase, but also don't write it down and leave it somewhere.

You can also compress the text file before you encrypt (Listing 2). Notice that the compressed file hpc_001.html.gz is encrypted this time. GPG typically has the option of compressing the file as well as encrypting it, but I like to keep these two steps separate.

Listing 2: Compress and Encrypt a File

$ gzip -9 hpc_001.html
$ ls -s
total 11084
11032 Flying_Beyond_the_Stall.pdf   28 hpc_001.html.gpg
   24 hpc_001.html.gz
$ gpg -c hpc_001.html.gz
$ ls -s
total 11108
11032 Flying_Beyond_the_Stall.pdf   28 hpc_001.html.gpg
   24 hpc_001.html.gz               24 hpc_001.html.gz.gpg

To decrypt the encrypted file to another file, you just use the -d -o options. The -o directs the output to a file, and the -d tells GPG to decrypt the file. In the example in Listing 3, I decrypt the compressed file hpc_001.html.gz.

Listing 3: Decrypt a Compressed File

$ gpg -o hpc_001.html.gz -d hpc_001.html.gz.gpg
gpg: 3DES encrypted data
gpg: encrypted with 1 passphrase
gpg: WARNING: message was not integrity protected
$ ls -s
total 11108
11032 Flying_Beyond_the_Stall.pdf   28 hpc_001.html.gpg
   24 hpc_001.html.gz               24 hpc_001.html.gz.gpg

During the decryption, I had to give the passphrase that I used to encrypt the file. Notice that the decrypted file is called hpc_001.html.gz; I erased the original hpc_001.html.gz before I decrypted the file. You can check that the file is correct by uncompressing it and then looking at the first few lines, which should be text (Listing 4). It looks like plain text to me and it matches the original file.

Listing 4: Uncompressed File

$ gunzip hpc_001.html.gz
$ ls -s
total 11280
11032 Flying_Beyond_the_Stall.pdf   28 hpc_001.html.gpg
  196 hpc_001.html                  24 hpc_001.html.gz.gpg
$ head -n 5 hpc_001.html
HPC Storage -- Getting Started with IO profiling applications
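The manual check in Listings 1 through 4 can also be scripted. The following is a minimal sketch, assuming GnuPG 2.1 or later (for --pinentry-mode loopback); the scratch directory, the sample file, and the passphrase file are hypothetical and exist only so the example can run without prompting. Keeping a passphrase in a file is acceptable for a throwaway demonstration only.

```shell
set -eu
work="$(mktemp -d)"               # hypothetical scratch directory
cd "$work"
GNUPGHOME="$work/gnupg"; export GNUPGHOME
mkdir -m 700 "$GNUPGHOME"
printf 'HPC Storage -- sample text\n' > sample.html
printf 'correct horse battery staple\n' > pass.txt   # demo passphrase only
chmod 600 pass.txt

gpg_check="skipped"
if command -v gpg >/dev/null 2>&1; then
    # -c: symmetric encryption; the original file is left in place.
    gpg --batch --pinentry-mode loopback --passphrase-file pass.txt \
        -c -o sample.html.gpg sample.html
    # -d: decrypt to a new name so the two copies can be compared.
    gpg --batch --pinentry-mode loopback --passphrase-file pass.txt \
        -o sample.check -d sample.html.gpg
    if cmp -s sample.html sample.check; then
        gpg_check="match"
    else
        gpg_check="differ"
    fi
    gpgconf --kill gpg-agent 2>/dev/null || true
fi
echo "gpg round trip: $gpg_check"
```

A byte-for-byte comparison with cmp is stricter than looking at the first few lines, and it works for binary files such as the PDF as well.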

You can also choose a cipher other than CAST5. In Listing 5, the AES-256 cipher is used to encrypt the PDF file in the directory. Again, I had to enter my passphrase twice to encrypt the file.

Listing 5: Using the AES-256 Cipher

$ ls -s
total 11228
11032 Flying_Beyond_the_Stall.pdf    196 hpc_001.html
$ gpg -c --cipher-algo AES256 Flying_Beyond_the_Stall.pdf
$ ls -s
total 20940
11032 Flying_Beyond_the_Stall.pdf      196 hpc_001.html
 9712 Flying_Beyond_the_Stall.pdf.gpg

GPG is very flexible and powerful. For example, you have options for generating and handling keys so that you don't have to enter a passphrase (unattended key generation) [11], but keep in mind that you should refer to these keys by their 64-bit key IDs and not the typical 32-bit key IDs.
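As a rough illustration of unattended key generation, the sketch below feeds GPG a parameter file in batch mode and then lists the new key with its 64-bit key ID. It assumes GnuPG 2.1 or later (the %no-protection directive); the name, email address, key sizes, and scratch keyring are placeholders chosen for the example. Note that %no-protection creates a private key without a passphrase, so such a key is only as safe as the account that stores it.

```shell
set -eu
GNUPGHOME="$(mktemp -d)"; export GNUPGHOME   # hypothetical scratch keyring
chmod 700 "$GNUPGHOME"

# Parameter file for batch mode; the identity below is a placeholder.
cat > "$GNUPGHOME/keyparams" <<'EOF'
Key-Type: RSA
Key-Length: 3072
Subkey-Type: RSA
Subkey-Length: 3072
Name-Real: Backup Robot
Name-Email: backup@example.org
Expire-Date: 1y
%no-protection
%commit
EOF

keygen="skipped"
if command -v gpg >/dev/null 2>&1; then
    gpg --batch --gen-key "$GNUPGHOME/keyparams"
    # Show the new key with its 64-bit key ID and full fingerprint.
    listing="$(gpg --keyid-format 0xlong --fingerprint backup@example.org)"
    printf '%s\n' "$listing"
    keygen="done"
    gpgconf --kill gpg-agent 2>/dev/null || true
fi
```

Files can then be encrypted to that key with gpg -e -r backup@example.org, which needs no passphrase at encryption time.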

ZIP

ZIP [12] is an archive file format, something along the lines of TAR. In addition to collecting files in a single archive file as tar does, zip can also compress the resulting archive or components of the archive. It supports several compression methods; according to the Wikipedia link, the most popular of them is Deflate.

In addition to creating an archive and compressing it, Zip is also capable of encrypting the archive. The .zip file format specification documents AES encryption methods, although whether AES is available depends on the implementation; the Info-ZIP zip command used below, for example, has traditionally offered only the older and weaker standard Zip encryption. Also, starting in version 6.2 of the Zip format, file name encryption was introduced, so that the metadata in what is called the Central Directory portion of the archive is encrypted. However, in other portions of the archive, the file names are not encrypted.

Using zip to encrypt files is very similar to using gpg, as shown in Listing 6. In the command line, the --password option specifies the passphrase as MY_SECRET. You also can use the -P option instead of --password. If you want to use a longer passphrase with blanks, enclose it in single quotes.

Listing 6: ZIP Encryption

$ ls -s
total 11228
11032 Flying_Beyond_the_Stall.pdf    196 hpc_001.html
$ zip --password MY_SECRET file.zip hpc_001.html
  adding: hpc_001.html (deflated 88%)
$ ls -s
total 11252
   24 file.zip  11032 Flying_Beyond_the_Stall.pdf
  196 hpc_001.html
$ zip --password 'Help me Watson' file.zip hpc_001.html
  adding: hpc_001.html (deflated 88%)
$ ls -s
total 11252
   24 file.zip  11032 Flying_Beyond_the_Stall.pdf    196 hpc_001.html

However, specifying the passphrase on the command line means that it will be in the "history" of the shell. This is probably not the most secure way to encrypt files with Zip. Perhaps a better way is just to use the --encrypt option (-e); then, it will prompt you for the passphrase, which you have to enter twice (Listing 7). The options used are -r, recursively Zip; -0, no compression (for faster execution); and -e, encrypt (prompts the user for a passphrase).

Listing 7: Secure ZIP Encryption

$ zip -r -0 -e files.zip ./
Enter password:
Verify password:
  adding: Flying_Beyond_the_Stall.pdf (stored 0%)
  adding: hpc_001.html (stored 0%)
$ ls -s
total 22456
11228 files.zip  11032 Flying_Beyond_the_Stall.pdf    196 hpc_001.html

The command takes all of the files in the current directory and subdirectories and creates a single archive without compression. However, Zip prints the names of the files as it adds them, and the file names stored in the archive remain readable without the passphrase. Depending on your level of paranoia, you might not want this to happen. In that case, it might be better to use tar to create the archive first and then compress and encrypt the resulting tar file with zip (i.e., zip -e).
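The tar-first approach could look something like the following sketch. The directory and file names are invented for the example, and the password is given with -P only so that the demonstration runs without a prompt; as noted earlier, interactive use should rely on -e instead. The sketch assumes the Info-ZIP zip and unzip commands and skips itself if they are missing.

```shell
set -eu
work="$(mktemp -d)"            # hypothetical scratch directory
cd "$work"
mkdir docs
printf 'secret plans\n' > docs/merger_plan.txt

names="skipped"
if command -v zip >/dev/null 2>&1 && command -v unzip >/dev/null 2>&1; then
    # Bundle first, so the Zip archive holds a single, neutral member name.
    tar -cf bundle.tar docs
    # -P puts the password on the command line: throwaway demo only.
    zip -q -P 'demo only' bundle.zip bundle.tar
    rm bundle.tar
    # Listing the archive needs no password, but reveals only the tar file.
    names="$(unzip -Z1 bundle.zip)"
fi
echo "visible names: $names"
```

Someone who lists bundle.zip without the passphrase sees bundle.tar, but not docs/merger_plan.txt.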

7-Zip

7-Zip [13] is an open source tool for creating, compressing, and encrypting archives (much like Zip). It offers several algorithms for data compression.

7-Zip also supports AES-256 for encryption and can encrypt file and directory names.

Using 7-Zip is pretty easy and is very similar to using Zip. In Listing 8, I encrypt the simple text file hpc_001.html. I used the a command (add to archive) and the -p option (set password). If you specify just -p, 7-Zip (i.e., p7zip, the package that provides the 7z command-line version of 7-Zip) will prompt for the passphrase so that it won't be copied into the shell history. You can also put the passphrase directly after -p on the command line, but then it ends up in the shell history.

Listing 8: 7-Zip Encryption

$ ls -s
total 7288
 196 hpc_001.html  7092 MFS2007.pdf
$ 7z a -p hpc_001.html.7z hpc_001.html
7-Zip [64] 9.20  Copyright (c) 1999-2010 Igor Pavlov  2010-11-18
p7zip Version 9.20 (locale=en_US.UTF-8,Utf16=on,HugeFiles=on,1 CPU)
Scanning
Creating archive hpc_001.html.7z
Enter password (will not be echoed) :
Verify password (will not be echoed) :
Compressing  hpc_001.html
Everything is Ok
$ ls -s
total 7308
 196 hpc_001.html    20 hpc_001.html.7z  7092 MFS2007.pdf

A key point to note is that p7zip leaves the original file in place and creates a copy with a .7z extension. This might seem subtle, but it can be important. I like leaving the original file alone, because if the encryption process goes sideways, it's still available. I also like to decrypt the file and do a diff between the original file and the decrypted file. It might seem pointless to do this, but I like to make sure the encryption and decryption processes work correctly – and that I remember my passphrase.

To decrypt the file, you just use the e (extract) command (Listing 9). As you can tell, 7-Zip outputs some detail about the decryption of the file. Also, don't forget that as part of the extraction, p7zip also uncompresses the file.

Listing 9: 7-Zip Decryption

$ 7z e hpc_001.html.7z
7-Zip [64] 9.20  Copyright (c) 1999-2010 Igor Pavlov  2010-11-18
p7zip Version 9.20 (locale=en_US.UTF-8,Utf16=on,HugeFiles=on,1 CPU)
Processing archive: hpc_001.html.7z
Enter password (will not be echoed) :
Extracting  hpc_001.html
Everything is Ok
Size:       198510
Compressed: 18945
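The file name encryption mentioned earlier is not on by default; for the 7z format it is switched on with -mhe=on. The following sketch combines that switch with a round-trip comparison. The scratch directory and sample file are hypothetical, and the password appears on the command line only so that the example can run unattended.

```shell
set -eu
work="$(mktemp -d)"            # hypothetical scratch directory
cd "$work"
printf 'HPC Storage -- sample text\n' > sample.html

sevenzip="skipped"
if command -v 7z >/dev/null 2>&1; then
    # -mhe=on also encrypts the archive header, which hides the file names.
    # A bare -p would prompt instead of exposing the password.
    7z a -p'demo only' -mhe=on sample.7z sample.html >/dev/null
    mkdir restored
    # x extracts with paths; -o names the output directory (no space after -o).
    7z x -p'demo only' -orestored sample.7z >/dev/null
    if cmp -s sample.html restored/sample.html; then
        sevenzip="match"
    else
        sevenzip="differ"
    fi
fi
echo "7z round trip: $sevenzip"
```

With header encryption on, even listing the archive contents with 7z l requires the passphrase.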

OpenSSL

SSL and its successor TLS are protocols developed to provide communication security over a network using cryptography. You are probably most familiar with the protocol in web browsers for websites beginning with https. You can take advantage of the encryption in SSL or TLS to encrypt your data as well.

The most common implementation of SSL is OpenSSL [14], an open source community project for a full-featured toolkit implementation of SSL and TLS, as well as general-purpose cryptography. It was the subject of the infamous Heartbleed [15] vulnerability that primarily affected the communication encryption aspect of OpenSSL. The cryptography library aspect of OpenSSL is still extremely useful.

OpenSSL has a number of ciphers, cryptographic hash functions, and public key encryption algorithms (Table 2). OpenSSL really focuses on encryption and decryption and not compression. Consequently, you shouldn't expect the encrypted file to be smaller than the original.

Table 2: OpenSSL Encryption Options

Type                          Algorithms
Ciphers                       AES, Blowfish, Camellia, SEED, CAST-128, DES, IDEA, RC2/4/5, Triple DES, GOST 28147-89
Cryptographic hash functions  MD5/4/2, SHA-1/-2, RIPEMD-160, MDC-2, GOST R 34.11-94
Public key cryptography       RSA, DSA, Diffie-Hellman key exchange, Elliptic curve, GOST R 34.10-2001

Using OpenSSL requires a few more arguments than the typical encryption tool, as you can see in the command line in Listing 10. The first argument, aes-256-cbc, tells OpenSSL to use the AES cipher with a 256-bit key in cipher block chaining (CBC) mode. The -in option specifies the input file, and -out specifies the output (encrypted) file.

Listing 10: OpenSSL Encryption

$ ls -s
total 7288
 196 hpc_001.html  7092 MFS2007.pdf
$ openssl aes-256-cbc -salt -in hpc_001.html -out hpc_001.html.enc
enter aes-256-cbc encryption password:
Verifying -- enter aes-256-cbc encryption password:
$ ls -s
total 7484
 196 hpc_001.html   196 hpc_001.html.enc  7092 MFS2007.pdf

The option -salt is added to the command line because it can improve security. Classically, a salt [16] is a random bit of data used as an additional input to a one-way function that hashes the passphrase. It protects against dictionary attacks and against precomputed rainbow table attacks [17], because without the salt, the same password always generates the same encryption key. When the salt is used with OpenSSL, the encrypted file begins with the 8-byte marker Salted__, followed by the 8 bytes of the salt itself (i.e., the random bit of data). When the file is decrypted, the salt is read from the encrypted file and used for decryption.
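The effect of the salt is easy to observe. The sketch below encrypts the same file twice with the same password, once with -salt and once with -nosalt, and compares the results. The scratch directory, file names, and password are invented for the example. The command deliberately mirrors the form used in Listing 10; newer OpenSSL releases print a warning about the legacy key derivation used by this form, which is silenced here.

```shell
set -eu
work="$(mktemp -d)"            # hypothetical scratch directory
cd "$work"
printf 'same plaintext\n' > plain.txt

# enc_demo <salt-option> <outfile>; the password is a demo value only.
enc_demo() {
    openssl enc -aes-256-cbc "$1" -pass pass:demo \
        -in plain.txt -out "$2" 2>/dev/null
}

enc_demo -salt   salted1.enc
enc_demo -salt   salted2.enc
enc_demo -nosalt nosalt1.enc
enc_demo -nosalt nosalt2.enc

head -c 8 salted1.enc; echo    # prints the marker: Salted__

if cmp -s salted1.enc salted2.enc; then salted="identical"; else salted="different"; fi
if cmp -s nosalt1.enc nosalt2.enc; then unsalted="identical"; else unsalted="different"; fi
echo "salted: $salted, unsalted: $unsalted"
```

The two salted files differ even though the plaintext and password are the same, whereas the unsalted files are byte-for-byte identical, which is exactly what a precomputed table attack relies on.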

Notice that OpenSSL prompts for the passphrase instead of taking it on the command line, so it is not recorded in the shell history, and it does not echo what you type. Also notice that OpenSSL doesn't have a standard file extension. I chose .enc to show that the file is encrypted.

As I mentioned earlier, OpenSSL is just an encryption tool. It doesn't do file compression. Consequently, the file size of the encrypted text file in the previous example is roughly the same as the original text file. OpenSSL can operate on a compressed file as well, but in a separate step, as follows:

$ openssl aes-256-cbc -salt -in hpc_001.html.gz \
      -out hpc_001.html.gz.enc
enter aes-256-cbc encryption password:
Verifying -- enter aes-256-cbc encryption password:

Decrypting a file with the -d option is also fairly easy:

$ openssl aes-256-cbc -d -in hpc_001.html.enc -out hpc_001.html.2
enter aes-256-cbc decryption password:
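The separate compression and encryption steps can also be chained in a pipeline and checked afterward. The following is a minimal sketch, assuming OpenSSL 1.1.1 or later for the -pbkdf2 option (which selects a stronger key derivation than the legacy default); the scratch directory, sample file, and passphrase file are hypothetical and exist only so that the example runs without prompting.

```shell
set -eu
work="$(mktemp -d)"            # hypothetical scratch directory
cd "$work"
printf 'HPC Storage -- sample text\n' > sample.html
printf 'demo passphrase\n' > pass.txt   # demo passphrase only
chmod 600 pass.txt

# Compress, then encrypt; the original stays untouched.
gzip -9 -c sample.html |
    openssl enc -aes-256-cbc -salt -pbkdf2 -pass file:pass.txt \
        -out sample.html.gz.enc

# Decrypt, then decompress into a second copy.
openssl enc -d -aes-256-cbc -pbkdf2 -pass file:pass.txt \
    -in sample.html.gz.enc |
    gunzip -c > sample.check

cmp sample.html sample.check && echo "decrypted copy matches the original"
```

Files encrypted with -pbkdf2 must also be decrypted with -pbkdf2, so it is worth noting the exact options next to the archive.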

Crypt Replacements

The *nix of yore came with a command named crypt that could be used to encrypt data. However, its level of security wasn't very good, so it disappeared from the scene. Even if you can find the source for it, several tools break the encryption, so it should be avoided at all costs. However, Crypt was already popular by then, and some older scripts used it. Today, you have the choice of several Crypt replacements.

Ccrypt

Ccrypt [18] is based on the Rijndael block cipher. The same cipher is the basis of the AES specification. Internally, ccrypt takes the specified password, which can be of any length, and hashes it to a 256-bit key. As with almost all ciphers, the longer the password, the better the security.

Unlike the original crypt, the ccrypt command is not its own inverse (the underlying cipher is still a symmetric-key cipher), which means you have to specify whether you are encrypting or decrypting a file. To encrypt a file, use the command ccrypt:

$ ls -s
total 7288
 196 hpc_001.html  7092 MFS2007.pdf
$ ccrypt hpc_001.html
Enter encryption key:
Enter encryption key: (repeat)
$ ls -s
total 7288
 196 hpc_001.html.cpt  7092 MFS2007.pdf

Notice that the encrypted file size is about the same size as the unencrypted file for this text example.

One thing you should pay particular attention to is that Ccrypt encrypts the file in place: The original is replaced by the .cpt file instead of being left alongside it. I worry about this behavior, because if a problem crops up during the encryption process, the file could be corrupted. The final thing to notice is that Ccrypt prompts for the passphrase without echoing it and does not take it on the command line, so the shell history cannot capture it.

To decrypt a file, run the command:

$ ccrypt -d hpc_001.html.cpt
Enter decryption key:

The -d option just means "decrypt."
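If overwriting the original is a concern, ccrypt can also run as a filter, reading from standard input and writing to standard output, so that the source file is never modified. The sketch below assumes the ccrypt binary is installed and uses its -k option to read the key from the first line of a file; the scratch directory, sample file, and key file are hypothetical and serve only to make the example run unattended.

```shell
set -eu
work="$(mktemp -d)"            # hypothetical scratch directory
cd "$work"
printf 'HPC Storage -- sample text\n' > sample.html
printf 'demo passphrase\n' > key.txt    # demo key only
chmod 600 key.txt

ccrypt_check="skipped"
if command -v ccrypt >/dev/null 2>&1; then
    # With no file argument, ccrypt acts as a filter; the original survives.
    ccrypt -e -k key.txt < sample.html > sample.html.cpt
    ccrypt -d -k key.txt < sample.html.cpt > sample.check
    if cmp -s sample.html sample.check; then
        ccrypt_check="match"
    else
        ccrypt_check="differ"
    fi
fi
echo "ccrypt round trip: $ccrypt_check"
```

Only after the comparison succeeds would it make sense to delete the plain text copy.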

Bcrypt

Another encryption option is bcrypt [19]. Bcrypt uses the Blowfish [20] encryption algorithm with passphrases of between 8 and 56 characters. It also uses an internal 448-bit hashed key. The Blowfish algorithm itself seems to provide a good level of encryption if you don't use weak keys (use longer passwords), but the Bcrypt code itself hasn't been updated in a while. However, various versions of Bcrypt exist for many operating systems, including Linux, *nix, Windows, OS X, and others.

Encrypting a file using bcrypt is very simple:

$ bcrypt hpc_001.html
Encryption key:
Again:

Notice that the encryption process did not leave the original file in place; it was replaced by a file with the .bfe extension. The encrypted file is also smaller than the original, because Bcrypt compresses the input by default before encrypting it.

You should also note that Bcrypt prompts for the passphrase without echoing it, so the shell history will not capture it. Decrypting is very similar:

$ bcrypt hpc_001.html.bfe
Encryption key:

The bcrypt command works in both directions: It treats any file ending in .bfe as encrypted and decrypts it, and it encrypts everything else. Hence, it does not need a decrypt option.

MCrypt

Another replacement option for crypt is mcrypt [21]. It supports a very large number of cryptography algorithms.

MCrypt has several modes of encryption that provide additional capability beyond just a straight block cipher. You can read about that at an online MCrypt man page [22]. Using mcrypt is very similar to the other Crypt replacement tools (Listing 11).

Listing 11: Using MCrypt

$ ls -s
total 7288
 196 hpc_001.html  7092 MFS2007.pdf
$ mcrypt hpc_001.html
Enter the passphrase (maximum of 512 characters)
Please use a combination of upper and lower case letters and numbers.
Enter passphrase:
Enter passphrase:
File hpc_001.html was encrypted.
$ ls -s
total 7484
 196 hpc_001.html   196 hpc_001.html.nc  7092 MFS2007.pdf

In contrast to Ccrypt and Bcrypt, MCrypt leaves the original file in place and writes the encrypted data to a separate file with the .nc extension; the encrypted text file is basically the same size as the unencrypted file. Also note that, like the other Crypt tools, MCrypt prompts for the passphrase without echoing it.
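To see which algorithms and modes a particular MCrypt build actually supports, the tool can print the list itself. The sketch below assumes the mcrypt binary is installed and simply reports when it is not.

```shell
# Print every algorithm known to this mcrypt build, with its supported modes.
if command -v mcrypt >/dev/null 2>&1; then
    mcrypt --list
    mcrypt_list="printed"
else
    echo "mcrypt is not installed"
    mcrypt_list="skipped"
fi
```

An algorithm and mode from that list can then be selected with the -a and -m options (e.g., mcrypt -a rijndael-256 -m cbc hpc_001.html), assuming your build lists that combination.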

Summary

I didn't want this article to be just a survey of command-line tools for encrypting files, but I think it's important to compare the tools with one another so you can see the various features (or quirks) and how they might affect your workflow and your security. It's not a complete list of all tools available. For example, I did not cover the ability of Vim to encrypt files [23] while editing them, nor did I cover commercial tools – only open source. However, I hope I have covered the tools that illustrate the various capabilities.

As I said, I'm not a security expert, but I do take security seriously. In examining command-line tools that encrypt and decrypt, I have developed a few general principles to follow, including:

Notice in this list that I didn't mention anything about data compression. It's really your choice if you want to encrypt a file as well as compress it using tools such as Zip or p7zip – or to use compression tools before and separate from encrypting the file. I like to compress my files before encrypting them so I can save as much space as possible. I will also use tar as often as possible to collect the files into a single archive.
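The collect, compress, and encrypt sequence described above can be sketched as a single pipeline. The example uses tar, gzip (through tar's -z option), and openssl; any of the other tools in this article could take the place of the encryption step. The directory names, sample files, and passphrase file are hypothetical, and the -pbkdf2 option assumes OpenSSL 1.1.1 or later.

```shell
set -eu
work="$(mktemp -d)"            # hypothetical scratch directory
cd "$work"
mkdir project restored
printf 'notes\n'  > project/notes.txt
printf 'report\n' > project/report.txt
printf 'demo passphrase\n' > pass.txt   # demo passphrase only
chmod 600 pass.txt

# Collect (tar), compress (-z), and encrypt in one pipeline.
tar -czf - project |
    openssl enc -aes-256-cbc -salt -pbkdf2 -pass file:pass.txt \
        -out project.tar.gz.enc

# Restore into a separate directory and compare against the source tree.
openssl enc -d -aes-256-cbc -pbkdf2 -pass file:pass.txt \
    -in project.tar.gz.enc |
    tar -xzf - -C restored
diff -r project restored/project && echo "archive restores cleanly"
```

Because the tar archive is encrypted as a whole, neither the file names nor the directory layout are visible in project.tar.gz.enc.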

In my opinion, encryption can be a very important tool to protect your privacy. Think about making encryption a part of your everyday processes.