Tools URL Tricks with htaccess Lead image: © Dmitry Bruskov, 123RF.com

More than password protection, htaccess has you covered

Accessory

In addition to introducing extra security measures, htaccess lets you add some fantastic features to your website without having superuser access to the main configuration files of the web server. By Chris Binnie

Hark back to those halcyon days when the GIF89a and the blink tag ruled the web page and the most common use of htaccess on the nascent Apache HTTP Server was password protecting a directory. As you can imagine, now that Apache's feature set has grown significantly richer, you have a host of useful ways to use htaccess.

More recently, htaccess has gained popularity as a conduit to mod_rewrite, where it has been used to create aesthetically pleasing (and as a result, search engine-friendly) URLs so that, for example, a URL such as domainname.com/directory-name/filename.html might become simply domainname.com/filename.

Such URLs are usually not only a little shorter to link to but also easier to remember; most importantly, they are highly relevant to search engine indexing because significant weighting is currently put on the keyword content of URLs.

Another use that appealed to my sense of simplicity is the use of htaccess for redirecting short URLs to long ones. That might seem slightly counterintuitive, but before I look a little closer at that application of htaccess, I'll run through some other practical examples.

One 2011 study shows that the behemoth that is Apache is running more than 60% of websites on the Internet, comfortably surpassing the mark of 100 million websites some time ago, so I hope you'll find that these examples are applicable to a multitude of scenarios.

Myths and Folklore

A common misconception is that .htaccess files are needed for password protection; in fact, anything you can add to a local .htaccess file within a directory can also be added to the main Apache configuration file.

The purpose of local .htaccess files is to allow certain Apache configuration changes to be made by a non-privileged user who doesn't have access to the configuration file of a virtual host. Another myth is that the file absolutely must be called .htaccess, but that's not the case; with a quick change to the AccessFileName directive, you can specify different control file names to your heart's content.
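As a minimal sketch of that change, the following lines would go in the main server or virtual host configuration (not in a per-directory file). The file name .acl is purely a hypothetical choice for illustration. Stock configurations usually block downloads only of files whose names begin with .ht, so a renamed control file probably needs its own deny rule; the one shown uses the same Apache 2.2-style access syntax as the rest of this article.

```apache
# Main server or virtual host configuration, not a per-directory file.
# ".acl" is a hypothetical name chosen for this example.
AccessFileName .acl

# Keep the renamed control file from being served to visitors.
<Files ".acl">
    Order allow,deny
    Deny from all
</Files>
```

As with any change to the main configuration, Apache needs a reload before the new name takes effect.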

qwerty123

Now I will take a quick look at password protection and its uses. As with all of the Apache config options, you have a plethora of parameters to choose from, but I'll only explore specific examples here. If you need to alter any of them slightly, the documentation on the Apache site is both considerable and comprehensive, and you can find an abundance of examples on the web as well.

Because of Apache's popularity, you can bet that someone, somewhere has solved almost exactly the same problem you're currently facing, so without too much hunting, you might just find the (sometimes infuriating) syntax you need without having to reinvent the wheel.

AllowOverride

Before you can alter specific configuration parameters within the directories of an Apache virtual host, you have to change the AllowOverride directive to All. Figure 1 shows examples of this change, which will require an Apache reload or restart afterward, unlike other local htaccess file alterations, which are live immediately. On that note, a word of caution: Be warned that even a missing dot, space, or caret in your htaccess file could deny access to your entire website. My recommendation is to test, test, and test again!

Figure 1: AllowOverride is set to "All" in this example.
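For readers without the figure to hand, a setting of this kind might look like the sketch below. The directory path is an assumption borrowed from the web root used later in this article; substitute the document root of your own virtual host.

```apache
# Inside the main configuration or the virtual host definition.
# The path is assumed for illustration only.
<Directory "/var/www/localhost/htdocs">
    # Let .htaccess files in this tree override any directive
    # that is permitted in per-directory context.
    AllowOverride All
</Directory>
```

If All feels too generous, AllowOverride also accepts narrower classes: AuthConfig covers the authentication directives, Limit covers Order, Allow, and Deny, and FileInfo covers mod_rewrite and Redirect.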

Old School

Password protecting a file, a group of files, or a directory with basic authentication takes two simple steps. You must create, first, a .htaccess file containing rules and, second, a file containing usernames and passwords using the htpasswd tool, a little piece of software that builds a password file, usually called .htpasswd. Both files start with dots, so they're hidden on Unix-like filesystems.

Among the many advanced options is the ability to set authentication over SSL and the use of digital certificates. Here, I'll look at plain HTTP, but with a set of more than one username and password so you can give each of your colleagues a different password. It's critical to remember that the protection, and therefore the access you grant, also extends to subdirectories, so create your directory structure carefully.

First, it's important to place the password file in a location away from your web root, so it would take a relatively serious server compromise involving local access to get permission to read the password file, rather than being easily found by surfing through web-accessible directory structures (Figure 2).

Figure 2: The .htaccess file is placed inside a directory you can password protect.

The .htaccess file points to the username and password file .htpasswd, which is located two levels above the web root (the web root being /var/www/localhost/htdocs in this case); the group file isn't in use, so it references a dead end on the filesystem for security reasons. AuthName is the text that pops up when a browser tries to access the directory and can be anything you like. Finally, I'll look at the require line, which has two useful formats.
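A file along the lines just described might look like the following sketch. It is a reconstruction from the description rather than a copy of the figure: the AuthName text is invented for illustration, and /dev/null is assumed as the "dead end" for the unused group file.

```apache
# .htaccess in the directory to be protected
AuthType Basic
# Text shown in the browser's login prompt (hypothetical wording)
AuthName "Restricted Area"
# Password file kept outside the web root
AuthUserFile /var/www/.htpasswd
# No group file in use, so point at a dead end (assumed to be /dev/null)
AuthGroupFile /dev/null
# Accept any user listed in the password file
require valid-user
```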

Using the require directive, you can either specify each and every user who is to be allowed access, such as

require user ganymede europa callisto

which is practical for a handful of usernames, or, as above, you can check for a valid user in the password file with

require valid-user

for a greater number of users. If you connected to a database for authentication, you also could use valid-user with some additional effort. Generally, I prefer the stricter, and more deliberate, user method, which gives a slightly higher level of security. It requires the user to be listed both on the require line and in the password file, not just in one of them. Without SSL and other optional bells and whistles, you're not locking up Fort Knox with this approach, though, and I would consider this level of security designed only to prevent access by the casual passerby.

The second step is to create the password file once you're inside the /var/www directory:

htpasswd -c .htpasswd ganymede

The -c switch creates the file .htpasswd (overwriting any existing file of that name, so use it only the first time) and adds the username ganymede (the largest of Jupiter's moons, even larger than the planet Mercury, if you're curious). To add another username, or to change the password of an existing user like ganymede, simply drop the switch and use this syntax:

htpasswd .htpasswd ganymede

To write to the /var/www directory successfully, you might need to be root (i.e., use su). Of course, any valid and readable path will suffice – just make sure it's not available via Apache to the outside world.

Thanks to the useful htpasswd tool, your encoded .htpasswd file looks like this after you've added three users:

ganymede:YUOqsWaZWpl
europa:RfXOqsYaZWsa
callisto:PgRxO.tsvW.adk

The first part of each line is the username followed by a colon and then a hashed password to keep any local users from instantly deciphering a file with poorly set permissions.

Access by IP Address

My favorite way of making use of htaccess is with administrative directories, whose passwords I don't want to have to remember but to which I still want to control access strictly. This method only works well if you regularly visit from the same static IP addresses.

You can also use this method hand in hand with the password protection example above: Apache can allow access immediately when the IP address matches and fall back to a password prompt when it doesn't.

Of course, you can combine both methods easily enough and insist that access be from specified IP addresses and demand a password, but for simplicity, my .htaccess looks like the following:

Order Deny,Allow
Deny from All
Allow from 12.12.12.12
Allow from 23.23.23.23
Allow from 34.34.34.34

This example denies access to everybody except the IPs listed, which follows the very sensible "deny by default" firewalling rule of thumb, wherein you only allow access explicitly and shut out everyone else by default. A cautionary note again: These rules apply to all of your subdirectories, too.
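The fall-through behavior mentioned earlier, in which a matching IP address gets straight in and everyone else is asked for a password, can be sketched with the Satisfy directive. This assumes the same Apache 2.2-style access directives used above (on Apache 2.4 they are provided by mod_access_compat) and reuses the password file from the earlier example; the AuthName text is hypothetical.

```apache
# Password check, as in the basic authentication example
AuthType Basic
AuthName "Admin Area"
AuthUserFile /var/www/.htpasswd
Require valid-user

# IP address check, as in the listing above
Order Deny,Allow
Deny from All
Allow from 12.12.12.12

# Passing EITHER check is enough. Use "Satisfy All" to demand both
# a listed IP address and a valid password.
Satisfy Any
```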

Pretty URLs

Without meaning to sound too enthusiastic, I think Apache's ability to rewrite horribly complex URLs to perfectly human-friendly ones has to be one of its finest features. Apache flexes its powerful muscles on many of today's busiest websites, and as anyone who uses the web a lot knows, URL rewriting truly does make a difference to being able to find, bookmark, and reference material on the web.

At the start of this article, I used the example of mod_rewrite turning domainname.com/directory-name/filename.html into domainname.com/filename. The good news is that with relative ease, your website's internal naming structure can remain in a state of chaos (and your pages can be called anything you like) but still look pretty to the public. In other words, you could add this capability to an existing site without too much of a headache, provided you can change the internal hyperlinking on each page.

To enable htaccess within the main Apache configuration files, you would enable AllowOverride All in your vhost file; then, to rewrite URLs, your htaccess file might look something like this:

<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteRule ^support$ support.php
RewriteRule ^signup$ signup.php
RewriteRule ^([^/.]+)/([^/.]*)?$ view.php?title=$1&id=$2&%{QUERY_STRING}
</IfModule>

The first line checks to see whether the mod_rewrite module is loaded (a test that's not always needed but that can wrap the directives of any module in an htaccess file, so the file doesn't cause errors if the admin hasn't made that module available).

If the module is loaded, the rules that follow apply. The support.php and signup.php examples are relatively self-explanatory; the pages become reachable as domainname.com/support and domainname.com/signup, respectively. The trickier line at the end shows how powerful regex (regular expression) pattern matching can be in conjunction with mod_rewrite: The two bracketed groups capture the path segments before and after the slash as $1 and $2 and pass them as variables in the URL to another PHP page, with the original query string appended.
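One possible variation on the rules above, offered as a sketch rather than a correction, uses mod_rewrite's own flags: [L] stops rule processing once a rule has matched, and [QSA] appends the original query string so it doesn't have to be added by hand. The request path /linux/42 in the comment is a hypothetical example.

```apache
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
# [L] = last rule: don't try the remaining rules after a match
RewriteRule ^support$ support.php [L]
RewriteRule ^signup$ signup.php [L]
# A request for /linux/42 is handed to view.php?title=linux&id=42
# [QSA] appends any query string sent with the original request
RewriteRule ^([^/.]+)/([^/.]*)?$ view.php?title=$1&id=$2 [QSA,L]
</IfModule>
```

Because the character class [^/.] excludes dots, the rewritten targets such as view.php never match the last pattern again, so the rules don't loop.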

Re-Directing for Shortened URLs

At the beginning of the article I mentioned how pleased I was with the simplicity that htaccess offered when I wanted to, somewhat counterintuitively, change short URLs to longer ones. Most people have used the likes of TinyURL.com [1], Bitly [2], or Google URL shortener [3] to shrink unsightly and unmanageable URLs to something that's more suitable for email or Twitter.

Some of these services let you add a custom suffix to the end of the URL after the slash, so rather than bit.ly/R4c6feh, you could use bit.ly/dictionaryword. Some services also offer the ability to edit the destination URL at a later date, which can save the day when you make major changes to your websites.

When I wanted to demonstrate a new platform to colleagues, URL lengthening was exactly what I needed. I could pass on a really short URL over the telephone and then talk them through a short demo. The problem was that all of the shortest custom aliases, as they are called when you choose the suffix, had been taken already from the URL shortening services, and I would end up with tinyurl.com/reallyquitealongwordindeed or a tricky series of numbers and letters. These URLs were so long that they were verging on being the same length as the URL I was trying to shorten!

I then realized that some of these services allow you to use your own custom domain name, wherein you (ostensibly, at least) can choose your own custom alias after your domain name, giving you endless possibilities. The bit.ly service reportedly generates a jaw-dropping 40-50 million shortened URLs every day, so with your own domain name, you have an equally large pool of permutations all to yourself.

I thought I'd found the ideal solution with one of the services I've already mentioned. Generally, however, all the services did almost everything but fell at the last hurdle for one reason or another. Then, I remembered my mighty friend htaccess.

By spending about US$ 20 a year to register a really short domain name on one of the briefest top-level domains – I'll call it http://wx.yz – and then pointing it at my web space, I was able to recreate all of the functions of the URL-shortening services with the omnipotent htaccess.

An additional function, which only a scarce few of the shortening services offer, relates to search engines and the HTTP 301 and 302 response status codes, which I can easily enable for my links. Apparently, if you respond to a search engine with an HTTP 301, meaning the file you're redirecting to has moved permanently, the long link (the destination) will receive any credit for external links that point to it, rather than the easy-to-remember shortened URL receiving the SEO kudos.

The simplest way to work around this issue is to proffer an HTTP 302, which tells the search engine that it should just consider the link to be a temporary, not a permanent, redirect. Nice.

Clearly, adding a new URL by hand over SSH every time is a particularly tedious way to shorten your links, but a simple PHP script with the correct file-writing permissions could manipulate your redirections effectively through a browser, just like the aforementioned public services. An almost painfully simple htaccess file looks like that shown in Figure 3, demonstrating the 301 and 302 entries.

Figure 3: Redirecting with HTTP 301 and 302.

I can set my virtual host to point at the directory in which this htaccess file resides; therefore, any hits to index.html – that is, by visiting http://wx.yz in a browser – will go to the first long URL.

The address http://wx.yz/123 goes to the second long URL, and the still beautifully short http://wx.yz/z goes to the third. Your destination parameter doesn't have to be a fully qualified URL; instead, it could be a file like file.html or a directory name.
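A file that behaves as described might look like the following sketch, which uses mod_alias's Redirect directive. It is an illustration rather than a copy of the figure: the destination URLs on example.com are placeholders, and the choice of which entries use 301 and which use 302 is an assumption.

```apache
# http://wx.yz (i.e., index.html) -> first long URL, permanent redirect
Redirect 301 /index.html http://www.example.com/products/platform/overview.html

# http://wx.yz/123 -> second long URL, temporary redirect
Redirect 302 /123 http://www.example.com/downloads/2011/release-notes.html

# http://wx.yz/z -> third long URL, temporary redirect
Redirect 302 /z http://www.example.com/demo/walkthrough/start.html
```

Redirect matches on the leading part of the URL path, so anything requested below /z would be forwarded to the corresponding location beneath the destination as well.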

Now I have somewhere in the region of 25 short URLs set up, and I can track hits in Apache's logs and edit the URLs any way I like if any changes are made. Some URLs use words like demo and others are extremely concise and just use single characters like x.

The Apache toolkit has a lot of useful tools that I haven't even mentioned, much less looked at in this article, but I hope the content of what I've covered here will encourage you to roll up your sleeves and explore some of Apache's features.