
mod_rewrite for newcomers

Archive - Originally posted on "The Horse's Mouth" - 2008-12-20 11:15:47 - Graham Ellis

What is mod_rewrite?

It's an Apache httpd (web server) module that takes users' requests for pages and diverts them to a resource of a different name (and perhaps type). Why might we want to do this? See the previous article for some reasons and alternatives.

Here's a simple example of a rewrite rule:

RewriteRule ^train_running\.html$ running.html

And that tells the web server to divert requests for train_running.html to running.html instead. We've used rewrite rules as simple as this one to divert links that people have kindly made to our site - but got wrong - to the correct resource:

RewriteRule ^5704\.html$ J704.html
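For mistyped inbound links like this one, an external redirect is an alternative worth considering: with the [R=301] flag, the browser (and any search engine) is told the correct address and asks for it again, so the right URL ends up in the address bar. A minimal sketch, assuming the .htaccess file sits in the document root so that /J704.html is the correct URL path:

```apache
# Per-directory (.htaccess) form. R=301 sends a "moved permanently"
# response pointing at /J704.html; L stops any later rules running.
RewriteEngine On
RewriteRule ^5704\.html$ /J704.html [R=301,L]
```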

But there's much more to mod_rewrite than that as you'll see if you look at the official manual. Actually, there's so much more that I'll give you some further examples - each taken from our web site.

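When reading these examples, please bear in mind that you can place them in your main Apache configuration file, httpd.conf, if you wish them to apply across the whole of your site, or in an individual .htaccess file in the directories to which you wish them to apply. The examples below are written in per-directory form (as used in .htaccess or in a <Directory> section of httpd.conf): there, the directory prefix is stripped from the URL before matching, so patterns have no leading slash. Outside a <Directory> section, httpd.conf rules are matched against the full URL path, which starts with a "/". Your server administrator can enable or disable .htaccess files, and control what may and may not be placed in them, using the AllowOverride directive.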
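Here's a minimal sketch of the first example in each location, assuming (hypothetically) that the pages live in a directory served as /demo/. In both places, RewriteEngine On must appear before the rules have any effect; the comments note how the pattern differs.

```apache
# --- In httpd.conf (server or <VirtualHost> context) ---
# The pattern sees the whole URL path, including the leading slash
RewriteEngine On
RewriteRule ^/demo/train_running\.html$ /demo/running.html

# --- In the .htaccess file of the /demo/ directory ---
# The "/demo/" prefix is stripped before matching, and the relative
# substitution is resolved within that same directory
RewriteEngine On
RewriteRule ^train_running\.html$ running.html
```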

One final introductory note - the incoming URL is specified in the form of a regular expression (a pattern). At its most basic, most characters match one for one to the URL, but there are a number of "specials": anchors such as ^ and $, which mean "starting with" and "ending with"; wildcards such as ., which means "any character" (so a literal dot is written \.); and counts such as *, which means "0 or more of the preceding item". Round brackets have multiple meanings within regular expressions; in my examples below they are used to capture parts of the incoming URL, which are then substituted into the rewritten one as $1, $2 and so on. We offer a day's course on Regular Expressions.
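To see capturing in action, here's a small sketch with two bracketed groups; the section/page URL layout and the show.php script are hypothetical. [^/]+ means "one or more characters that aren't a slash", so each group captures exactly one part of the path.

```apache
# A request for trains/cancellations.html is rewritten to
# show.php?section=trains&page=cancellations
#   $1 = text matched by the first ( ), here "trains"
#   $2 = text matched by the second ( ), here "cancellations"
RewriteRule ^([^/]+)/([^/]+)\.html$ show.php?section=$1&page=$2
```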

Collapsing a whole directory of web pages to a single script

RewriteRule ^(.*)\.html$ /share/index.php?pagename=$1

All requests that end in .html in the area that the .htaccess or httpd.conf file controls are passed on to a single PHP script called index.php in the /share/ directory. The name of the requested page, without its .html extension, is passed to that script in a request parameter called pagename.

This is the mechanism we use in our "wiki" ... where (after validation to avoid injection attacks), the following SQL query is run:

mysql_query("select * from sharedata where pagename = '$_REQUEST[pagename]'");

and the fields from the database are used to populate a single template.
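A pattern can also narrow what reaches the script. This sketch is a variation, not the rule we use: it only forwards page names made of letters, digits, underscores and hyphens, and anything else isn't rewritten (typically ending in a 404). It's a first filter only. Anyone can request /share/index.php?pagename=... directly, so the script must still validate the value, and a prepared statement is a safer way to run the query than pasting the value into the SQL string.

```apache
# Only names such as "Tomcat_setup" or "rail-2008" are rewritten;
# a request like "a'b.html" does not match, so it is left alone
RewriteRule ^([A-Za-z0-9_-]+)\.html$ /share/index.php?pagename=$1
```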

Taking all .htm and .html files and passing them on, including GET data submitted from forms

RewriteRule ^(.*)\.htm 8.php?pagename=index&sharename=$1&%{QUERY_STRING}

This is an extended example of passing all .htm (and .html) URLs in a directory on to a single script, with various parameters being passed in too, and any data that was submitted via the GET method appended on to the end. Note that by leaving the $ off the end of the pattern for the incoming URL, you're able to overcome any ".htm or .html" issues; the trade-off is that the rule also matches any URL that merely contains .htm, such as page.htmx. We do the same thing for .php / .php3 / .php4 / .php5 ...
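The same rule can also be written with the [QSA] ("query string append") flag, which tells mod_rewrite to add the original query string on to the end of the one in the substitution rather than replacing it. A minimal sketch, assuming the same 8.php script and taking sharename as the parameter that carries the page name:

```apache
# A request for trains.html?day=mon becomes
# 8.php?pagename=index&sharename=trains&day=mon
RewriteRule ^(.*)\.htm 8.php?pagename=index&sharename=$1 [QSA]
```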

Diverting the home page of a directory

RewriteRule ^$ 8.php?pagename=index&sharename=index&%{QUERY_STRING}

The regular expression here means "starts and then immediately ends", which matches only an empty path, so that's a request for the directory itself rather than for anything in it.

Referring image requests on to a database via a program

RewriteRule ^(.*)\.jpg /pix/feeder.php?image=$1&%{QUERY_STRING}

All .jpg requests are passed on to a PHP script, with the image name passed in as a parameter. Why do we do this?

• it saves us from having a directory cluttered with a large number of pictures
• it allows us to monitor where images are loaded from (the referer), highlighting any images hotlinked from other web sites
• it allows us to generate dynamic images (for example, this diagram of current train cancellations on First Great Western ;-) )
• it allows us to feed out low res or high res alternatives
• it allows us to keep data about each image together with the image itself
• and it allows us to use selected URLs to generate a random image.


Handling a special case - something NOT to rewrite

RewriteRule ^rooms\.html rooms.html?%{QUERY_STRING} [L]

Where you have a whole directory / pattern being rewritten and you want to make exceptions, you can do so. In this example, there really is a file called rooms.html that you want to serve! The [L] modifier means "Last" and instructs the httpd web server that it should skip the following rules if this rule has been applied - very useful indeed in preventing some utterly confusing situations! Because rules are applied in the order they appear, the exception has to come before the general rule it overrides.
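Another way to write the exception is with a substitution of - (a single hyphen), which tells mod_rewrite to leave the URL exactly as it is, query string included. This minimal sketch pairs it with the .html-to-script rule from earlier. Where many real files need exempting, a RewriteCond with the -f ("is an existing regular file") test saves listing them one by one.

```apache
RewriteEngine On

# Exception first: serve the real rooms.html unchanged and stop here
RewriteRule ^rooms\.html$ - [L]

# Catch-all for other .html pages; the RewriteCond (which applies only
# to the rule straight after it) skips requests for files on disk
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(.*)\.html$ /share/index.php?pagename=$1 [L]
```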

Rewriting that calls up a page from another server

RewriteRule ^info\.php http://www.wellho.info/ [L,P]

This example, with a complete URL in the target position and a [P] modifier, proxies (see mod_proxy) the request on to a different URL on a different computer. Your web server actually becomes a client as it retrieves the page from the other system, and in turn it passes the page back to your client. So the following two links will return the same .html source:

http://www.wellho.net/demo/info.php
http://www.wellho.info/

Indeed they do return the same source, but you'll need to be very careful indeed about any site-relative links, references to style sheets and images, etc - but mod_proxy is a story for another day!

Other directives in mod_rewrite

As well as RewriteRule directives, you'll also need a RewriteEngine On directive to turn the facility on. The RewriteCond directive allows you to apply a condition to the RewriteRule that immediately follows it, and RewriteLog allows you to produce a log file of rewrite requests (in Apache httpd 2.4 and later, RewriteLog has been replaced by LogLevel settings such as LogLevel alert rewrite:trace3). You can use a RewriteMap directive to call in a separate file of mappings, and that can even include random proxy forwards; what sounds (at first) like a crazy idea actually provides a neat way of spreading the processor load of heavy web-based applications around a number of servers - there's a complete example here of forwarding Java requests from Apache httpd to multiple instances of Tomcat.
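Here's a minimal sketch of a random proxy forward, for httpd.conf only (RewriteMap is not permitted in .htaccess files). The map file name, the key tomcats, the host names and the /app/ prefix are all hypothetical. The rnd: map type picks one of the |-separated values at random on each lookup, and the [P] flag needs mod_proxy and mod_proxy_http to be loaded.

```apache
# /etc/httpd/tomcats.map (hypothetical) contains the single line:
#   tomcats  tc1.example.com:8080|tc2.example.com:8080
RewriteEngine On
RewriteMap backends rnd:/etc/httpd/tomcats.map

# RewriteCond applies only to the RewriteRule that follows it:
# stylesheets and scripts under /app/ are not proxied
RewriteCond %{REQUEST_URI} !\.(css|js)$
RewriteRule ^/app/(.*)$ http://${backends:tomcats}/app/$1 [P,L]
```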

We cover the configuration of Apache httpd on our Linux Web Server course ... and there's a lot more that you can do beyond mod_rewrite!