URLs - a service and not a hurdle

Archive - Originally posted on "The Horse's Mouth" - 2004-11-04 07:40:54 - Graham Ellis

"A URL should be a service that provides a visitor with access to the information that he/she wants, and not a hurdle that he/she must pass in order to access that information". I'm paraphrasing there, but it's what Rasmus Lerdorf was saying in his "do you PHP?" lecture last month.

Ours is a "sales and marketing" site, and I want visitors to be encouraged to find the information they require, and crawlers to index it appropriately - so the concept described above is particularly attractive to us, and I posted up here a couple of weeks back about how to implement it in practice.

We've now implemented that scheme on our main domain, and our "Error 404" traffic has dropped dramatically; it was already low, accounting for perhaps 1 access in 200, but it has now been reduced to just 10 requests in 18000 yesterday (that's 1 access in 1800).

I'm calling it a success, but not blindly so. Failing requests are checked to see if they're calling for a known file in the wrong directory, or for a URL that does exist but with the capitalisation wrong. Such cases are diverted, and the correct page is served with a "200 OK" header. Requests for html pages with alphabetic names that don't otherwise exist result in a search of "The Horse's Mouth" and an appropriate results page if there is indeed a "hit", again with an OK status. And users who try to call up Microsoft's FrontPage technology to update our site (!!) get a polite page asking them to let us know of any corrections that they feel are needed.
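To make that concrete, here's a minimal PHP sketch of the kind of custom 404 handler described above. It's illustrative only, not our actual code: the file names, the search hand-off and the helper logic are assumptions, and it presumes Apache passes failed requests to the script via an "ErrorDocument 404" directive.

<?php
# Illustrative sketch of a custom 404 handler along the lines described above -
# not the real code; file names and the search hand-off are assumptions.
# Assumes Apache is configured with:   ErrorDocument 404 /notfound.php
# so that the originally requested URL arrives in $_SERVER["REDIRECT_URL"].

$docroot = $_SERVER["DOCUMENT_ROOT"];
$request = isset($_SERVER["REDIRECT_URL"]) ?
           $_SERVER["REDIRECT_URL"] : $_SERVER["REQUEST_URI"];
$wanted  = basename($request);

# 1. Known file in the wrong directory, or right name with wrong capitalisation?
#    (A full version would scan the whole document tree, not just the top level.)
$found = "";
$candidates = glob("$docroot/*.html");
if ($candidates) {
    foreach ($candidates as $candidate) {
        if (strcasecmp(basename($candidate), $wanted) == 0) {
            $found = $candidate;
            break;
        }
    }
}
if ($found != "") {
    header("HTTP/1.1 200 OK");          # divert - serve the real page as a 200
    header("Content-type: text/html");
    readfile($found);
    exit;
}

# 2. An alphabetic .html name that doesn't exist? Treat it as a search term.
if (preg_match('/^([a-z]+)\.html$/i', $wanted, $parts)) {
    header("HTTP/1.1 200 OK");
    $searchterm = $parts[1];            # hypothetical hand-off to the site search
    include "$docroot/search.php";      # (whatever search page is really in use)
    exit;
}

# 3. FrontPage publishing attempts (URLs containing "_vti_") get a polite page
#    asking for corrections instead - "corrections.html" is a made-up name.
if (strstr($request, "_vti_")) {
    header("HTTP/1.1 200 OK");
    header("Content-type: text/html");
    readfile("$docroot/corrections.html");
    exit;
}

# 4. Nothing sensible to offer - return a genuine "404 Not Found" after all.
header("HTTP/1.1 404 Not Found");
header("Content-type: text/html");
readfile("$docroot/notfound.html");
?>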

If none of these situations allows us to return a good page, we still return a "404", but as you see from my stats above that's now a tiny minority of cases.

Where did all the errors that we previously had come from? A lot of them came from poorly written spiders, from accesses to pages that had appeared only briefly on our site, or from short-lived incorrect links that the search engines had found (we also keep a table of about 10 specific URLs with specific diverts for such situations). And now we *can* be helpful.
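For illustration, that small table of diverts might be expressed along these lines - the entries shown are invented stand-ins rather than our real URLs, and a simple redirect is just one way of handling them.

<?php
# Illustrative only - the entries below are invented stand-ins for the
# roughly 10 site-specific diverts mentioned above.
$diverts = array(
    "/old_course_outline.html" => "/course/php_programming.html",
    "/pix/old_logo.gif"        => "/images/logo.gif",
);

$request = isset($_SERVER["REDIRECT_URL"]) ?
           $_SERVER["REDIRECT_URL"] : $_SERVER["REQUEST_URI"];

if (isset($diverts[$request])) {
    # Send the visitor straight to the current location of the page.
    header("Location: http://" . $_SERVER["HTTP_HOST"] . $diverts[$request]);
    exit;
}
?>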

Is our scheme going to give us long-term issues with search engines? The concern has been expressed that search engines can now index the extra URLs, since they're getting code 200s back. Well - that's fine by us. Each such URL is a resource through which the page can be found, and a route that lets the user mine the information they need. It's more likely to drive traffic to our site - which provides us with much of our sales and marketing - up rather than down.