
Denial of Service "attack"

Archive - Originally posted on "The Horse's Mouth" - 2006-03-17 06:07:38 - Graham Ellis

We've had 45000 page requests in the last week from the University of Illinois - 17000 of which were within a period of a few hours yesterday. Not bad going? Had we been recommended to the whole University as a site worth seeing? Alas, no; all the requests were coming from a single host computer, which is quite remarkable seeing that we've not got anywhere near 17000 pages on our site. It's also quite remarkable that those extra hits came after I had been in email contact with the University, which had promised to desist.


uiuc is a quarter of our daily traffic
and that was just in a couple of hours


The university has apologised to me, and I owe an apology in turn to people who have had trouble with our site in the last few days - we've been working to counter this traffic, which was causing a denial of service to our regular users. It's been quite an interesting couple of days!

So - what happened?

An Overview of automated browsing

The web was designed for human browsers - visitors who pull up a page and then come back seconds or minutes later for another page. But the protocol used is a straightforward one, so it's very easy to write automata (robots and crawlers) that go methodically through a large number of pages at electronic speed.

Automata such as these come in various flavours:

* Search engines such as Google, Yahoo, MSN and others, which the site owner encourages for his own purposes

* Other engines such as the turnitin "bot", where a commercial outfit reaps all the content of a site for their own (or their customers') purposes - in this case to sell universities an anti-plagiarism service

* Utilities which are intended to gather a series of pages for an individual so that the individual can pre-load a few pages over a slow line for more efficient browsing

* Automata written or run with the purpose of causing disruption or expense to the web site owner.

Good practice for automated browsers

Technically, it is very easy indeed to run an automaton - one you wrote yourself, or one that's out there already. But that doesn't mean it's good practice to do so, or that you'll be welcomed if you do. Automata should:

* check a file called robots.txt in the top-level directory of the domain from time to time to see if they're welcome, and respect what it says

* declare themselves to be automata (and which one they are) via the user agent string that is usually sent in all requests. This should include a URL in case the site they're visiting wants to know what they're up to

* respect the bandwidth and resources of the sites they visit, and the needs of other users of those sites - in other words, not visit a lot of pages in quick succession, nor call up lots of pages in parallel.
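To make those three rules concrete, here is a rough sketch in Python of how a well-behaved automaton might apply them. It is an illustration only, not code from any real robot: the name ExampleBot, its URL, the five-second default pause and the fetch callable are all made up for the example, and the Crawl-delay line it honours is a widely used extension to robots.txt rather than something every site provides.

```python
import time
from urllib import robotparser

# Hypothetical identity, for illustration only. A real robot would give its
# own name and a URL where the site owner can find out what it is up to.
USER_AGENT = "ExampleBot/0.1 (+http://www.example.com/bot.html)"


def build_rules(robots_txt):
    """Parse the text of a robots.txt file that has already been downloaded."""
    rules = robotparser.RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return rules


def crawl(rules, urls, fetch, default_delay=5.0, sleep=time.sleep):
    """Visit the permitted URLs one at a time, pausing between requests.

    fetch is a caller-supplied function (an assumption of this sketch) that
    downloads one page and sends USER_AGENT in the request headers.
    """
    # Use the site's Crawl-delay if it gives one, else our own default pause.
    delay = rules.crawl_delay(USER_AGENT) or default_delay
    fetched = []
    for url in urls:
        if not rules.can_fetch(USER_AGENT, url):
            continue  # robots.txt says we are not welcome here
        fetch(url, USER_AGENT)
        fetched.append(url)
        sleep(delay)  # one page at a time, never in parallel
    return fetched
```

The important design choice is the plain loop: a single request in flight, with a pause after each one, so the robot can never take more than a sliver of the server's capacity.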

Alas, malicious automata (they're in a tiny minority, thank goodness) don't respect these rules. And another minority - not quite so tiny - don't fully understand these rules and their effect on sites they visit. And so, these days, web site owners need to consider defences and safety nets.

If more than 100 pages are requested in 300 seconds (I think those are the figures; we change them from time to time) on our web site, from a single location, we start to get worried and our web server provides a delayed response - it sleeps the request for a few seconds to throttle back the visitor and to give others a chance to have a look in.
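For the curious, the idea can be sketched in a few lines of Python. This is not the code running on our server, just a minimal illustration of the technique: the class and method names are invented, the 100 and 300 are the figures quoted above, and the five-second delay is an assumed stand-in for "a few seconds".

```python
import time
from collections import defaultdict, deque

MAX_REQUESTS = 100     # figure quoted in the text; it changes from time to time
WINDOW_SECONDS = 300   # figure quoted in the text
DELAY_SECONDS = 5      # assumption: stands in for "a few seconds"


class Throttle:
    """Remember recent request times for each visitor location."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.history = defaultdict(deque)

    def delay_for(self, client):
        """Record one request and return how long to sleep before serving it."""
        now = self.clock()
        recent = self.history[client]
        # Forget requests that have dropped out of the time window.
        while recent and now - recent[0] >= WINDOW_SECONDS:
            recent.popleft()
        recent.append(now)
        # More than MAX_REQUESTS inside the window: slow this visitor down.
        return DELAY_SECONDS if len(recent) > MAX_REQUESTS else 0
```

The server would call delay_for() with the visitor's address at the start of each request and sleep for the number of seconds returned. Note that a sleeping request still ties up a server process for the whole of the delay.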

So what happened in this case?

That's worked well in the past, except it seems that this latest attack (which evidence tells me was research code written without sufficient knowledge of, or thought for, the effect it would have, rather than being malicious) was made from a cluster of parallel processes ... so that if one was put into a delay, others simply jumped on as well, and we had so many concurrent visitors to the site that the queues couldn't cope. Rather like an ambassador being sent along to discuss something in diplomatic terms, and when he's kept waiting for his turn, the troops being sent in behind. Sorry, University of Illinois, this host is banned. Bully-boy tactics such as these are not acceptable, especially from a centre of excellence in learning which should know better.