Web site traffic - real users, or just noise?

Archive - Originally posted on "The Horse's Mouth" - 2009-12-26 16:03:36 - Graham Ellis

It's been said that on some web sites these days, the majority of traffic isn't users at a regular browser at all; instead, it's robots that are indexing the pages (such as Google, Yahoo, MSN, Yandex and others), and malware that's looking for holes through which to inject content on to other people's pages, or to copy and spread itself through sites which have left some security gates open. Now - we welcome the indexing crawlers, and we take steps to ensure that malware is ineffective, but when it comes down to it we really DO want a significant proportion of our traffic to be real people visiting our site! But how can we tell?

We collect a daily access log file; these days, it can be up to 45 Mbytes of log information per day, and that's far too much information to read through line by line. In any case, judgments on some of the lines would be "that is probably a genuine user" or "that looks rather fishy", which are hardly certainties on which to base a conclusion. However, this graph, showing the size of the log file on a day by day basis, gives us a very good clue. As I write (December 2009), there's a 7 day cycle, with the log files on a busy day reaching the 45 Mbytes mark, and on a quiet day being around 25 Mbytes. This pattern has long since been established - indeed, I commented on it in June 2007.
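The weekly cycle can be pulled out of the raw sizes with very little code. Here is a minimal Python sketch, assuming the daily log sizes have already been gathered into a dictionary keyed by date. The function name `mean_size_by_weekday` is made up for illustration, and the two sample figures are round numbers echoing the 45 and 25 Mbyte marks above, not real measurements.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Fixed English names, indexed by date.weekday() (Monday is 0).
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def mean_size_by_weekday(sizes_mb):
    """Average the daily log sizes (in Mbytes) for each day of the week.

    sizes_mb maps a datetime.date to that day's log file size.
    Returns a dict mapping weekday name to mean size.
    """
    by_day = defaultdict(list)
    for day, size in sizes_mb.items():
        by_day[DAY_NAMES[day.weekday()]].append(size)
    return {name: mean(values) for name, values in by_day.items()}


# Illustrative round numbers only, not measured data.
sample = {
    date(2009, 12, 14): 45.0,  # a Monday
    date(2009, 12, 19): 25.0,  # a Saturday
}
averages = mean_size_by_weekday(sample)
for name, size in averages.items():
    print(f"{name}: {size:.1f} Mbytes")
```

Fed with several weeks of real sizes, the same function would show whether the busy-day and quiet-day levels really are as regular as the graph suggests.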

Looking at the difference - 45 Mb to 25 Mb - persuaded me that at least 20 Mb of our weekday traffic was "actual people" browsing, and in fact I decided that was a very pessimistic estimate. More and more, visitors to our site are using the technologies I write about for leisure activities, so will be arriving on our site at the weekend rather than during the week, and the noticeable dip on Fridays is, I'm sure, partly caused by the fact that Friday is a holy day in many countries, from where people will return on Saturday and Sunday (Friday is also P.O.E.T.S. day; see Acronyms). But just how much of that 25 Mb is actual people?


There's a clue here in this current graph, dated 26th December 2009. [This one won't change, but the one at the top of the page will continue to update daily!] On Christmas day, the log file size dropped to just over 15 Mbytes; the server was functioning correctly (so there's no reason for a blip there), but it *was* Christmas day. So I can now be more optimistic yet about the number of "actual people" browsing - suggesting that there's up to 30 Mbytes of traffic from such users on a busy day, with only a third of the traffic being robotic / malware.

Looking further still, there was still *some* genuine traffic in that 15 Mbytes on Christmas day. I took a look at our most popular search engine arrival page, and found that some 197 people had been referred to us (as against a peak of around 980), and that one particular image called up by regular users was referenced 1600 times rather than the 4800 times of two weeks previously. To me, that says that even in the 15 Mb, we had around a quarter of our regular real traffic - in round terms, between 7 Mb and 8 Mb of log file. Which - very roughly - tells me that the automata that are running 24 x 7 account for only 6 Mb to 9 Mb of our normal traffic.
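The rough arithmetic in that paragraph can be written out as a short Python sketch. It is only one way of reading the reasoning: the function name `background_noise_mb` is hypothetical, and averaging the two indicators (search referrals and image requests) into a single "still present" fraction is my assumption about how to combine them.

```python
def background_noise_mb(busy_mb, holiday_mb, holiday_ratios):
    """Rough split of a holiday's log into real visitors and 24 x 7 noise.

    busy_mb        -- log size on a busy day (Mbytes)
    holiday_mb     -- log size on the near-silent holiday (Mbytes)
    holiday_ratios -- holiday/normal ratios for indicators of real visitors
    Returns the estimated background noise in Mbytes.
    """
    # First guess at real busy-day traffic: whatever vanished on the holiday.
    real_busy_mb = busy_mb - holiday_mb
    # Fraction of real visitors who still turned up (mean of the indicators;
    # combining them this way is an assumption, not the author's stated method).
    still_present = sum(holiday_ratios) / len(holiday_ratios)
    real_holiday_mb = still_present * real_busy_mb
    # What is left of the holiday log is treated as round-the-clock automata.
    return holiday_mb - real_holiday_mb


ratios = [197 / 980, 1600 / 4800]  # search referrals, image requests
noise = background_noise_mb(45.0, 15.0, ratios)
print(f"Estimated background noise: about {noise:.1f} Mbytes")
```

With the figures quoted above, the result falls inside the 6 Mb to 9 Mb band, which is as much precision as an estimate this rough deserves.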

So - that's an estimate of just 13% to 20% of weekday traffic, and 24% to 36% of our weekend traffic, being the 24 x 7 background noise, with the substantial majority being the traffic at which we target the web site. I'm happy with these stats, having seen figures of up to 80% "noise" being quoted. We MIGHT have exceeded the 50% figure on just one day - Christmas day - but that's been more than worthwhile; it was our "almost off" day, and it gave me valuable data against which to analyse our site records.
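Those shares come from dividing the 6 Mb to 9 Mb background band by a typical day's log size. A small Python sketch of the division follows, assuming 45 Mbytes for a weekday and 25 Mbytes for a weekend day as the round totals; the names `share_percent`, `NOISE_RANGE_MB` and `DAY_TOTALS_MB` are illustrative.

```python
def share_percent(part_mb, total_mb):
    """Return part_mb as a percentage of total_mb."""
    return 100.0 * part_mb / total_mb


NOISE_RANGE_MB = (6.0, 9.0)  # background estimate from the text
DAY_TOTALS_MB = {"weekday": 45.0, "weekend": 25.0}  # assumed round totals

for label, total in DAY_TOTALS_MB.items():
    low, high = (share_percent(n, total) for n in NOISE_RANGE_MB)
    print(f"{label}: {low:.0f}% to {high:.0f}% background noise")
```

Swapping in your own daily totals and noise band is the quickest way to see how your site compares.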


RSS feeds and Ajax form only a very small part of the traffic from our web site, and I have discounted them from consideration above. But if you use techniques and logic similar to mine, you need to think carefully and understand your base data before coming to conclusions.