Does robots.txt actually work?
Archive - Originally posted on "The Horse's Mouth" - 2009-02-16 21:26:46 - Graham Ellis

If you put an entry into your robots.txt file asking the various robots to disallow (cease crawling) certain files and directories, do they actually take note of your request, considering that it's a purely voluntary standard?
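For anyone who hasn't come across it, robots.txt is a plain text file at the top of your web space listing the areas you'd rather crawlers left alone. The entry I added was along these lines - the path shown here is illustrative rather than the exact rule from my file:

# Ask all crawlers to keep away from the old map pages
# (path is illustrative - list whatever you want excluded)
User-agent: *
Disallow: /net/map

There's no enforcement behind such a rule; it works only if crawler authors choose to honour it, which is exactly what I wanted to check.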
Three or four days back, I excluded some old map pages which were being heavily crawled, and I've just checked my log files for the last fortnight:
-bash-3.2$ egrep -c 'net/+map' ac_200902*
ac_20090201:8779
ac_20090202:7884
ac_20090203:15697
ac_20090204:9284
ac_20090205:4944
ac_20090206:9640
ac_20090207:10299
ac_20090208:7015
ac_20090209:5534
ac_20090210:4188
ac_20090211:6808
ac_20090212:853
ac_20090213:1669
ac_20090214:74
ac_20090215:76
Yes! - it has worked. Accesses to these pages - which came predominantly from crawlers - have dropped from some 8,000 to 10,000 per day down to fewer than a hundred, and I suspect that most of those remaining are genuine hits!
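That drop is down to the crawlers themselves: a well-behaved robot fetches robots.txt and tests each URL against it before crawling. Here's a minimal sketch of that check using Python's standard urllib.robotparser module - the host name and user agent are placeholders, not anything from my logs:

from urllib import robotparser

# Fetch and parse the site's robots.txt
# (example.com and ExampleBot are placeholders)
rp = robotparser.RobotFileParser()
rp.set_url("http://www.example.com/robots.txt")
rp.read()

# A compliant crawler runs this test before every fetch; with a
# "Disallow: /net/map" rule in place it returns False, so the
# crawler skips the page - hence the drop in the counts above.
if rp.can_fetch("ExampleBot", "http://www.example.com/net/map/index.html"):
    print("allowed - go ahead and crawl")
else:
    print("disallowed - robots.txt asks us to stay away")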
You'll find more about robots.txt here.