Does robots.txt actually work?
Archive - Originally posted on "The Horse's Mouth" - 2009-02-16 21:26:46 - Graham Ellis

If you put an entry into your robots.txt file asking the various robots to disallow (cease crawling) certain files and directories, do they actually take note of your request, considering that it's a purely voluntary standard?
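For anyone who hasn't come across it, robots.txt is a plain text file at the top of your web space listing the areas you'd rather crawlers left alone. The entry I added was along these lines - the path shown here is illustrative rather than the exact rule from my file:

# Ask all crawlers to keep away from the old map pages
# (path is illustrative - list whatever you want excluded)
User-agent: *
Disallow: /net/map

There's no enforcement behind such a rule; it works only if crawler authors choose to honour it, which is exactly what I wanted to check.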
Three or four days back, I excluded some old map pages which were being heavily crawled, and I've just checked my log files for the last fortnight:
-bash-3.2$ egrep -c 'net/+map' ac_200902*
ac_20090201:8779
ac_20090202:7884
ac_20090203:15697
ac_20090204:9284
ac_20090205:4944
ac_20090206:9640
ac_20090207:10299
ac_20090208:7015
ac_20090209:5534
ac_20090210:4188
ac_20090211:6808
ac_20090212:853
ac_20090213:1669
ac_20090214:74
ac_20090215:76
Yes! - it has worked. Accesses to these pages - which came predominantly from crawlers - have dropped from some 8,000 to 10,000 per day down to fewer than a hundred, and I suspect that most of those remaining are genuine hits!
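That drop is down to the crawlers themselves: a well-behaved robot fetches robots.txt and tests each URL against it before crawling. Here's a minimal sketch of that check using Python's standard urllib.robotparser module - the host name and user agent are placeholders, not anything from my logs:

from urllib import robotparser

# Fetch and parse the site's robots.txt
# (example.com and ExampleBot are placeholders)
rp = robotparser.RobotFileParser()
rp.set_url("http://www.example.com/robots.txt")
rp.read()

# A compliant crawler runs this test before every fetch; with a
# "Disallow: /net/map" rule in place it returns False, so the
# crawler skips the page - hence the drop in the counts above.
if rp.can_fetch("ExampleBot", "http://www.example.com/net/map/index.html"):
    print("allowed - go ahead and crawl")
else:
    print("disallowed - robots.txt asks us to stay away")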
You'll find more about robots.txt here.