Analysing Google arrivals by country of origin
Archive - Originally posted on "The Horse's Mouth" - 2009-12-10 17:46:07 - Graham Ellis
Where do our Google visitors arrive from? If your log files record the referrer field, you can find out which Google domain each visitor came from ... and you can find what search brought them to you as well. Here's a recent log analysis showing where all you readers arrived from:
com 7461 in 2177 uk 1409 ca 619 de 536 fr 383 ph 284 nl 276 au 268 pl 233 it 232 se 222 es 183 br 176 ru 140 fi 127 be 115 ch 115 id 102 my 101 cz 100 sg 94 ro 93 tr 93 tw 91 pt 90 dk 83 hu 82
cn 80 ie 80 il 80 pk 78 th 78 mx 76 hk 64 ua 60 at 59 no 59 za 53 kr 50 ee 45 vn 44 ar 43 gr 42 nz 37 sk 35 bg 34 co 33 jp 32 lt 30 si 30 lk 27 eg 26 bd 25 cl 24 hr 18
ae 15 jo 15 lv 14 by 11 np 11 ke 10 mu 10 pe 9 ve 9 bo 8 mt 8 om 7 uy 7 sa 6 cr 5 ec 5 gt 5 ma 5 md 5 mn 5 ng 5 az 4 ba 4 bw 4 gh 4 is 4 jm 4 lu 4
rw 4 bh 3 tt 3 cat 2 kw 2 kz 2 lb 2 pr 2 py 2 sv 2 ug 2 cu 1 do 1 et 1 fj 1 ge 1 gi 1 ly 1 mz 1 ni 1 ps 1 uz 1 vi 1 ws 1 zw 1
Some of those are very familiar countries, but others I had to look up ... and I wondered if "cat" was some sort of error. It wasn't - it's the top-level domain for the Catalan community.
Code for "the above" ... good old Perl ...
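while (<>) {
    /Googlebot/ and next;    # skip Google's own crawler
    # $cou is what follows ".google." in the referrer; $what is the q= search string
    if (($cou,$what) = /\.google\.(\S+?)\/.*[\?&]q=(\S+?)[\?&"]/) {
        $cou =~ s/\.$//;     # drop a trailing dot
        $cou =~ s/\w+\.//;   # drop a leading "co." or "com.", so "co.uk" becomes "uk"
        # $what =~ tr/+/ /;
        # $what =~ s/%(..)/pack("C",hex($1))/ge;
        $cz{$cou}++;
        print if ($cou eq "") ;   # debug: show any line that yields an empty code
    }
}
# Highest count first; ties in alphabetical order
@cio = sort {$cz{$b} <=> $cz{$a} or $a cmp $b} (keys %cz);
for $c (@cio) {
    print "$c $cz{$c}\n";
}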
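To see what the two substitutions on $cou do, here is a minimal sketch that applies the same pattern to a few made-up referrer strings. The helper name google_tld and the sample URLs are assumptions for illustration; they are not part of the script above or taken from real logs.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the country part of the Google domain found
# in a log line, or undef when the line holds no Google search referrer.
sub google_tld {
    my ($line) = @_;
    my ($cou) = $line =~ /\.google\.(\S+?)\/.*[\?&]q=\S+?[\?&"]/
        or return;
    $cou =~ s/\.$//;      # drop a trailing dot
    $cou =~ s/\w+\.//;    # drop a leading "co." or "com."
    return $cou;
}

# Made-up referrer fragments, for illustration only
for my $ref (
    '"http://www.google.co.uk/search?q=perl+course&hl=en"',
    '"http://www.google.com.au/search?hl=en&q=perl+sort"',
    '"http://www.google.com/search?q=tcl&start=10"',
) {
    my $tld = google_tld($ref);
    print defined $tld ? $tld : '(none)', "\n";
}
```

With these three samples the loop prints uk, au and com: a two-part suffix is cut down to its country code, while a plain "com" is left alone because it contains no dot.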
The $what variable (above) is another interesting story; I have it commented out as I've not analyzed it in this post, but it tells you the search terms used by visitors. There's an example of this extra data in use here.
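As a rough sketch of what decoding $what could look like, the snippet below turns "+" back into spaces, expands %XX escapes, and tallies the resulting phrases in the same way the country counts are tallied. The helper name decode_term, the %terms hash and the sample values are all assumptions made for illustration, and the decoding works byte by byte, so multi-byte UTF-8 search terms would need a further decoding step.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a raw q= value into readable text.
sub decode_term {
    my ($what) = @_;
    $what =~ tr/+/ /;                              # "+" encodes a space
    $what =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;  # %XX encodes one byte
    return $what;
}

my %terms;   # search phrase => number of arrivals

# Made-up q= values, for illustration only
for my $raw ('perl+course', 'perl%20course', 'tcl+%2B+expect') {
    $terms{ lc decode_term($raw) }++;
}

# Most frequent phrase first; ties in alphabetical order
for my $t (sort { $terms{$b} <=> $terms{$a} or $a cmp $b } keys %terms) {
    print "$t $terms{$t}\n";
}
```

The "+" translation runs before the %XX expansion on purpose: a literal plus sign arrives as %2B, and doing the steps the other way round would turn it into a space.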
As regular readers may have guessed, the example above was written during a Perl Course I'm giving this week.