Reading Google Analytics results, based on the relative populations of countries
Archive - Originally posted on "The Horse's Mouth" - 2012-03-24 10:04:03 - Graham Ellis
We get a lot of traffic on our web site, but where does it come from? Our raw log files tell us a great deal, but there is just so much data there it's very hard to manage, so we're using Google Analytics as well. I'm delighted to read reports of xxx visitors from Sweden, yyy from Romania and zzz from Argentina. But these are countries with very different populations; I would be very interested to know how the figures stack up as a proportion of the population. In other words, take an average city with a million people and ask "if it's in xxxx country, how many of its people have we reached?"
I answered my question using three files: a table of top level domain names (not strictly needed, but useful for tabulating the results), a table of country name to population mappings (from Wikipedia), and our own data from Analytics. It's a Python program - source code [here] - which reads the population table and the list of top level domains. You'll find the URLs of both of these extra files in the source if you want to try it on your own Google Analytics data. I then simply cut and pasted Google's visitors by country table into a file called "gad" to run the program on my own data.
From a Python demonstration program viewpoint, there are excellent illustrations of Regular Expression use in Python, the use of a static method, and the changing of the natural sort order on a list - the example is now available to me to illustrate these points during our Python Courses.
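For readers who don't want to open the full source, here is a minimal sketch of those three techniques working together. It is not the original program: the class name Country, the helper parse_visits, the assumed line format ("country name, then a visit count") and all the figures are made up for illustration, and the sort order is changed here with a key function, which may not be the mechanism the original uses.

```python
import re


class Country:
    """One row of the report: a country, its population and our visit count."""

    def __init__(self, name, population, visits):
        self.name = name
        self.population = population
        self.visits = visits

    @staticmethod
    def per_million(visits, population):
        # A static method: it needs no instance, only the two numbers.
        return visits * 1_000_000 / population

    def rate(self):
        return Country.per_million(self.visits, self.population)


# Assumed line format (hypothetical): "<country name> <visits>",
# where the visit count may contain thousands separators.
LINE = re.compile(r"^\s*(.+?)\s+(\d[\d,]*)\s*$")


def parse_visits(lines):
    """Pick the country name and visit count out of each pasted line."""
    visits = {}
    for line in lines:
        match = LINE.match(line)
        if match:  # lines that don't look like data are skipped
            visits[match.group(1)] = int(match.group(2).replace(",", ""))
    return visits


# Made-up figures, purely to exercise the code.
populations = {"Atlantis": 2_000_000, "Lilliput": 500_000, "Erewhon": 10_000_000}
pasted = ["Atlantis  100", "Lilliput 50", "Erewhon   1,200", "not a data line"]

visits = parse_visits(pasted)
countries = [Country(name, populations[name], count)
             for name, count in visits.items() if name in populations]

# Change the sort order: highest visitors-per-million first.
countries.sort(key=lambda c: c.rate(), reverse=True)

for c in countries:
    print(f"{c.name}: {c.rate():.1f} per million")
```

Sorting on the rate rather than on the raw visit count is the whole point of the exercise: it is what pushes small countries up the table.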
From Google Analytics, the top ten countries in terms of NUMBERS of visitors are United States, India, United Kingdom, Germany, France, Canada, Philippines, Italy, Russia and Brazil. But looking at that based on the population of each country, we see a very different story:
wizard:anaproj graham$ python visitors | head -10
. is: 81.4 12291 [Iceland]
* fi: 75.8 13185 [Finland]
** hk: 75.0 13327 [Hong Kong]
* si: 71.0 14090 [Slovenia]
** se: 68.9 14505 [Sweden]
*** uk: 66.5 15037 [United Kingdom]
** ch: 56.8 17606 [Switzerland]
. mc: 55.7 17940 [Monaco]
. li: 55.3 18078 [Liechtenstein]
** il: 53.3 18747 [Israel]
wizard:anaproj graham$
What is the story? In the UK, in the time period of my data, we have reached one person in 15,000 - that's 66 people in an average city of 1 million. We've only achieved deeper penetration than this in some smaller countries (the first field, the dots and stars, is a rough indication of size to allow a quick visual selection of significant countries). Good news for us - we're a UK based company and I would be very concerned if the UK wasn't high up the list. It also gives me an idea of where else there could be an interest, and it's not a big surprise that this includes smaller nations with an excellent comprehension of English in the technical / professional population. Picking out just the three countries with the largest number of visits from my list above, we get a very different story when we look at population penetration:
*** uk: 66.5 15037 [United Kingdom]
*** us: 27.3 36605 [United States]
*** in: 3.9 256125 [India]
In other words, although we have more visitors from India than from the UK, they're much more thinly spread - not 66 people from a city of 1 million, but just 4 people.
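The two numeric columns in the output are two views of the same ratio: visitors per million people, and "one person in N". A small sketch of the conversion, using the three rows above (the function names are mine, not from the original program):

```python
def per_million_from_one_in(n):
    """'One person in n' expressed as people per city of a million."""
    return 1_000_000 / n


def one_in_from_per_million(rate):
    """People per million expressed as 'one person in N' (whole people)."""
    return int(1_000_000 / rate)


# The 'one in N' figures printed for the UK, US and India rows above.
for code, one_in in [("uk", 15037), ("us", 36605), ("in", 256125)]:
    print(code, round(per_million_from_one_in(one_in), 1))
```

Going the other way from the printed rate only matches roughly, because the rate has already been rounded to one decimal place: it reproduces the UK figure, but not the US or India ones to the last digit.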
All of this is valuable feedback - it turns out to confirm what I had suspected / surmised anyway, so it hasn't resulted in any "thunderbolt moments"; rather, it's reassuring.
