Archive - Originally posted on "The Horse's Mouth" - 2006-05-24 05:55:05 - Graham Ellis
Here's an extract (reprinted with permission) from an email I received from James - a fellow web site owner, trying to identify visitors to his web site ... he had found and used our IP lookup, and was impressed that we got it right ....
"I have just bought a csv database from [[supplier deleted]]. But their database says an ip is ntl when yours says its Virgin. His ip is a virgin ip, as its my friends.
"I think I have been robbed. Now I am looking to find the correct or accurate database. I am using a program written in php to read the users ip and then cross reference the true IP number (long) with a mysql database."
Some interesting points raised ... my answer ..
James, none of these databases / techniques is going to be 100% accurate. I have customers in Surrey who pop up as being in Ireland, and customers in Bristol who are reported as being in France. In Cambridge last week, my hotel internet connection defaulted to Google in German. Expect between 98% and 99.5% accuracy, and provide a page that lets your users change their country.
To identify systems by country, we use the Maxmind database. Maxmind offer an open source variant, updated monthly, but it's now quite hard to find on their site. We use that for our user tracking page and, yes, that page is in PHP.
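The "true IP number (long)" cross reference that James describes comes down to converting the dotted address to an integer and finding the range that contains it. Here is a minimal sketch of that idea, written in Python rather than the PHP we use. The sample ranges, the placeholder country codes and the names ip_to_long and country_for are all made up for illustration; a real table would be loaded from whichever database you hold, and its layout may well differ.

```python
import bisect
import ipaddress

# Hypothetical sample rows (start, end, country code). Real data would be
# loaded from your own database; these ranges are for illustration only.
RANGES = [
    ("10.0.0.0", "10.255.255.255", "AA"),
    ("192.0.2.0", "192.0.2.255", "BB"),
    ("198.51.100.0", "198.51.100.255", "CC"),
]


def ip_to_long(ip):
    """Convert a dotted-quad IPv4 address to its integer ("long") form."""
    return int(ipaddress.IPv4Address(ip))


# Sort by range start so that a binary search can find the candidate row.
TABLE = sorted((ip_to_long(s), ip_to_long(e), c) for s, e, c in RANGES)
STARTS = [row[0] for row in TABLE]


def country_for(ip):
    """Return the country code for ip, or None if no range contains it."""
    n = ip_to_long(ip)
    # Last row whose start is <= n; it is the only row that could match,
    # assuming the ranges do not overlap.
    i = bisect.bisect_right(STARTS, n) - 1
    if i >= 0 and TABLE[i][0] <= n <= TABLE[i][1]:
        return TABLE[i][2]
    return None
```

Returning None for an address that falls outside every range is deliberate: it gives you a natural place to fall back to a default, or to that page where users choose their own country.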
To identify the visiting host more closely, I would start with a reverse DNS lookup - the Apache web server can be set up to do this itself and it means that you don't need to hold a detailed local database at all. This comes up with a name for around 80% of host computers worldwide, from which you can deduce a lot more information, even though the strings are a bit inconsistent and tricky to work with at times. Simply change
HostnameLookups Off
to
HostnameLookups On
in the httpd.conf file and your log lines will contain strings such as
client-82-3-85-219.manc.adsl.virgin.net and ata01cs603.americas.hp.net
instead of
82.3.85.219 and 212.118.35.6
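If you would rather do the reverse lookup in your own code than in Apache, something along these lines should work. It is a sketch in Python rather than PHP, and the function names reverse_lookup and provider_hint are my own inventions for this example. The "provider" guess is deliberately naive: it just keeps the last two parts of the host name, which is exactly where those inconsistent strings will catch you out.

```python
import socket


def reverse_lookup(ip):
    """Return the host name for ip, or None if no reverse DNS entry exists."""
    try:
        name, _aliases, _addresses = socket.gethostbyaddr(ip)
        return name
    except (socket.herror, socket.gaierror):
        # No PTR record, or the resolver could not answer.
        return None


def provider_hint(hostname):
    """Naive guess at the provider: the last two labels of the host name.

    Only a heuristic; it gives the wrong answer for names that sit under
    two-part suffixes such as .co.uk.
    """
    labels = hostname.lower().rstrip(".").split(".")
    return ".".join(labels[-2:]) if len(labels) >= 2 else None
```

With the two log strings above, provider_hint gives virgin.net and hp.net. Note that reverse_lookup goes out to the network on every call, so it can be slow and it will return None for the hosts that have no name.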
The other technique I would look at, and I think this is our page that you may have found, uses a whois database lookup. Once again, we're not holding a local database but going out to the authoritative version on the web. Be careful about automating this one too heavily, though, as it's intended for human (non-robotic) use and you could get yourself banned if you pile in thousands of requests. If you're using a Linux / Unix server, call up the man page on whois to get you started.
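As a starting point, here is a rough Python sketch that runs the system whois command and picks a couple of fields out of the reply. It assumes a whois client is installed on the server. The field names netname and country are an assumption too: different registries label their output differently, so check what your own lookups actually return. The SAMPLE text is made up purely to show the parsing, and run_whois and parse_whois are names I have chosen for this example.

```python
import subprocess


def run_whois(ip):
    """Run the system whois client and return its raw text output."""
    done = subprocess.run(
        ["whois", ip], capture_output=True, text=True, timeout=30
    )
    return done.stdout


def parse_whois(text, wanted=("netname", "country")):
    """Pick out the first value of each wanted "key: value" field."""
    found = {}
    for line in text.splitlines():
        # Skip comment lines and anything that is not a "key: value" pair.
        if line.startswith(("%", "#")) or ":" not in line:
            continue
        key, _, value = line.partition(":")
        key = key.strip().lower()
        if key in wanted and key not in found:
            found[key] = value.strip()
    return found


# Made-up reply for illustration only; real replies vary by registry.
SAMPLE = """% illustrative output only
inetnum:  192.0.2.0 - 192.0.2.255
netname:  EXAMPLE-NET
country:  ZZ
"""
```

Calling parse_whois(run_whois(ip)) ties the two together, but only do that sparingly, for the reasons given above.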
Where you SHOULD use a database locally is in caching your results. An IP address only rarely moves from one provider or location to another, so if you hold on to your results for - say - a week, then you only have to check back with the reverse DNS / whois / Maxmind very occasionally - great for efficiency, and it also avoids you irritating them to the extent that your traffic gets questioned.
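A cache like that need not be complicated. The sketch below uses SQLite from Python simply because it needs no setup; the same table would sit just as happily in MySQL. The table name ip_cache, the function names and the one week lifetime are all assumptions for this example, and lookup stands for whichever slow function (reverse DNS, whois and so on) you want to spare.

```python
import sqlite3
import time

WEEK = 7 * 24 * 60 * 60  # cache lifetime in seconds


def open_cache(path=":memory:"):
    """Open (and if necessary create) the cache table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS ip_cache ("
        "ip TEXT PRIMARY KEY, result TEXT, fetched_at REAL)"
    )
    return conn


def cached_lookup(conn, ip, lookup, now=None, max_age=WEEK):
    """Return the cached result for ip, calling lookup(ip) only when the
    entry is missing or older than max_age seconds."""
    now = time.time() if now is None else now
    row = conn.execute(
        "SELECT result, fetched_at FROM ip_cache WHERE ip = ?", (ip,)
    ).fetchone()
    if row is not None and now - row[1] < max_age:
        return row[0]
    result = lookup(ip)
    conn.execute(
        "INSERT OR REPLACE INTO ip_cache (ip, result, fetched_at) "
        "VALUES (?, ?, ?)",
        (ip, result, now),
    )
    conn.commit()
    return result
```

Pass a file name to open_cache if you want the cache to survive a restart. The now parameter is there only so that expiry can be tested without waiting a week.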