Which (virtual) host was visited? Tuning Apache log files, and Python analysis
Archive - Originally posted on "The Horse's Mouth" - 2015-01-23 06:56:40 - Graham Ellis
We host a number of domains on our main server, and to avoid fragmenting the log files we keep a single composite log. Until this morning, that log did not record which virtual host each request was for. So rather than carry on with the standard log file format, I've changed the second field to carry the virtual host name accessed for the request.
So in my server config
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
has become
LogFormat "%h %v %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
Gone - %l - Remote logname (from identd, if supplied).
Added - %v - The canonical ServerName of the server serving the request.
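For anyone parsing the new format, the practical effect is that the virtual host is now the second whitespace-separated field of each line. Here is a minimal sketch of picking it out in Python; the function name virtual_host and the sample log line are made up for illustration and are not taken from the real log or from my program.

```python
def virtual_host(line):
    """Return the second whitespace-separated field (%v in the new
    LogFormat), or None if the line has fewer than two fields."""
    # Only the first two fields matter, so stop splitting after them.
    fields = line.split(None, 2)
    return fields[1] if len(fields) >= 2 else None


# Invented sample line in the new format (not real log data).
sample = ('192.0.2.10 www.wellho.net - [23/Jan/2015:06:50:00 +0000] '
          '"GET / HTTP/1.1" 200 5120 "-" "ExampleAgent/1.0"')

print(virtual_host(sample))
```

Splitting on whitespace is safe here because the host address and the server name never contain spaces; the quoted request and user agent fields further along the line would need more careful handling.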
I've written a program (in Python) to take a look at the log file - see [here]. Run from the command line, it gives:
-bash-4.1$ /home/wellho/trainee/y202/pytop
13358 - www.wellho.net
2719 - www.firstgreatwestern.info
223 - www.melkshamchamber.org.uk
100 - melksh.am
62 - www.twcrp.org.uk
39 - www.savethetrain.org.uk
39 - www.across-the-pond.co.uk
33 - www.wellhousemanor.co.uk
30 - twhc.org.uk
16 - transwilts.org.uk
5 - thebutlerdidit.info
1 - railcustomer.info
-bash-4.1$
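A tally like the one above can be sketched in a few lines. This is not the linked program itself, just a minimal illustration that assumes the virtual host is the second field of each line; count_hosts, report and the sample lines are hypothetical names and invented data.

```python
from collections import Counter


def count_hosts(lines):
    """Count requests per virtual host (second field of each log line)."""
    counter = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:      # skip blank or truncated lines
            counter[fields[1]] += 1
    return counter


def report(counter, output='{0:6d} - {1:s}'):
    """Format one row per host, busiest host first."""
    return [output.format(hits, site) for site, hits in counter.most_common()]


# Invented sample lines in the new format (not real log data).
sample_lines = [
    '192.0.2.10 www.wellho.net - [23/Jan/2015:06:50:00 +0000] "GET / HTTP/1.1" 200 5120 "-" "-"',
    '192.0.2.11 melksh.am - [23/Jan/2015:06:50:01 +0000] "GET / HTTP/1.1" 200 812 "-" "-"',
    '192.0.2.12 www.wellho.net - [23/Jan/2015:06:50:02 +0000] "GET /a.html HTTP/1.1" 200 77 "-" "-"',
]

for row in report(count_hosts(sample_lines)):
    print(row)
```

To run it against a real log, pass an open file object to count_hosts in place of sample_lines, since iterating over a file yields one line at a time.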
The program's also got a web wrapper - if called up on the web, it uses a different formatter:
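import sys

# Default: plain text rows for command line use
output = '{0:6d} - {1:s}'
# A first argument of -w means we were called via the web wrapper
web = len(sys.argv) > 1 and sys.argv[1] == "-w"
if web:
    # One HTML table row per site, with a link to the site
    output = '<tr><td>{0:d}</td><td><a href="http://{1:s}" target="avh">{1:s}</a></td></tr>'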
and later in my code:
print(output.format(counter[site], site))
And you can see the current results [here].
P.S. There's another quick demo web analysis program (showing its age) [here].