Archive - Originally posted on "The Horse's Mouth" - 2007-12-08 13:19:50 - Graham Ellis
Here's a Python script which is pretty imperfect, but shows a whole lot of the basic facilities of the language in use - a sort of "you can do it this way" crib sheet for newcomers. Written during last week's course, with delegates making suggestions as I went along - so if you think the code looks like it was designed by a committee of eleven ....
The overall objective is to read in a web server access log file and report the elapsed time in days and hours from the first to the last record .... I will add comments which we did NOT add on the day!
# Bring in regular expression and time handling code from the Python
# distribution which is NOT loaded as standard every time you run an
# individual Python program. Use "import" to keep each in its own
# namespace rather than "from", which would pollute the top level.
import re
import time
# Set up some standards - incoming file name, and the pattern to look
# for when hunting for a date and time stamp in a record
fname = "access_log.xyz"
linematch=re.compile(r'(\d{2})/(\w{3})/(\d{4}):(\d{2}):(\d{2}):')
months=["Jan","Feb","Mar","Apr","May","Jun",
        "Jul","Aug","Sep","Oct","Nov","Dec"]
# Read in the whole file and keep the SECOND and last lines only
# Line counting starts at zero, by the way - but in our data file the
# first line is a header record that we want to ignore.
# Enhancement note - do NOT use readlines if the source file is huge
info = open(fname,"r").readlines()
lines = [info[1],info[-1]]
print(lines)
# Define a function that extracts a timestamp using a
# regular expression, and returns seconds from 1.1.70
def getsecs(matcher,stri):
    dt = matcher.findall(stri)
    elapsed = list(dt[0])
    # index saves a loop to look for the number for the month!
    # Months grabbed from outside so that different month names
    # can be used if you're not working in English!
    elapsed[1] = months.index(elapsed[1])+1
    elt = time.mktime((int(elapsed[2]), int(elapsed[1]),
                       int(elapsed[0]), int(elapsed[3]),
                       int(elapsed[4]), 0,0,0,0))
    return elt
# Bit messy this bit - get elapsed time via a loop
# (I was trying to be too clever with the code!)
tato = 0
for sample in lines:
    tat = getsecs(linematch,sample)
    print(tat)
    tat = tat - tato
    tato = tat
print(tat)
# And get the days and hours ...
el = int(tat)
days = el // (3600*24)
elx = el - days * 3600 * 24
hours = elx // 3600
print("%d %d" % (days,hours))
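Following up on the enhancement note about readlines - here is one way the same job might be done without holding the whole file in memory. Treat it as a sketch rather than the version we wrote on the day: the function names (stamp_to_secs, first_and_last, days_and_hours) and the sample log lines are made up for illustration, and the print call is bracketed so that it should run under newer Python releases as well.

```python
import re
import time

# Same pattern and month list as the script above
STAMP = re.compile(r'(\d{2})/(\w{3})/(\d{4}):(\d{2}):(\d{2}):')
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

def stamp_to_secs(line):
    """Seconds from 1.1.70 for the first time stamp found in line."""
    found = STAMP.search(line)
    if found is None:
        raise ValueError("no time stamp in record: %r" % line)
    day, mon, year, hour, minute = found.groups()
    # Seconds field is ignored, as in the original script
    return time.mktime((int(year), MONTHS.index(mon) + 1, int(day),
                        int(hour), int(minute), 0, 0, 0, 0))

def first_and_last(records, skip=1):
    """Walk the records once, keeping only the first and last data lines.

    records can be an open file or any other iterable of lines;
    skip is the number of header lines to ignore at the top.
    """
    first = last = None
    for number, line in enumerate(records):
        if number < skip:
            continue
        if first is None:
            first = line
        last = line
    return first, last

def days_and_hours(seconds):
    """Split a number of seconds into whole days and whole hours."""
    days, rest = divmod(int(seconds), 24 * 3600)
    return days, rest // 3600

# Invented sample data - a header line followed by three records
sample = [
    "header record\n",
    'a.example - - [08/Dec/2007:13:19:50 +0000] "GET / HTTP/1.0" 200 1\n',
    'b.example - - [09/Dec/2007:00:00:00 +0000] "GET / HTTP/1.0" 200 1\n',
    'c.example - - [11/Dec/2007:18:45:02 +0000] "GET / HTTP/1.0" 200 1\n',
]
first, last = first_and_last(sample)
elapsed = stamp_to_secs(last) - stamp_to_secs(first)
print("%d days, %d hours" % days_and_hours(elapsed))
```

To run it against a real log, pass an open file in place of the list - first_and_last(open(fname)) - and only two lines are ever kept, however big the file is. divmod does the divide and the remainder in one go, which saves the separate subtraction used for elx above.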