Python Script - easy examples of lots of basics

Archive - Originally posted on "The Horse's Mouth" - 2007-12-08 13:19:50 - Graham Ellis

Here's a Python script which is pretty imperfect, but shows a whole lot of the basic facilities of the language in use - a sort of "you can do it this way" crib sheet for newcomers. Written during last week's course, with delegates making suggestions as I went along - so if you think the code looks like it was designed by a committee of eleven ....

The overall objective is to read in a web server access log file and report the elapsed time in days and hours from the first to the last record .... I will add comments which we did NOT add on the day!


# Bring in regular expression and time handling code from the Python
# distribution which is NOT loaded as standard every time you run an
# individual Python program. Use "import" to keep each in its own
# namespace rather than "from ... import", which would pollute the top level.
import re
import time
  
# Set up some standards - incoming file name, and the pattern to look
# for when hunting for a date and time stamp in a record
fname = "access_log.xyz"
linematch=re.compile(r'(\d{2})/(\w{3})/(\d{4}):(\d{2}):(\d{2}):')
months=["Jan","Feb","Mar","Apr","May","Jun",
  "Jul","Aug","Sep","Oct","Nov","Dec"]
  
# Read in the whole file and keep the SECOND and last lines only
# Line numbering starts at zero, by the way - but in our data file the
# first line is a header record that we want to ignore.
# Enhancement note - do NOT use readlines if the source file is huge
info = open(fname,"r").readlines()
lines = [info[1],info[-1]]
print lines
  
# Define a function that extracts a timestamp using a
# regular expression, and returns seconds from 1.1.70
# (to the whole minute - the pattern does not capture seconds or the zone offset)
def getsecs(matcher,stri):
  dt = matcher.findall(stri)
  elapsed = list(dt[0])
  # index saves a loop to look for the number for the month!
  # Months grabbed from outside so that different month names
  # can be used if you're not working in English!
  elapsed[1] = months.index(elapsed[1])+1
  elt = time.mktime((int(elapsed[2]), int(elapsed[1]),
    int(elapsed[0]), int(elapsed[3]),
    int(elapsed[4]), 0,0,0,0))
  return elt
  
# Bit messy this bit - get elapsed time via a loop
# (I was trying to be too clever with the code!)
# With exactly two samples, tat ends up as last minus first.
tato = 0
for sample in lines:
  tat = getsecs(linematch,sample)
  print tat
  tat = tat - tato
  tato = tat
print tat
  
# And get the days and hours ...
el = int(tat)
days = el / (3600*24)
elx = el - days * 3600 * 24
hours = elx / 3600
print days,hours


Running that ...

grahamellis$ python daterange
['seaweed - - [15/Jul/1998:08:32:38 -0400]
"GET / HTTP/1.0" 200 1476\n',
'sealion - - [02/Feb/1999:11:54:03 +0000]
"GET /perlman/READMEs/README.threads HTTP/1.1" 200 10787\n']
900491520.0
917956440.0
17464920.0
202 3
grahamellis$
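
The listing above is Python 2 (print statements, and / doing integer division on whole numbers). For anyone trying the same job on Python 3, here is a rough sketch of one way it might be done. It is not what we wrote on the day. The names first_and_last_record, parse_stamp and days_and_hours are hypothetical, and the header line in the sample data is made up. It assumes every timestamp carries a zone offset such as -0400, and that the month abbreviations match the current locale (%b), which the months list in the original neatly avoids. It also follows up the enhancement note by walking through the lines one at a time rather than calling readlines.

```python
import re
from datetime import datetime

# Like linematch in the script, but also capturing the seconds and the
# zone offset (an assumption: every record has both).
STAMP = re.compile(r'\[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2} [+-]\d{4})\]')

def first_and_last_record(lines):
    """Skip the header line, then return the first and last records.

    Works on any iterable of lines (an open file included) and only
    ever holds one line at a time, so a huge log is not a problem.
    Raises StopIteration if there is no record after the header.
    """
    it = iter(lines)
    next(it)              # throw away the header record
    first = next(it)      # the SECOND line of the file
    last = first
    for last in it:       # run to the end, remembering the latest line
        pass
    return first, last

def parse_stamp(record):
    """Return a timezone-aware datetime for the stamp in one record."""
    found = STAMP.search(record)
    if found is None:
        raise ValueError("no timestamp in record: %r" % record)
    return datetime.strptime(found.group(1), "%d/%b/%Y:%H:%M:%S %z")

def days_and_hours(seconds):
    """Split a number of seconds into whole days and whole hours."""
    days, rest = divmod(int(seconds), 24 * 3600)
    return days, rest // 3600

# The two records from the run above, behind a made-up header line.
SAMPLE = [
    'hypothetical header line\n',
    'seaweed - - [15/Jul/1998:08:32:38 -0400] "GET / HTTP/1.0" 200 1476\n',
    'sealion - - [02/Feb/1999:11:54:03 +0000] '
    '"GET /perlman/READMEs/README.threads HTTP/1.1" 200 10787\n',
]

if __name__ == "__main__":
    first, last = first_and_last_record(SAMPLE)
    elapsed = parse_stamp(last) - parse_stamp(first)
    print(days_and_hours(elapsed.total_seconds()))   # (201, 23)
    # The figure the original script arrived at:
    print(days_and_hours(17464920))                   # (202, 3)
```

With a real file you would pass the open file object straight in, along the lines of "with open(fname) as fh: first, last = first_and_last_record(fh)". The answer differs from the 202 days 3 hours above because this sketch honours the seconds and the zone offsets, which the original ignores: the first record is four hours behind GMT.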