Main Content

Spike solution, refactored and reusable, Python - Example

Archive - Originally posted on "The Horse's Mouth" - 2013-02-28 19:30:23 - Graham Ellis

The other day, we suffered a gap in service from one of our dedicated hosts and after the event I wanted to go through the server logs to find how long the gap was, and whether there was any unusual activity prior to the outage. Problem was - I was looking in a 60 Mbyte log file with ovr 200,000 records covering some 60 hours of server use.

A quick Python script highlighted the length of the gap and the line numbers in the file, which I could then look at through an editor.

  counter = 0
  oldsid = 0
  oldstamp = ""
  
  for line in open("access_log"):
    counter += 1
    stamp = (line.split(" "))[4]
    day,h,m,s = (stamp.split(":"))
    sid = int(h) * 3600 + int(m) * 60 + int(s)
    if oldsid + 300 < sid:
      print oldstamp,stamp,counter
    oldsid = sid
    oldstamp = stamp
  
  print counter


Gave me the results:

  wizard:feb13 graham$ python gap
   [24/Feb/2013:04:02:15 1
  [25/Feb/2013:00:00:00 [24/Feb/2013:23:59:58 54148
  [26/Feb/2013:19:31:58 [26/Feb/2013:20:09:34 199300
  221290
  wizard:feb13 graham$


Which told me the start time of the outage, the end time, and to look around line 199300 of the file. But the code is obfurscated, difficult to follow, and very much a hack solution that's not easily re-usable. Which is fine as a quick fix in a crisis. However, by switching to an object oriented approach, that same issue can be solved with neater and re-usable code - very much more followable.

Taking my original as a "spike solution", I "refactored" as follows:

  counter = 0
  oldsid = 0
  oldlatest = None
  
  for line in open("access_log"):
    counter += 1
  
    latest = access(line)
    sid = latest.getseconds()
  
    if oldsid + 300 < sid:
      if oldlatest:
        print oldlatest
        print latest,counter
    oldsid = sid
    oldlatest = latest
  
  print counter,"oo version 2"


Which ran as follows:

  wizard:feb13 graham$ python oogap2
  At [25/Feb/2013:00:00:00 to firstgreatwestern.info from 66.249.76.149
  At [24/Feb/2013:23:59:58 to firstgreatwestern.info from 180.76.5.99 54148
  At [26/Feb/2013:19:31:58 to firstgreatwestern.info from 65.55.213.220
  At [26/Feb/2013:20:09:34 to firstgreatwestern.info from 132.185.160.100 199300
  221290 oo version 2
  wizard:feb13 graham$


Not only is that set of results more informative and cleaner, but also the code is easier to follow and to re-use. Alas, that code's no longer the full story as I've defined a class of access objects:

  class access:
    def __init__(this,line):
      linebits = line.split(" ")
      this.stamp = logtime(linebits[4])
      this.sitevisited = linebits[0]
      this.remoteip = linebits[1]
    def getseconds(this):
      return this.stamp.getseconds()
    def __str__(this):
      return "At " + str(this.stamp) + " to " + this.sitevisited + \
        " from " + this.remoteip


Which in turn have made use of a class of logtime objects:

  class logtime:
    def __init__(this,timestring):
      day,h,m,s = (timestring.split(":"))
      this.seconds = int(h) * 3600 + int(m) * 60 + int(s)
      this.printable = timestring
    def getseconds(this):
      return this.seconds
    def __str__(this):
      return this.printable


The end result is probably a more sophisticated approach than I would normally have taken (whether I needed to have logtime as a separate object is a long discussion we could have) - but the whole example provided a neat, real-life xample for the Python Programming Course which I have been running this week.

Original source code [here]. First (intermediate) OO example using a single class [here]. Final example as shown above - [here].