Loving programming in Python - and ready to teach YOU how

Archive - Originally posted on "The Horse's Mouth" - 2015-02-22 10:19:54 - Graham Ellis

I like programming in Java, but I love programming in Python. It's been a real pleasure to get back to Python this morning. I'm teaching a private course in Cambridge this week, and a public Python course the following week. And here's a new example as I work my hand back in ...

Scenario - I need to read records from a whole folder of files and run a combined analysis of them. I'm looking at huge files - our server logs, which are between 40 Mbytes and 65 Mbytes per day - and analysing a month or more of them at the same time.

I've written a class called dirStream, into the constructor of which I pass the name of the folder that holds the files. I then loop through the data returned by the stream, which can (optionally) be filtered so that only records matching a particular pattern come back. The example here also calls another method to get the file name and the line number within that file where the record was found:

  source = dirStream("logs")
  for record in source.getRecord(lookfor):
    file,line = source.getWhere()
    print line,file,record
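
For readers who want to see how a class with this sort of interface could be put together, here is a minimal sketch. It is not the dirStream source: the names DirStreamSketch, get_record and get_where are assumptions made for illustration, the pattern is treated as a regular expression (also an assumption), and it is written for Python 3 rather than the Python 2 print statement used in the snippets in this post.

```python
import os
import re


class DirStreamSketch:
    """Hypothetical sketch: stream records from every file in a folder."""

    def __init__(self, folder):
        # The constructor does real work: it builds a sorted list of full
        # paths, using os.path.join so the separator suits the current OS.
        self.files = sorted(
            os.path.join(folder, name)
            for name in os.listdir(folder)
            if os.path.isfile(os.path.join(folder, name))
        )
        self.current_file = ""
        self.current_line = -1

    def get_record(self, lookfor=None):
        """Generator: yield each record, or only those matching lookfor."""
        pattern = re.compile(lookfor) if lookfor else None
        for path in self.files:
            self.current_file = path
            with open(path) as handle:
                for number, line in enumerate(handle, start=1):
                    self.current_line = number
                    if pattern is None or pattern.search(line):
                        yield line.rstrip("\n")
        # All files read: reset the position markers.
        self.current_file = ""
        self.current_line = -1

    def get_where(self):
        """Return (file name, line number) of the most recent record."""
        return self.current_file, self.current_line
```

Because get_record is a generator, only one line is held in memory at a time, which is what makes this approach practical for a month of large log files.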


As this is my test harness, I've then exercised the other methods I've provided - firstly for a brief report:

  report = source.getStatus()
  for k in report.keys():
    print "{0:<20s} {1}".format(k,report[k])
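
As an illustration of returning a whole series of named values in one dict, here is a small Python 3 sketch. The function get_status and its parameters are hypothetical and are not taken from the dirStream source; the key names and the two counts are borrowed from the sample output in this post.

```python
def get_status(searching_for, lines_read, lines_matched, finished):
    """Hypothetical helper: return named status values in a single dict."""
    return {
        "stream_status": "completed" if finished else "running",
        "searching_for": searching_for or "",
        "searching": "yes" if searching_for else "no",
        "lines_read_so_far": lines_read,
        "lines_matched_so_far": lines_matched,
    }


report = get_status("Salisbury", 4011213, 942, True)

# Same format string as above: key left-justified in 20 columns, then value.
lines = ["{0:<20s} {1}".format(key, report[key]) for key in report]
print("\n".join(lines))
```

A dict keeps the caller's code readable (each value is picked out by name) and lets new status values be added later without breaking existing callers.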



And then for a full report on the number of records and matches in each input file:

  for file_info in source.getReport():
    print "{1:8d} {2:8d} {0:s}".format(*file_info)
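
The "*" in that format call expands each tuple into separate arguments, and the numbers in the braces pick them out by position, so the file name (item 0) can be printed last. A short Python 3 sketch, assuming each entry is a (file name, records read, records matched) tuple; the list name file_report is made up for the example and the two rows are taken from the sample output further down.

```python
# Assumed shape of each entry: (file name, records read, records matched)
file_report = [
    ("logs/ac_20150201", 156228, 27),
    ("logs/ac_20150202", 161144, 55),
]

# format(*file_info) is the same as format(file_info[0], file_info[1], file_info[2])
rows = ["{1:8d} {2:8d} {0:s}".format(*file_info) for file_info in file_report]
print("\n".join(rows))
```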


Let's see that in action, searching for "Salisbury" references for the last 3 weeks:

  python dirStream.py Salisbury
  
  stream_status        completed
  current_file_name    
  current_line_number  -1
  searching_for        Salisbury
  lines_read_so_far    4011213
  lines_matched_so_far 942
  total_number_files   21
  searching            yes
  current_file_number  21
  
  And the detailed output
  
  156228       27 logs/ac_20150201
  161144       55 logs/ac_20150202
  190542       22 logs/ac_20150203
  227646       44 logs/ac_20150204
  221454       67 logs/ac_20150205
  202896       45 logs/ac_20150206
  198114      104 logs/ac_20150207
  175836       56 logs/ac_20150208
  170156       34 logs/ac_20150209
  202743       62 logs/ac_20150210
  190289       52 logs/ac_20150211
  190397       56 logs/ac_20150212
  207429       44 logs/ac_20150213
  251313       31 logs/ac_20150214
  165796       25 logs/ac_20150215
  168314       13 logs/ac_20150216
  194138       65 logs/ac_20150217
  181487       15 logs/ac_20150218
  187665       65 logs/ac_20150219
  185631       31 logs/ac_20150220
  181995       29 logs/ac_20150221


The complete source code for the example is available to you, commented and wrapped up so that you too can make use of it for this common "parse all the records in all the files in a directory" requirement.

Of note to delegates / learners - interesting Python things:

• Use of generator within a method
• A constructor that does more than just store incoming values
• A state holder (this.status_mode)
• Optional parameters
• Use of a dict to return a whole series of named status values
• Use of "and" and "or" as a lazy "if" and "else"
• Passing in multiple values to a format method using "*" to expand a list
• Exception handling to cheaply pick up lack of command line selectors
• Use of os.path.join to add in the appropriate file / folder separator character for the current OS
• Conditional use of from to load extra code only if running the test programs
• A method that returns multiple values (a tuple)
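
Three of the points above (exception handling for a missing command line selector, "and" / "or" as a lazy "if" / "else", and a conditional from) can be sketched together in a few lines of Python 3. The function names pattern_from_args and describe are hypothetical, and pprint simply stands in for whatever extra code a test program might want to load; none of this is taken from the dirStream source.

```python
import sys


def pattern_from_args(argv):
    """Return the first command line selector, or None if there isn't one."""
    try:
        return argv[1]
    except IndexError:
        # No selector given; let the exception tell us rather than
        # checking the length of argv first.
        return None


def describe(lookfor):
    # "and" / "or" as a lazy if / else. This relies on the middle value
    # being truthy; if it were empty, the "or" branch would be used instead.
    return lookfor and "searching for " + lookfor or "reading every record"


if __name__ == "__main__":
    # Conditional import: only loaded when this file is run as a program,
    # not when it is imported as a module by other code.
    from pprint import pprint
    pprint(describe(pattern_from_args(sys.argv)))
```

The "and" / "or" idiom is compact, but Python's conditional expression (value_if_true if condition else value_if_false) avoids the pitfall noted in the comment.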

I think I said at the start - I love programming in Python.