Loving programming in Python - and ready to teach YOU how
Archive - Originally posted on "The Horse's Mouth" - 2015-02-22 10:19:54 - Graham Ellis
I like programming in Java, but I love programming in Python. It's been a real pleasure to get back to Python this morning. I'm teaching a private course in Cambridge this week, and a public Python course the following week. And a new example as I work my hand back in ...
Scenario - I need to read records from a whole folder of files and run a combined analysis of them. I'm looking at huge files - our server logs, which are between 40 Mbytes and 65 Mbytes per day - and analysing a month or more of them at the same time.
I've written a class called dirStream, and I pass the folder name for the files into its constructor. I then loop through the data returned by the stream, which can (optionally) be filtered to give only the records that match a particular pattern. The example here also calls another method to get the file name and the line number in that file where the record was found:
source = dirStream("logs")
for record in source.getRecord(lookfor):
    file,line = source.getWhere()
    print line,file,record
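To give a feel for how a class of this shape can hang together, here's a minimal sketch - it is not the dirStream source itself. The name DirStreamSketch, the use of a regular expression for the filter, and the reset of the position to an empty file name and -1 once reading has finished are all assumptions for illustration, and the sketch is written for Python 3 rather than the Python 2 used in the snippets in this post.

```python
import os
import re


class DirStreamSketch:
    """Hypothetical stand-in for dirStream: yields records from every file in a folder."""

    def __init__(self, folder):
        # The constructor does more than store the folder name:
        # it lists the plain files in the folder and sorts them.
        self.files = sorted(
            os.path.join(folder, name)
            for name in os.listdir(folder)
            if os.path.isfile(os.path.join(folder, name))
        )
        self.current_file = ""
        self.current_line = -1

    def getRecord(self, lookfor=None):
        # A generator method: only one line is held in memory at a time.
        # Assumption: the optional filter is treated as a regular expression.
        pattern = re.compile(lookfor) if lookfor else None
        for path in self.files:
            self.current_file = path
            with open(path) as handle:
                for number, line in enumerate(handle, start=1):
                    self.current_line = number
                    if pattern is None or pattern.search(line):
                        yield line.rstrip("\n")
        # Reset the position once every file has been read.
        self.current_file = ""
        self.current_line = -1

    def getWhere(self):
        # Two values returned together as a tuple.
        return self.current_file, self.current_line
```

Because getRecord is a generator, the position reported by getWhere is always that of the record most recently handed back to the loop.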
As this is my test harness, I've then exercised the other methods I've provided - firstly for a brief report:
report = source.getStatus()
for k in report.keys():
    print "{0:<20s} {1}".format(k,report[k])
And then for a full report on the number of records and matches in each input file:
for file_info in source.getReport():
    print "{1:8d} {2:8d} {0:s}".format(*file_info)
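The two format strings above can be tried in isolation. This is a small sketch, written for Python 3, using values copied from the output in this post; the helper names status_line and report_line are made up for illustration and are not part of dirStream.

```python
def status_line(key, value):
    # "<20s" left-justifies the key and pads it out to 20 characters.
    return "{0:<20s} {1}".format(key, value)


def report_line(file_info):
    # "*" expands the tuple into separate positional arguments, so
    # {0} is the file name, {1} the line count and {2} the match count.
    # "8d" right-justifies each integer in a field 8 characters wide.
    return "{1:8d} {2:8d} {0:s}".format(*file_info)


print(status_line("stream_status", "completed"))
print(status_line("lines_matched_so_far", 942))
print(report_line(("logs/ac_20150201", 156228, 27)))
```

Note that the numbers in the format string pick the values by position, which is how the file name can come first in the tuple but last on the printed line.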
Let's see that in action, searching for "Salisbury" references for the last 3 weeks:
python dirStream.py Salisbury
stream_status        completed
current_file_name
current_line_number  -1
searching_for        Salisbury
lines_read_so_far    4011213
lines_matched_so_far 942
total_number_files   21
searching            yes
current_file_number  21
And the detailed output:
  156228       27 logs/ac_20150201
  161144       55 logs/ac_20150202
  190542       22 logs/ac_20150203
  227646       44 logs/ac_20150204
  221454       67 logs/ac_20150205
  202896       45 logs/ac_20150206
  198114      104 logs/ac_20150207
  175836       56 logs/ac_20150208
  170156       34 logs/ac_20150209
  202743       62 logs/ac_20150210
  190289       52 logs/ac_20150211
  190397       56 logs/ac_20150212
  207429       44 logs/ac_20150213
  251313       31 logs/ac_20150214
  165796       25 logs/ac_20150215
  168314       13 logs/ac_20150216
  194138       65 logs/ac_20150217
  181487       15 logs/ac_20150218
  187665       65 logs/ac_20150219
  185631       31 logs/ac_20150220
  181995       29 logs/ac_20150221
The complete example's source code is available to you, with some comments and wrapped so that you can make use of it too for this common "parse all the records in all the files in a directory" requirement.
Of note to delegates / learners - interesting Python things:
• Use of generator within a method
• A constructor that does more than just store incoming values
• A state holder (this.status_mode)
• Optional parameters
• Use of a dict to return a whole series of named status values
• Use of "and" and "or" as a lazy "if" and "else"
• Passing in multiple values to a format method using "*" to expand a list
• Exception handling to cheaply pick up lack of command line selectors
• Use of os.path.join to add in the appropriate file / folder separator character for the current OS
• Conditional use of "from ... import" to load extra code only if running the test programs
• A method that returns multiple values (a tuple)
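A few of the smaller idioms in the list above can be shown in a handful of lines. This is a sketch for Python 3 under stated assumptions: the function names pick_pattern, describe and log_path are invented for illustration, and the code is not taken from dirStream.py.

```python
import os
import sys


def pick_pattern(argv):
    # Exception handling as a cheap test for a missing command line value:
    # just try to use it, and fall back if it isn't there.
    try:
        return argv[1]
    except IndexError:
        return None


def describe(lookfor):
    # "and" / "or" as a lazy if / else. This only works reliably because
    # the middle value ("yes") is always true; a false middle value would
    # fall through to the "or" branch.
    return lookfor and "yes" or "no"


def log_path(folder, name):
    # os.path.join supplies the right separator for the current OS.
    return os.path.join(folder, name)


if __name__ == "__main__":
    # Loaded only when this file is run directly, not when it is imported.
    from pprint import pprint

    lookfor = pick_pattern(sys.argv)
    pprint({"searching": describe(lookfor),
            "path": log_path("logs", "ac_20150201")})
```

In current Python the conditional expression ("yes" if lookfor else "no") does the same job as the "and" / "or" form without the pitfall noted in the comment.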
I think I said at the start - I love programming in Python