Identifying the first and last records in a sequence
Archive - Originally posted on "The Horse's Mouth" - 2016-02-26 12:22:56 - Graham Ellis
When programming to analyse event or log files, you'll often find it fairly easy to identify the initial or opening record of a linked series, but harder to spot the closing one until your program has gone past it. Take this data, for example:
16:15:47 +00 INF: ===================Move Location===================
16:15:47 +00 INF: Location: Set Initial Position: 6033998 encoder counts
16:15:47 +00 INF: Location: Current Position: 6817024
16:15:48 +00 INF: Location: Current Position: 6155537
16:15:48 +00 INF: Location: Current Position: 6033996
16:15:48 +00 INF: Location: Current Position: 6033993
16:16:01 +00 INF: ===================Move Location===================
16:16:01 +00 INF: Location: Set Initial Position: 6033998 encoder counts
16:16:01 +00 INF: Location: Current Position: 6033996
16:16:01 +00 INF: Location: Current Position: 6034021
16:16:08 +00 INF: ===================Move Location===================
16:16:09 +00 INF: Location: Set Initial Position: -5966001 encoder counts
16:16:09 +00 INF: Location: Current Position: -5949212
16:16:09 +00 INF: Location: Current Position: -5239635
You can spot the start points (lines with "Initial Position") straight away as you parse the file, but the end points (the last "Current Position" before a new "Initial Position") can only be confirmed as end points once you've moved beyond them and either found a different record that confirms the earlier sequence is done, or reached the end of the file. Programming-wise, this means making a note of each potentially final record and either replacing it or finalising the sequence at each subsequent read, with an additional "finalising" check after the file read has been completed. All of which sounds very complex and, if you don't get the code exactly right, is prone to error!
There is an alternative, which is to keep a note of each intermediate record as you read it in, overwriting it if it turns out not to be the final record. I have illustrated this using a movement object, in Python (version 3):
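For comparison, here is a minimal sketch of that "hold and finalise" approach. It is not the code from the original post; the names find_moves, STARTER, INCREMENT and sample are made up for illustration, and the records are a few lines taken from the log above.

```python
import re

# Hypothetical names; the patterns match the log format shown above
STARTER = re.compile(r'Initial Position: (-?\d+)')
INCREMENT = re.compile(r'Current Position: (-?\d+)')

def find_moves(lines):
    """Return (start, end) pairs, finalising a sequence only once it is known to be over."""
    results = []
    start = None       # opening value of the sequence in progress
    candidate = None   # latest record that might turn out to be the closing one
    for record in lines:
        new_start = STARTER.search(record)
        if new_start:
            # A new sequence confirms that the previous candidate was final
            if start is not None and candidate is not None:
                results.append((start, candidate))
            start = int(new_start.group(1))
            candidate = None
            continue
        step = INCREMENT.search(record)
        if step and start is not None:
            # Replace the potentially final record with the newer one
            candidate = int(step.group(1))
    # The extra "finalising" check once the input is exhausted
    if start is not None and candidate is not None:
        results.append((start, candidate))
    return results

sample = [
    "16:16:01 +00 INF: Location: Set Initial Position: 6033998 encoder counts",
    "16:16:01 +00 INF: Location: Current Position: 6033996",
    "16:16:01 +00 INF: Location: Current Position: 6034021",
    "16:16:08 +00 INF: Location: Set Initial Position: -5966001 encoder counts",
    "16:16:09 +00 INF: Location: Current Position: -5949212",
    "16:16:09 +00 INF: Location: Current Position: -5239635",
]
print(find_moves(sample))
```

Note that the same "append to results" step appears twice, once inside the loop and once after it. Forgetting the second copy silently drops the last sequence, which is the kind of error the paragraph above warns about.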
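import re

# Patterns for the opening record and for each intermediate record of a move
starter = re.compile(r'Initial Position: (-?\d+)')
increment = re.compile(r'Current Position: (-?\d+)')
moves = []
# The movement class (shown further down) must be defined before this loop runs
with open("steps.txt") as log:
    for record in log:
        start = starter.findall(record)
        if start:
            # Opening record: begin a new movement object
            moves.append(movement(start))
        else:
            inc = increment.findall(record)
            # Skip any step seen before the first opening record
            if inc and moves:
                # Overwrite the end point; the last one read wins
                moves[-1].setStepPoint(inc)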
And once I've parsed all the data, I can simply print out my objects:
for move in moves:
    print(move)
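Here's the class definition:
class movement:
    def __init__(self, startpoint):
        # startpoint is the list returned by findall; use the first match
        self.startpoint = int(startpoint[0])
        # Until a step is recorded, the move ends where it started
        self.nowAt = self.startpoint
        self.soFar = 0
    def setStepPoint(self, inc):
        # Called for every intermediate record; the last call holds the final one
        self.nowAt = int(inc[0])
        self.soFar = self.startpoint - self.nowAt
    def __str__(self):
        return "From {} to {} movement is {}".format(
            self.startpoint, self.nowAt, self.soFar)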
Complete source code [here]
Sample output:
WomanWithCat:feb16 grahamellis$ python3 mlog
From 6033998 to 6033993 movement is 5
From 6033998 to 6034021 movement is -23
From -5966001 to -5239635 movement is -726366
WomanWithCat:feb16 grahamellis$
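If you want to try the overwrite technique without a steps.txt file, the following sketch runs it against part of the sample data held in a string. It is a compact variant rather than the original source: parse_moves, STARTER, INCREMENT and log are hypothetical names, and a plain two-element list stands in for the movement object.

```python
import io
import re

# Hypothetical names; the patterns match the log format shown above
STARTER = re.compile(r'Initial Position: (-?\d+)')
INCREMENT = re.compile(r'Current Position: (-?\d+)')

def parse_moves(lines):
    """Return one [start, end] pair per sequence, overwriting end as records arrive."""
    moves = []
    for record in lines:
        start = STARTER.search(record)
        if start:
            first = int(start.group(1))
            # The end defaults to the start until a step is seen
            moves.append([first, first])
            continue
        step = INCREMENT.search(record)
        if step and moves:
            # Overwrite the end point; the last one read wins
            moves[-1][1] = int(step.group(1))
    return moves

# Part of the sample log, wrapped so that it reads line by line like a file
log = io.StringIO("""\
16:16:01 +00 INF: ===================Move Location===================
16:16:01 +00 INF: Location: Set Initial Position: 6033998 encoder counts
16:16:01 +00 INF: Location: Current Position: 6033996
16:16:01 +00 INF: Location: Current Position: 6034021
16:16:08 +00 INF: ===================Move Location===================
16:16:09 +00 INF: Location: Set Initial Position: -5966001 encoder counts
16:16:09 +00 INF: Location: Current Position: -5949212
16:16:09 +00 INF: Location: Current Position: -5239635
""")

for first, last in parse_moves(log):
    print("From {} to {} movement is {}".format(first, last, first - last))
```

There is no finalising step after the loop here: each sequence is complete at every moment, so reaching the end of the input needs no special handling.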