Stable sorting - Tcl, Perl and others
Archive - Originally posted on "The Horse's Mouth" - 2007-09-06 17:16:10 - Graham EllisHave you come across a STABLE sort? A Stable sort is one in which all the incoming elements which evaluate to an equal value when tested for sorting purposes remain in the same order in the output as they were in the input. Perhaps I had better give you an example.
I have a log file and there's a date in every line, but the lines are NOT in date order. And I want to sort my lines so that every line for each day is grouped together, by ascending date order. However, within any particular day I want my output records to remain in the same order that they were in in the input file.
Input data:
20070903 John Smith etc
20070901 Bill Jones
20070902 Arthur Clark
20070903 George Hill
20070902 Arthur Dent
20070903 Petal Honour-Flower
Stable sort output required:
20070901 Bill Jones
20070902 Arthur Clark
20070902 Arthur Dent
20070903 John Smith etc
20070903 George Hill
20070903 Petal Honour-Flower
As compared to output from a standard (default) sort:
20070901 Bill Jones
20070902 Arthur Clark
20070902 Arthur Dent
20070903 George Hill
20070903 John Smith etc
20070903 Petal Honour-Flower
Sorting in Tcl
Here is the sample code to demonstrate those various sorts in Tcl:
proc first {this that} {
set one [lindex $this 0]
set two [lindex $that 0]
return [string compare $one $two]
}
lappend mystuff {20070903 John Smith etc}
lappend mystuff {20070901 Bill Jones}
lappend mystuff {20070902 Arthur Clark}
lappend mystuff {20070903 George Hill}
lappend mystuff {20070902 Arthur Dent}
lappend mystuff {20070903 Petal Honour-Flower}
puts "Report in initial order"
puts [join $mystuff \n]
puts "-------------------------"
puts "Sorts by Forename within date"
set newstuff [lsort $mystuff]
puts [join $newstuff \n]
puts "-------------------------"
puts "Sorts ONLY by date."
# Appears to keep identically ranked records in ORIGINAL order
set newstuff [lsort -command first $mystuff]
puts [join $newstuff \n]
The lsort command sorts a list, returning a new list, and lsort -command does the same thing but allows the user to specify a proc which returns a negative, zero or positive value to indicate which of the records comes first (negative = no swap, positive = swap, zero = don't care). You'll note that it appears that Tcl provides a naturally stable sort in these circumstances, with and zero returns causing the output records to remain in their original order. I have seen no proof of this - the comment is empirical, but I have swapped by data lines around, and added and taken a few away, and it appears to hold.
Sorting in Perl
Here is the sample code to demonstrate those various sorts in Perl:
@mystuff = ("20070903 John Smith etc",
"20070901 Bill Jones",
"20070902 Arthur Clark",
"20070903 George Hill",
"20070902 Arthur Dent",
"20070903 Petal Honour-Flower");
print "Initial order\n";
print (join("\n",@mystuff),"\n");
print "-------------------------\n";
print "Sort by Forename within date\n";
@newstuff = sort(@mystuff);
print (join("\n",@newstuff),"\n");
print "-------------------------\n";
print "Sort by date ONLY\n";
@newstuff = sort bydate (@mystuff);
print (join("\n",@newstuff),"\n");
print "-------------------------\n";
sub bydate {
@a = split(/\s+/,$a);
@b = split(/\s+/,$b);
return $a[0] <=> $b[0];
}
The principles are very similar to those of Tcl, but the langauge is very different. Again, this appears to be a stable sort automatically.