Main Content

Sorting - naturally, or into a different order

Archive - Originally posted on "The Horse's Mouth" - 2010-08-14 07:51:04 - Graham Ellis

There's a natural sort order for many things - for numbers, it's ascending, for words it's a dictionary order, for names it's by surname. But sometimes you want to sort a list of things different, or there's no provided way within a programming language to apply the natural sort to your type of thing.

In PHP, I can sort a list (OK - it's known as an "array") in its natural order:
  $vals = array(4,3,5,6,4,3,2,1,2,5,10,22,11);
  sort($vals);
  print (implode(", ",$vals));

and that will give me the following result:
  1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 10, 11, 22
which is all well and good ... until I tell you that the numbers are the numbers of houses on a busy street where I need to make deliveries, and so I'll need to go up the odd numbers, cross the street once, then go back down the even numbers.

The "trick" is to tell the sort routine how to distinguish between two values applying your criteria - how to tell whether "x" comes before "y" where "x" might be 4 and "y" might be 10. You do this by changing the sort function call to a usort function call, and passing into the function - as a second parameter - the name of a function that you have written that takes two values as parameters, and returns a negative (first input is correctly first), positive (first input should be second when sorted) or zero (same value as far as sort is concerned). This function is the called internally - many times - by the usort routine.

Let's see an example - in this first case, I'm sorting the numbers by their remainder when I divide them by 5 - so it's multiples of 5 first, then numbers that give a remainder of 1, and so on ...
  function remfive($x, $y) {
    return ($x%5 - $y%5);
    }
  $vals = array(4,3,5,6,4,3,2,1,2,5,10,22,11);
  usort($vals,remfive);
  print (implode(", ",$vals));

and that will give me the following result:
  5, 10, 5, 1, 6, 11, 2, 22, 2, 3, 3, 4, 4

With the case of my "postman" sort, all I need is a few more lines in my sort function. Comparing two letters, I first check if they're on opposite sides of the road. If they are, then, I can return a -1 or +1 straight away without comparing the numberic values. Having established that two letters are on the same side of the road, I can then compare the numeric values and sort ascending (odd side) or descending (even side). So my sort routine looks like this:
  function postman($x, $y) {
    if (($x % 2) == 0 and ($y % 2) != 0) return 1;
    if (($x % 2) != 0 and ($y % 2) == 0) return -1;
    if (($x % 2) != 0) return ($x - $y);
    return ($y - $x);
    }

and that will give me the following result:
  1, 3, 3, 5, 5, 11, 22, 10, 6, 4, 4, 2, 2

There's a complete example of this from the learning to program in PHP course that I ran last week ... see source code [here]. The same principle applies in Python - indeed, you can see a very old forum post of mine [here], and an example from the Python course that sorts by the 7th field on a line [here].


If you're wanting to do a user sort in Perl - again - the principle is similar. However, you do NOT pass in the two values to be compared as parameters to your sub - they are passed in via global variables $a and $b instead. There are examples [here] and [here] ... with the default (natural order) control case in a further examples [here]. As there are always many ways in Perl, you can also use an "anonymous block" rather that a sub - see [here]. ((Yes - we do offer public Perl courses if you need to learn Perl!))

In Java you can (re)define the natural sort order yourself on you own class of objects - using the Comparable interface. In this case, you define a method called compareTo which runs on one object, taking another as a parameter, and returns a negative, zero or positive int result. Example [here]. There's an alternative using a comparator class - see [here] and [here] for the two classes needed in this case; you'll want to use this latter method if you need to sort the same data type in more than one different way.