Searching for numbers

Archive - Originally posted on "The Horse's Mouth" - 2005-02-04 06:06:02 - Graham Ellis

If I'm searching for a 50kg bag of cement, and the online store only offers 48kg bags, will their search engine find this product and ask "is this what you want?" Our own site searches do clever things with alphabetic searches, but we've rarely had to do a "near number" hunt on our own behalf ... though we have for client sites.

Is 48 near to 50? Yes. Is 8 near to 10? Maybe, but not so near. Is 1 near to 3? No - almost certainly not. So you can't rely on the difference alone - indeed 93 is nearer to 100 than 1 is to 3, even though the difference is much larger.

Algorithm 1

Let "$h" be the value you have and "$t" be the value you're testing. Then the nearness factor is defined as
abs( ($h + $t) / ($h - $t))
with a larger factor indicating a closer match. With this algorithm, an infinite result (in practice, a division by zero) tells you that the two values are numerically identical, so you had better extract that special case first. Let's see some example factors:
48 and 50 - factor is 49
8 and 10 - factor is 9
1 and 3 - factor is 2
93 and 100 - factor is 27.57

Here's Perl code (yes, we have Perl search training) to work this out:

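#!/usr/bin/perl
use strict;
use warnings;

# The two values to compare come from the command line
my ($h, $t) = @ARGV;

if ($h == $t) {
    # Identical values would cause a division by zero below
    print "Parameters are numerically identical\n";
} else {
    printf("factor is %.2f for %s and %s\n",
        abs(($h + $t) / ($h - $t)), $h, $t);
}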
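To show how the factor might be used in an actual search, here is a minimal sketch that ranks a list of candidate values against the value wanted. The subroutine name "nearness" and the stock list are made up for the illustration, and identical values are given an infinite score so that they sort first; treat it as one possible approach rather than the code we run on client sites.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: nearness factor between the value you have ($h)
# and the value being tested ($t). Higher means a closer match.
sub nearness {
    my ($h, $t) = @_;
    return 9**9**9 if $h == $t;    # identical: score as infinity, avoid dividing by zero
    return abs(($h + $t) / ($h - $t));
}

my $wanted = 50;
my @stock  = (25, 48, 50, 40, 10);    # made-up bag sizes in kg

# Sort the candidates so that the best match comes first
my @ranked = sort { nearness($wanted, $b) <=> nearness($wanted, $a) } @stock;

print "$_\n" for @ranked;    # prints 50, 48, 40, 25, 10 (one per line)
```

In a real search you would probably also set a cutoff factor below which a candidate isn't offered at all; where to put that cutoff depends on your data.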


Algorithm 2

The algorithm above isn't always ideal. If you're searching for phone numbers, for example, it's not helpful. If you've transposed digits, you'll want to score hits on values that are numerically very different. For this, you'll want to use something like a Levenshtein distance algorithm. We talk further about this on our web site in the Solutions Centre.
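As a rough illustration of the idea, here is a minimal sketch of a plain Levenshtein distance in Perl, treating the numbers as strings of digits. The subroutine name "levenshtein" and the sample numbers are made up for the example, and this is a textbook version rather than the code from the Solutions Centre. Lower results mean closer matches, which is the opposite way round from Algorithm 1.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(min);

# Hypothetical helper: number of single character insertions, deletions
# and substitutions needed to turn $from into $to.
sub levenshtein {
    my ($from, $to) = @_;
    my @f = split //, $from;
    my @t = split //, $to;

    # Distances from the empty prefix of $from to each prefix of $to
    my @prev = (0 .. scalar @t);

    for my $i (1 .. scalar @f) {
        my @cur = ($i);    # distance from $i characters to the empty prefix
        for my $j (1 .. scalar @t) {
            my $cost = $f[$i - 1] eq $t[$j - 1] ? 0 : 1;
            push @cur, min(
                $prev[$j] + 1,            # deletion
                $cur[$j - 1] + 1,         # insertion
                $prev[$j - 1] + $cost,    # substitution (or match)
            );
        }
        @prev = @cur;
    }
    return $prev[-1];
}

# First two digits transposed: numerically far apart, textually close
printf "distance is %d\n", levenshtein("1234567", "2134567");    # distance is 2
```

Note that plain Levenshtein counts a swap of two adjacent digits as two edits. If transpositions are the mistake you most expect, the Damerau-Levenshtein variant, which counts such a swap as a single edit, may suit you better.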