The hunt for unique words

Archive - Originally posted on "The Horse's Mouth" - 2005-01-16 07:44:15 - Graham Ellis

I understand that there are around 200,000 different forenames in use in this country, of which half (100,000) apply to only one individual. OK - it's not a workday today, so am I purveying unusual facts as part of my relaxation? No - I'm getting you and me thinking.

Our web site is huge. There are over 5000 pages (plus potential pages, if you count information such as this post that's fronted by a program), and the scope for spelling errors is just as large. For sure, there are certain common mistakes I make, but there are also one-off errors that can remain for years until someone points them out. So does it really matter? Yes, it does. For every person who reports an error in a web page, there will be 10 others who noticed it but didn't bother to say anything, and the error will reflect badly on the content provider and site owner in the eyes of those 10.

We've got the opportunity at the moment to proofread and spell check a great deal of material that's been generated in the last couple of years. With posts such as "The Horse's Mouth", some level of repetition and error is expected and accepted, but elsewhere that's not the case. So how do we proofread the rest efficiently?

Scheme:

a) We're going to re-use our search engine databases to find words that occur only once and are similar to other words on the site - an excellent way of highlighting potential problems. These words won't ALL be typos, but there's probably going to be quite a good hit rate.

b) When we do find a typo (through a route other than the uniqueness scheme of point (a)), we're going to search our site to see whether it occurs elsewhere, so that we can fix repeated errors.
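
To make the scheme concrete, here is a minimal sketch in Python. It is not the code we run against our search engine databases. It assumes the site text is already available as a dictionary of page name to text, and the names pages, suspect_words and pages_containing are made up for illustration. "Similar" is taken here to mean a close match, by difflib's similarity ratio, to a word that occurs more than once. That definition and the 0.8 cutoff are both assumptions you would want to tune.

```python
import re
from collections import Counter
from difflib import get_close_matches

WORD_RE = re.compile(r"[a-z']+")

def word_counts(pages):
    """Count how often each lower-cased word occurs across all pages."""
    counts = Counter()
    for text in pages.values():
        counts.update(WORD_RE.findall(text.lower()))
    return counts

def suspect_words(pages, cutoff=0.8):
    """Point (a): words used exactly once that closely resemble a repeated word."""
    counts = word_counts(pages)
    # Assumption: a word seen more than once is probably spelt correctly.
    common = [word for word, n in counts.items() if n > 1]
    suspects = {}
    for word, n in counts.items():
        if n == 1:
            near = get_close_matches(word, common, n=1, cutoff=cutoff)
            if near:
                suspects[word] = near[0]
    return suspects

def pages_containing(pages, typo):
    """Point (b): every page on which a known typo appears as a whole word."""
    pattern = re.compile(r"\b" + re.escape(typo) + r"\b", re.IGNORECASE)
    return sorted(name for name, text in pages.items() if pattern.search(text))

# Hypothetical sample data standing in for the real site.
pages = {
    "perl.html": "Our Perl course covers regular expressions and regular practice.",
    "python.html": "Our Python course covers regluar expressions too.",
    "php.html": "The PHP course also covers expresions and practice.",
}

print(suspect_words(pages))
# {'regluar': 'regular', 'expresions': 'expressions'}
print(pages_containing(pages, "regluar"))
# ['python.html']
```

Note that genuine one-off words such as "Perl" and "Python" are not flagged in this sample, because nothing repeated on the site looks like them. On a real site there would still be false positives, which is why the output is a list of candidates for a human to check rather than a list of corrections.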