A case of case

Archive - Originally posted on "The Horse's Mouth" - 2004-11-14 08:25:31 - Graham Ellis

Do you have people enter their (postal) addresses on a form on your web site?
Does everyone lay out their address as it should be on the postage label?

If you've answered "yes" to the first question, I'll bet you've answered a resounding "no" to the second. It depends on just what you're selling online (and so on how net-aware your users are), but chances are that many people will supply their addresses all in one line, and that many will be all-capitals or all-lower-case. We've found that (in general) these problems apply to 1 in 5 submissions.

But what to do?

Catch 22.
If you always capitalise just the first letter of each word, you upset the daVincis and the MacDonalds.
If you leave the address alone, your labels will look unprofessional and some bright spark will tell you that you're rude to use a lower case letter at the start of his name.

The solution that we've come up with over the years is as follows:

Check the address for case (in PHP, ereg("[A-Z].*[a-z]" .... will check for upper followed by lower)
a) if the user has entered it using both upper case and lower case characters, then they probably know what they're doing and it should be left alone.
b) if the user has entered it in a single case, force it all to lower case, then capitalise the first letter of each word - if you're using PHP, functions strtolower and ucwords will do this for you.
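
As a rough illustration of steps a) and b), here is a minimal sketch. It uses preg_match rather than ereg, since ereg was removed in PHP 7, and the function name normalise_case is made up for this example rather than taken from our production code.

```php
<?php
// Hypothetical helper name, chosen for this sketch only.
function normalise_case(string $address): string
{
    // a) An upper case letter followed somewhere later by a lower case
    //    letter: treat the entry as deliberately mixed case and leave it.
    //    The "s" modifier lets "." match across new lines.
    if (preg_match('/[A-Z].*[a-z]/s', $address)) {
        return $address;
    }

    // b) Otherwise force everything to lower case, then capitalise
    //    the first letter of each word.
    return ucwords(strtolower($address));
}
```

With this sketch, "12 HIGH STREET, MELKSHAM" becomes "12 High Street, Melksham", while "leonardo daVinci" is returned untouched.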

THEN check the address for new line characters
c) If you have at least one new line character between "printables" you can once again assume the user knows what they're doing
d) If you don't have embedded new lines, try replacing "comma-space" sequences with a new line, but not if that leaves just a number on a line on its own!
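
Steps c) and d) might be sketched along these lines. The function name split_address_lines is invented for this example, and the way a lone number is handled (joining it back onto the neighbouring part) is just one reasonable reading of the rule above.

```php
<?php
// Hypothetical helper name, chosen for this sketch only.
function split_address_lines(string $address): string
{
    $address = trim($address);

    // c) After trimming, any remaining new line must sit between
    //    printable characters, so assume the layout is deliberate.
    if (strpos($address, "\n") !== false) {
        return $address;
    }

    // d) Split on "comma-space", but never leave a bare number
    //    (such as a house number) on a line of its own.
    $lines = [];
    foreach (explode(', ', $address) as $part) {
        $last = count($lines) - 1;
        if ($last >= 0 && preg_match('/^\d+$/', $lines[$last])) {
            // The previous line is only a number: keep this part with it.
            $lines[$last] .= ', ' . $part;
        } else {
            $lines[] = $part;
        }
    }

    // A bare number at the very end is joined back onto the line before it.
    $last = count($lines) - 1;
    if ($last > 0 && preg_match('/^\d+$/', $lines[$last])) {
        $tail = array_pop($lines);
        $lines[$last - 1] .= ', ' . $tail;
    }

    return implode("\n", $lines);
}
```

Here "404, The Spa, Melksham" comes out as two lines ("404, The Spa" and "Melksham") rather than three, and an address that already contains a new line is passed through as entered.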

No solution is 100% user-proof, but you should find the above deals with 95% of the rogue entries!