Don't Be a Validation Nazi

Aug 26, 2007 code regex

In my last post, I wrote about how most email validation routines are too strict when compared against what is allowed by the RFC. Initially I dismissed this phenomenon as the result of ignorance of the RFC or inability to understand it, as I had trouble understanding it myself.

However, I think there’s something more fundamental at work here when it comes to validating user data. It seems that many developers, myself included, choose to ignore Postel’s Law when it comes to field validation. Postel’s law states…

Be conservative in what you do; be liberal in what you accept from others.

Postel wrote that in an RFC that defined TCP, but it applies much more broadly. It’s natural that developers, used to the exacting nature of writing code for a compiler, where even the most minor of typos can bring a program screeching to a halt, have a tendency to apply such exactitude to their users.

Dare I say it, but developers can tend to be validation nazis.

User: (filling out form) user+nospam@example.com

Validation Nazi: Entering a plus sign is $2.00 extra.

User: But the RFC allows for a plus sign.

Validation Nazi: You want plus sign?

User: Yes please.

Validation Nazi: $3.00!

User: What?

Validation Nazi: No form submission for you!

This is a mistake. Users are not compilers so we need to cut them some slack.

A List Apart provides some great examples of mistakes in treating users like computers and ways to correct them in the article, Sensible Forms: A Form Usability Checklist. Here’s a snippet about dealing with phone numbers (emphasis mine).

Let the computer, not the user, handle information formatting

Few things confuse users as often as requiring that users provide information in a specific format. Format requirements for information like telephone number fields are particularly common. There are many ways these numbers can be represented:

* (800) 555-1212
* 800-555-1212
* 800.555.1212
* 800 555 1212

Ultimately, the format we likely need is the one that only contains numbers:

* 8005551212

There are three ways to handle this. The first method tells the user that a specific format of input is required and returns them to the form with an error message if they fail to heed this instruction.

The second method is to split the telephone number input into three fields. This method presents the user with two possible striking usability hurdles to overcome. First, the user might try to type the numbers in all at once and get stuck because they’ve just typed their entire telephone number into a box which only accepts three digits. The “ingenious” solution to this problem was to use JavaScript to automatically shift the focus to the next field when the digit limit is achieved. Has anyone else here made a typo in one of these types of forms and gone through the ridiculous process of trying to return focus to what Javascript sees as a completed field? Raise your hands; don’t be shy! Yes, I see all of you.

Be reasonable; are we so afraid of regular expressions that we can’t strip extraneous characters from a single input field? Let the users type their telephone numbers in whatever they please. We can use a little quick programming to filter out what we don’t need.
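
The "little quick programming" the article mentions could be as small as the following sketch. Python is used here purely for illustration, and the function name `normalize_phone` is an assumption of mine, not something from the article.

```python
import re

def normalize_phone(raw: str) -> str:
    """Return only the digits from whatever the user typed."""
    # \D matches any character that is not a digit, so punctuation
    # and whitespace are removed and the digits are kept in order.
    return re.sub(r"\D", "", raw)

for entered in ["(800) 555-1212", "800-555-1212", "800.555.1212", "800 555 1212"]:
    print(normalize_phone(entered))  # each line prints 8005551212
```

Note that this naive version also drops a leading plus sign, so a form that accepts international numbers would need to handle the country code before stripping.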

The recommendation they give fits with Postel’s law by being liberal in what they accept from the user. The computer is really really good at text processing and cleaning up such data, so why not leverage that fast computation rather than throwing a minor annoyance at your users? No matter how small the annoyance, every little mental annoyance begins to add up. As Jakob Nielsen writes (emphasis his)…

Annoyances matter, because they compound. If the offending state-field drop-down were a site’s only usability violation, I’d happily award the site a gold star for great design. But sites invariably have a multitude of other annoyances, each of which delays users, causes small errors, or results in other unpleasant experiences.

A site that has many user-experience annoyances:

appears sloppy and unprofessional,

demands more user time to complete tasks than competing sites that are less annoying, and

feels somewhat jarring and unpleasant to use, because each annoyance disrupts the user’s flow.

Even if no single annoyance stops users in their tracks or makes them leave the site, the combined negative impact of the annoyances will make users feel less satisfied. Next time they have business to conduct, users are more likely to go to other sites that make them feel better.

However, in the case of the email validation, the problem is much worse. It violates the golden rule of field validation (I’m not sure if there is a golden rule already, but there is now)…

Never ever reject user input when it truly is valid.

In the comments of my last post, several users lamented the fact that they can’t use a clever Gmail hack for their email address because most sites (mine included at the time, though I’ve since fixed it) reject the email.

With Gmail you can append a tag to your email address. So let’s say you have “name@gmail.com”, you can give someone an email address of “name+sometag@gmail.com” and it will faithfully arrive in your inbox. The use of this for me is that I can track who’s selling my email address or at least who I gave my email to that is now abusing it.

For fun, I wrote a crazy regular expression to attempt to validate an email address correctly according to the RFC, but in the end, this was a case of Regex abuse, not Regex use. But as one commenter pointed out…

THE ONLY WAY TO VALIDATE AN EMAIL ADDRESS IS TO DELIVER A MESSAGE TO IT!

This served to drive home the point that attempting to strictly validate an email address on the client is pointless. The type of validation you do should really depend on the importance of that email address.

For example, when leaving a comment on my form, entering an email address is optional. It’s never displayed, but it allows me to contact you directly if I have a personal response and it also causes your Gravatar to be displayed if you have one. For something like this, I stick with a really really simple email address validation purely for the purpose of avoiding typos…

^.+?@.+$
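
As a rough sketch of how that pattern might be applied (Python for illustration only; `looks_like_email` is a hypothetical name), the check confirms nothing more than that there is at least one character on each side of an `@`:

```python
import re

# The deliberately permissive pattern shown above.
SIMPLE_EMAIL = re.compile(r"^.+?@.+$")

def looks_like_email(value: str) -> bool:
    """Catch obvious typos only; this does not prove the address exists."""
    # Surrounding whitespace is trimmed before matching.
    return SIMPLE_EMAIL.match(value.strip()) is not None

print(looks_like_email("user+nospam@example.com"))  # True
print(looks_like_email("user.example.com"))         # False
```

Anything stricter than this risks rejecting an address that is actually valid.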

However, for a site that requires registration (such as a banking site), having the correct email address to reach the user is critical. In that case it might make sense to have the user enter the email twice (to help avoid typos, though most users simply copy and paste so the efficacy of this is questionable) and then follow up with a verification email.
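
The verification step might look something like the sketch below. Everything in it is an assumption for illustration: the function names are made up, and the in-memory dictionary stands in for whatever storage a real site would use. Sending the actual message is left out.

```python
import secrets

# Hypothetical in-memory store mapping token -> unverified email address.
pending = {}

def start_verification(email: str) -> str:
    """Create a hard-to-guess token to embed in the verification email."""
    token = secrets.token_urlsafe(16)
    pending[token] = email
    return token

def confirm(token: str):
    """Return the verified address, or None if the token is unknown or already used."""
    return pending.pop(token, None)
```

The address only counts as valid once the user comes back with the token, which is the "deliver a message to it" test in practice.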

In the end, developers need to loosen up and let users be somewhat liberal about what they enter in a form. It takes more code to clean that up, but that code only needs to be written once, as compared to the many many users who have to suffer through stringent form requirements.


Comments

20 responses

Dave Ward • August 26th, 2007
I think that's a great point. It's so easy to miss the forest for the trees, writing overly complex routines that shave the rough edges off the hurdles, when the hurdles themselves should have never been there.
For phone numbers, I find that a masked entry field works well. The one in the AJAX Toolkit is great.
When you allow too much freedom, you inevitably get a database full of garbage like "Same as home" and "555-1212 or 555-1234".
DotNetKicks.com • August 26th, 2007
You've been kicked (a good thing) - Trackback from DotNetKicks.com
Adam Vandenberg • August 26th, 2007
One of the worst offenses is not allowing spaces in credit card numbers. The heck, people! It was crappy UI back in 2000, it's even crappier UI now.
Kalpesh • August 26th, 2007
Wouldn't it make sense to have some kind of webservice, which will take input (email, phone) & return the correct interpretation of it?
Just like CAPTCHA webservice api (akismet)
This will make the experience consistent.
What do you say?
The Other Steve • August 26th, 2007
Very well said!
Haacked • August 26th, 2007
Testing
Josh Stodola • August 26th, 2007
How dare you flag my comment as spam!
Josh Stodola • August 26th, 2007
Hmmm, looks like you got it fixed. Go ahead and delete my previous comments. Great post, by the way.
"You come back - ONE year! NEXT!" -Soup Nazi
Kevin Dente • August 26th, 2007
I agree with everything except the "enter your email address twice" part, for exactly the reason you point out. It makes sense with password fields, since you can't see what you're typing, but with email addresses it only adds hassle and doesn't help verification.
Ryan Smith • August 26th, 2007
I have a lot of clients that want to make every field on the contact form required. I always argue against this (and lose) because I feel you should make it as easy as possible for the customer to get in touch with you. If they don't want to provide their phone number is that really a reason not to have them contact you at all?
Also, for US phone numbers, the validation logic is so simple to allow multiple formats, it's amazing this is almost always screwed up.
http://www.dynamicajax.com/...
Good post though. I think JavaScript really needs to come back to usability improvement rather than something you can do "Neat Things" with.
The Other Steve • August 27th, 2007
To add to Ryan Smith's comments. It's amazing how many times I've signed up for something as Fred Flintstone, just to look at a demo or something. I once made the mistake of putting in my real contact data to download a demo of a software tool. They called me at least once a week, and from that day forward if caller id shows a number from New York, I won't answer the phone because 9 times out of 10 it's that sales rep.
Crazyguy • August 27th, 2007
Nice article Phil!
I'm not a windows-programmer but I just love the fact that some of your articles, like this one, can be helpful in other languages as well. It's the reason I read your site. :)
FrankC • September 4th, 2007
Good point on the email address validation. However, it seems like most of the time I'm handed the requirement to include this validation on the desktop apps and web apps I write. If I can cover whatever the sponsoring user thinks a valid email should be, then they're happy. I prefer to leave it as a free-form entry if I can though.
MikeSchinkel • September 6th, 2007
I so completely agree, and second what Adam said about credit card numbers. Most websites I buy from require that credit card numbers be digits only with no punctuation separating the groupings of four or five digits. Those groupings reduce typing errors and increase the ability for humans to catch errors by visual scanning. As a former web ecommerce merchant, I know that every time a user submitted an order with a typing error there was a good chance we'd ultimately lose the order (for a variety of reasons.) So the irony is sites that disallow punctuation with credit cards are increasing the number of potential lost sales from incorrectly typed credit card numbers.
Sean Blackman • September 17th, 2010
Can you tell me of a way to prove to others that my email is safe for any type of computer that can receive it? Especially any email software that has to process it. How about the ability to prove that social networking entries are safe to click on?
Bevan Arps • January 26th, 2011
Postel’s Law aside, another reason to be liberal in what you accept is that not everyone does it the same way.
Case in point: Phone numbers.
Not everyone follows the US model of 123-123-1234.
For example, mobile phone numbers in New Zealand all have area code 02x, where x is the mobile provider: 025/027 for Telecom, 021 for Vodafone, and so on. After that prefix, some numbers are 6 digit, some are 7, and some are 8.
And, if I mangle my mobile number sufficiently to fit your US template, I'll do so by leaving out my country code, which renders the phone number useless!
I've run into similar problems with suburb names - people who "helpfully" correct "Kelson" into "Kelston", when one is in Wellington and the other in Auckland. Or who assume that "Merivale" must be in Christchurch, when there's a "Merivale" in Tauranga as well.
Bottom line: Who knows the phone number better? You, when you don't even know the user in real life? Or the user themselves, who might have been giving out that phone number as a contact for 30 years?
Exercise some trust that they know what they're talking about!
If they care about you being able to contact them, they'll take care to ensure the details are correct. If they don't care, no amount of machine validation will be sufficient.
John • February 3rd, 2011
What I hate most is sites whose credit card validation code rejects card numbers with spaces in them. For crying out loud, that's the way it's printed on the card! And how hard is it to strip spaces, anyway?
Bear • May 3rd, 2011
How far it is worth taking your validation is a separate issue to whether or not an address can be validated against the RFCs.
Personally I use email address validation to teach students the futility of attempting to check the accuracy of data by checking structure alone — but I've still had to work out an expression that can validate addresses against the RFCs for when I'm asked if it can actually be done.
On OUR website we've recently added an email confirmation field, annoying for some perhaps but it has drastically reduced the number of undeliverable emails that were by and large resulting from careless typing.
Mathias Raacke • December 18th, 2011
Phone number validation is a great example: Some websites where I had to enter my phone number limited the area code to 3 digits. Area codes in Germany may be longer; mine has 4 digits. I had to enter a part of the area code in the phone number field.
Nick • December 3rd, 2013
Being from NZ, I must concur with this. Plus you get all the weird and wonderful "standard" ways different people who expect a country code expect (allow) it to be entered, yet in my experience the ITU (??) standard for such things, which you would expect should be the standard -- +64 1 2345678 -- is so often rejected both because it includes spaces (though will often accept hyphens in their place) and because the plus-sign is not acceptable (although often one or more of the common, but different, regional standard international access prefixes, such as 0, 00, 011, will be accepted!). Go figure!

Oh, and the thing that guarantees most strongly that I will never complete registering with a site unless I absolutely, truly must? I have one of those Irish "Fitz..." surnames, where the correct capitalization is FitzWhatever (or perhaps even, at least historically, fitzWhatever, but most people have given up trying to do either because of what I am about to describe, though the Irish are pretty adamant about one or the other still). Does your site insist that it knows better and that uppercase letters can only appear in the first character? Or, perhaps your site is really enlightened and you know about McWhatever, MacWhatever and O'Whatever, so you have coded exceptions for them?

Web developers who "know" how to spell my name better than I will rot in hell, along with their websites...