I Knew How To Validate An Email Address Until I Read The RFC

Raise your hand if you know how to validate an email address. For those of you with your hand in the air, put it down quickly before someone sees you. It’s an odd sight to see someone sitting alone at the keyboard raising his or her hand. I was speaking metaphorically.

Before yesterday I would have raised my hand (metaphorically) as well. I needed to validate an email address on the server. Something I’ve done a hundred thousand times (seriously, I counted) using a handy dandy regular expression in my personal library.

This time, for some reason, I decided to take a look at my underlying assumptions. I had never actually read (or even skimmed) the RFC for an email address. I simply based my implementation on my preconceived assumptions about what makes a valid email address. You know what they say about assuming.

What I found out was surprising. Nearly 100% of regular expressions on the web purporting to validate an email address are too strict.

It turns out that the local part of an email address, the part before the @ sign, allows a lot more characters than you’d expect. According to section 2.3.10 of RFC 2821, which defines SMTP, the local part (the part after the @ sign being the host domain) is only intended to be interpreted by the receiving host...

Consequently, and due to a long history of problems when intermediate hosts have attempted to optimize transport by modifying them, the local-part MUST be interpreted and assigned semantics only by the host specified in the domain part of the address.

Section 3.4.1 of RFC 2822 goes into more detail about the specification of an email address (emphasis mine).

An addr-spec is a specific Internet identifier that contains a locally interpreted string followed by the at-sign character ("@", ASCII value 64) followed by an Internet domain.  The locally interpreted string is either a quoted-string or a dot-atom.

A dot-atom is a dot delimited series of atoms. An atom is defined in section 3.2.4 as a series of alphanumeric characters and may include the following characters (all the ones you need to swear in a comic strip)...

! $ & * - = ^ ` | ~ # % ' + / ? _ { }
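To make the dot-atom rule concrete, here is a minimal sketch in C# that checks it by hand, with no regular expression involved. The names (LocalPartSketch, IsDotAtom) are made up for illustration, and it deliberately ignores quoted strings, comments, and folding white space, so treat it as a reading aid rather than a validator.

```csharp
using System;
using System.Linq;

public static class LocalPartSketch
{
    // The non-alphanumeric characters an atom may contain (RFC 2822, section 3.2.4).
    private const string AtomSymbols = "!#$%&'*+-/=?^_`{|}~";

    // ASCII letters and digits only; char.IsLetterOrDigit would also accept non-ASCII.
    private static bool IsAtomChar(char c)
    {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
            || (c >= '0' && c <= '9') || AtomSymbols.IndexOf(c) >= 0;
    }

    // True when localPart is one or more non-empty atoms separated by single dots.
    public static bool IsDotAtom(string localPart)
    {
        if (string.IsNullOrEmpty(localPart)) return false;
        return localPart.Split('.').All(atom => atom.Length > 0 && atom.All(IsAtomChar));
    }
}
```

With this, IsDotAtom("customer/department=shipping") is true, while IsDotAtom(".wooly") and IsDotAtom("wo..oly") are false because splitting on the dot leaves an empty atom.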

Not only that, but it’s also valid (though not recommended and very uncommon) to have quoted local parts which allow pretty much any character. Quoting is done by surrounding the local part in double quotes. Inside the quotes, the backslash character escapes the character that follows it.

RFC 3696, Application Techniques for Checking and Transformation of Names, was written by the author of the SMTP protocol (RFC 2821) as a human readable guide to SMTP. In section 3, he gives some examples of valid email addresses.

These are all valid email addresses!

  • "Abc\@def"@example.com
  • "Fred Bloggs"@example.com
  • "Joe\\Blow"@example.com
  • "Abc@def"@example.com
  • customer/department=shipping@example.com
  • $A12345@example.com
  • !def!xyz%abc@example.com
  • _somename@example.com

Note: Gotta love the author for using my favorite example person, Joe Blow.

Quick, run these through your favorite email validation method. Do they all pass?

For fun, I decided to try and write a regular expression (yes, I know I now have two problems. Thanks.) that would validate all of these. Here’s what I came up with. (The local part is everything before the @ sign. I am not worrying about checking my assumptions for the domain part for now.)

^(?!\.)("([^"\r\\]|\\["\r\\])*"|([-a-z0-9!#$%&'*+/=?^_`{|}~]|(?<!\.)\.)*)(?<!\.)@[a-z0-9][\w\.-]*[a-z0-9]\.[a-z][a-z\.]*[a-z]$

Note that this expression assumes case insensitivity options are turned on (RegexOptions.IgnoreCase for .NET). Yeah, that’s a pretty ugly expression.

I wrote a unit test to demonstrate all the cases this expression covers. Each row below is an email address and whether it should be valid or not.

[RowTest]
[Row(@"NotAnEmail", false)]
[Row(@"@NotAnEmail", false)]
[Row(@"""test\\blah""@example.com", true)]
[Row(@"""test\blah""@example.com", false)]
[Row("\"test\\\rblah\"@example.com", true)]
[Row("\"test\rblah\"@example.com", false)]
[Row(@"""test\""blah""@example.com", true)]
[Row(@"""test""blah""@example.com", false)]
[Row(@"customer/department@example.com", true)]
[Row(@"$A12345@example.com", true)]
[Row(@"!def!xyz%abc@example.com", true)]
[Row(@"_Yosemite.Sam@example.com", true)]
[Row(@"~@example.com", true)]
[Row(@".wooly@example.com", false)]
[Row(@"wo..oly@example.com", false)]
[Row(@"pootietang.@example.com", false)]
[Row(@".@example.com", false)]
[Row(@"""Austin@Powers""@example.com", true)]
[Row(@"Ima.Fool@example.com", true)]
[Row(@"""Ima.Fool""@example.com", true)]
[Row(@"""Ima Fool""@example.com", true)]
[Row(@"Ima Fool@example.com", false)]
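// Needs using System.Text.RegularExpressions; plus a test framework
// that provides RowTest/Row (MbUnit, for example).
public void EmailTests(string email, bool expected)
{
  // Local part: either a quoted string, or dot-separated atoms
  // with no leading, trailing, or doubled dots.
  string pattern = @"^(?!\.)(""([^""\r\\]|\\[""\r\\])*""|"
    + @"([-a-z0-9!#$%&'*+/=?^_`{|}~]|(?<!\.)\.)*)(?<!\.)"
    // Domain part: deliberately simple, not checked against the RFC.
    + @"@[a-z0-9][\w\.-]*[a-z0-9]\.[a-z][a-z\.]*[a-z]$";

  Regex regex = new Regex(pattern, RegexOptions.IgnoreCase);
  Assert.AreEqual(expected, regex.IsMatch(email),
    "Problem with '" + email + "'. Expected "
    + expected + " but was not that.");
}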

Before you call me a completely anal nitpicky numnut (you might be right, but wait anyways), I don’t think this level of detail in email validation is absolutely necessary. Most email providers have stricter rules than are required for email addresses. For example, Yahoo requires that an email start with a letter. There seems to be a standard stricter set of rules most email providers follow, but as far as I can tell it is undocumented.

I think I’ll sign up for an email address like phil.h\@\@ck@haacked.com and start bitching at sites that require emails but don’t let me create an account with this new email address. Ooooooh I’m such a troublemaker.

The lesson here is that it is healthy to challenge your preconceptions and assumptions once in a while and to never let me near an RFC.

UPDATES: Corrected some mistakes I made in reading the RFC. See! Even after reading the RFC I still don’t know what the hell I’m doing! Just goes to show that programmers can’t read. I updated the post to point to RFC 822 as well. The original RFC.


What others have said

BCS Aug 21, 2007 10:38 AM
# re: I Knew How To Validate An Email Address Until I Read The RFC
I once saw a ~3 PAGE regex that was supposed to correctly validate email addresses.
David Leadbeater Aug 21, 2007 10:39 AM
# re: I Knew How To Validate An Email Address Until I Read The RFC
Actually you've interpreted the RFC slightly wrong ;)

The RFC says \ for escaping is only valid inside a qcontent, so in order to enter characters that need escaping (i.e. other than dot-atom) they need to be inside double quotes (last paragraph of section 3.2.2).

e.g.: Abc\@def@example.com becomes "Abc@def"@example.com

I don't know about .net but if you look at the source code to the Perl module Email::Valid, you'll see a huge regex near the end. This actually validates according to the RFC.
(Code at
http://search.cpan.org/src/RJBS/Email-Valid-0.179/lib/Email/Valid.pm).
Marcelo Calbucci Aug 21, 2007 10:52 AM
# re: I Knew How To Validate An Email Address Until I Read The RFC
Hahaha... I went through the same issue about a year ago.

I knew there was more to an email address than I was assuming, because back in the day I had an email address that looked more like a regular expression, something like:
marcelo.calbucci%mandic@fapesp.com.br

So, when I was writing my email validator in C# (regular expressions are too slow in some cases) I checked out the RFC, which is way more complex than anyone could ever imagine (except the people that wrote it).

After a while I decided to limit the scope and not make it perfect, but good enough.

The problem with making it match the spec is that somebody might mistype a symbol instead of a character and your regular expression will find it acceptable. I didn't want that to happen. If somebody has a weird symbol on their email address we'll simply reject it.
Jeff Schoolcraft Aug 21, 2007 10:56 AM
# re: I Knew How To Validate An Email Address Until I Read The RFC
It seems regular expressions come up a lot when someone mentions validating email addresses. Curiously, and just as incorrect in my opinion, regex are also mentioned in the same sentence as parsing HTML. Notice the parsing bit, but that's not the topic of this post.

I've presented on the topic of regular expressions a number of times at usergroups and I always put up a slide showing this regex. It's the regex used by the Perl module Mail::RFC822::Address and it's nasty.

My problem with validating email addresses is that even if an address conforms to the spec, that doesn't mean it's an active, valid address, and worse, it may not even belong to the user.

So it seems we'd want to:
1. Catch simple mistakes to make a better user experience.
2. Have a valid email address that can be used.
3. Make sure the email belongs to the user.

So how do we do that?
1. We could use a simple regex that's not too restrictive to make sure it generally looks like an email address (something @ something probably with a .) We could also make the user type their email address twice, verifying it the same way we would make them verify their password. A quick check on equality either means they didn't make a mistake, they consistently make that mistake in which case this hasn't helped us, or they copied and pasted it.

2. We could use a really nasty regex. We could shoot off an external process that tries to verify the email address through the mail server of its domain. We could send some url with a hash and have the user confirm their email. An ongoing part of this solution might also be to cull through bounce mail from the server and invalidate the address.

3. I really don't know how else to do this besides mailing the user something and requiring them to do something based on the contents of the email. This is the option from #2 above with the email containing a url and some hash. Do this every time you get a new email address and you should be fairly confident that you can send email to the user.

My thought on this whole thing is: Why collect the data if you're not going to use it; and why just guess at its validity when you could confirm it through user action?
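A minimal sketch of the "url and some hash" idea in C#, assuming the hash is an HMAC keyed with a secret that only the server knows. Every name here (ConfirmationSketch, CreateToken, IsValidToken, BuildConfirmUrl, and the email and token query parameters) is hypothetical, and a real system would probably also want the link to expire.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class ConfirmationSketch
{
    // Derives a hex token from the address and a server-side secret.
    public static string CreateToken(string email, byte[] secretKey)
    {
        using (var hmac = new HMACSHA256(secretKey))
        {
            byte[] hash = hmac.ComputeHash(Encoding.UTF8.GetBytes(email));
            return BitConverter.ToString(hash).Replace("-", "").ToLowerInvariant();
        }
    }

    // Checks the token that came back on the confirmation link.
    public static bool IsValidToken(string email, string token, byte[] secretKey)
    {
        string expected = CreateToken(email, secretKey);
        if (token == null || token.Length != expected.Length) return false;

        // Compare every character instead of stopping at the first mismatch.
        int diff = 0;
        for (int i = 0; i < expected.Length; i++) diff |= expected[i] ^ token[i];
        return diff == 0;
    }

    // Builds the link to mail out; the address is escaped so "+" and "@" survive.
    public static string BuildConfirmUrl(string baseUrl, string email, string token)
    {
        return baseUrl + "?email=" + Uri.EscapeDataString(email) + "&token=" + token;
    }
}
```

Because the token is derived from the address itself, nothing has to be stored until the user clicks through, and a token issued for one address will not validate for another.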
Nathan Aug 21, 2007 11:07 AM
# re: I Knew How To Validate An Email Address Until I Read The RFC
Firefox's linkification plugin only identified "Abc@def"@example.com as a correct e-mail...all the others went unlinked.
Haacked Aug 21, 2007 11:17 AM
# re: I Knew How To Validate An Email Address Until I Read The RFC
@David Hmm... I'm not so sure. The very first sentence of the last paragraph of RFC 2822 3.2.2 states...

Note: The "\" character may appear in a message where it is not part of a quoted-pair.


Not to mention that the examples by the author of the SMTP RFC on page 6 of RFC 3696 show the \ escaping happening outside of a double quoted string.

Note that he says...

When quoting is needed, the backslash character is used to quote the following character.



And goes on to say...

In addition to quoting using the backslash character, conventional double-quote characters may be used to surround strings.


This implies there are two different ways to quote characters. I think the term quote here is confusing and probably should be escape.

In any case, this only bolsters my point that reading a spec is difficult and all specs are ambiguous to the readers. ;)
Haacked Aug 21, 2007 11:24 AM
# re: I Knew How To Validate An Email Address Until I Read The RFC
@Jeff I agree with your points, but would point out that these are not mutually exclusive points of validation. I think in general there are several levels of validation.

I don't want to try and send you a verification link (as is common) if I already know your email is a fake. Hence when collecting emails, I would do a fairly liberal email validation (for example, make sure there is an @ character). As you point out, this can be beneficial to the user experience to help prevent typos.

What would be really bad for user experience is too strict validation where you reject a perfectly valid email that really does belong to the user.


David Leadbeater Aug 21, 2007 11:30 AM
# re: I Knew How To Validate An Email Address Until I Read The RFC
I definitely agree they are hard to interpret. I've fixed bugs related to them on a fairly large email system.

I'm not sure the specs are ambiguous in this case, though. The next sentence of section 3.2.2 says:


A "\" character that does not appear in a
quoted-pair is not semantically invisible. The only places in this
standard where quoted-pair currently appears are ccontent, qcontent,
dcontent, no-fold-quote, and no-fold-literal.


"not semantically invisible" meaning that it is not interpreted with any special meaning and is displayed as is.

Also the reason I picked up on it is the original email RFC categorically states that it must be within double quotes. If anything I'd say it's a mistake in RFC 3696 (which is only an informative RFC, so is likely to have had less scrutiny than a standards track one).
David Leadbeater Aug 21, 2007 11:33 AM
# re: I Knew How To Validate An Email Address Until I Read The RFC
I just spotted this is corrected in the errata for RFC 3696.
Tim Aug 21, 2007 11:40 AM
# re: I Knew How To Validate An Email Address Until I Read The RFC
I ran into something similar with domain name parsing... I never knew this, but non 'english' symbols are completely valid such as:

http://www.hÔtels.com/

is a different site than:

http://www.hotels.com/

(Hopefully my comment here displays the first URL correctly, but if it doesn't, the letter o in the first URL has a circumflex, e.g. a ^ symbol, above it.)

Searching for this in google requires you to put it in quotes else it will return words with the english 'o' in them.
Haacked Aug 21, 2007 12:08 PM
# re: I Knew How To Validate An Email Address Until I Read The RFC
@David Nice catch, but damn you for creating more work for me! ;) I'll try and update my post after lunch. :)
Damien Guard Aug 21, 2007 1:21 PM
# re: I Knew How To Validate An Email Address Until I Read The RFC
I've been using a regex that basically checks for at least one non-whitespace char before the @ sign, another one after, a dot and then another two.

Looks like you haven't touched on the international UTF-8 symbols and accented characters that are recommended to be valid (a later RFC perhaps - I think I saw it in the IETF DRUMS 08 on SMTP)

[)amien
Haacked Aug 21, 2007 1:32 PM
# re: I Knew How To Validate An Email Address Until I Read The RFC
Ok, I finally corrected the post to fix my misinterpretation.

Regarding international characters, I know that an IRI (Internationalized Resource Identifier) can be mapped to a URI by hex encoding using the %HH sequence of bytes. I'm not well versed in this though. And yes, my regex does not take that into consideration at all.

Check out the perl regex linked to by other commenters. I wonder if that does. ;)
Rik Hemsley Aug 21, 2007 1:35 PM
# re: I Knew How To Validate An Email Address Until I Read The RFC
Why try to validate? If you're being nice to the user, ask them to enter their address twice. If they get it wrong both times, tough luck. They'll find out when they don't get a message.

If you really want to validate, you could be uptight about it and use a proper parser, but most software does it wrong, so you'll be in the minority (along with me).
Gabe Krabbe Aug 21, 2007 1:54 PM
# re: I Knew How To Validate An Email Address Until I Read The RFC
Of course, this is assuming that you're not trying to parse an actual e-mail. See RFC 2822, section A.5, and think "arbitrary nesting" - it is not technically possible to use a regex and be fully compliant with the standard when trying to figure out the address, but if you want to use it for SMTP (where the comments aren't allowed), you may have to give it a go.

Yes, this has been a long-time favourite of mine to point out that regular expressions are sometimes not the best choice.
Haacked Aug 21, 2007 1:55 PM
# re: I Knew How To Validate An Email Address Until I Read The RFC
@Rik I'm not sure making a user enter that twice is exactly being nice. I would find it annoying. Especially if you also make them enter their password twice.

I think some simple validation is helpful to the user as long as the validation is liberal.

As I mentioned in an earlier comment, I think validation on the client should be more liberal than the RFC. It's fine to have false negatives (emails that are invalid, but you let through) but it's not fine to have false positives (emails that are valid, but you flag as wrong).
Rik Hemsley Aug 21, 2007 2:20 PM
# re: I Knew How To Validate An Email Address Until I Read The RFC
Phil, I think users would prefer to have to enter their email address twice (they can copy+paste - at least it makes them look at what they typed) if it means they avoid typos and therefore aren't sitting wondering why they're not getting messages - or, worse, that their messages are going to someone else (which is highly possible if they're on a popular webmail service).
Haacked Aug 21, 2007 2:24 PM
# re: I Knew How To Validate An Email Address Until I Read The RFC
@Rik I can copy and paste with my eyes closed. It's an automatic action, like driving to work. Sometimes I drive to work when I mean to drive somewhere else. In other words, I don't think it helps really.

Even so, even when requiring a user to enter their email twice, why wouldn't you do some simple validation? At the very least make sure there is an @ sign.
Lamby Aug 21, 2007 3:01 PM
# re: I Knew How To Validate An Email Address Until I Read The RFC
Has anyone considered accepting pretty much anything that's not malicious and then looking up whether MX records for that domain exist?

It actually catches typos, it's not dependent on anyone's interpretation of a dodgy specification, and it's fairly future-proof too.
David Aug 21, 2007 6:08 PM
# re: I Knew How To Validate An Email Address Until I Read The RFC
Yes, please get the word out. Do you know how many sites refuse my email address because it has a + sign in it? It's very frustrating...

The best compromise I've seen so far is to have it complain if you use a "weird" character, but then offer you a chance to say "no, that *really* is my email address". That way it catches stupid mistakes, but lets you have the final say as to whether it's valid or not.

-David
Member Blogs Aug 21, 2007 6:09 PM
# Links (8/21/2007)
.NET (Phil Haack) knew How To Validate An Email Address Until (he) Read The RFC PowerShell: Using PowerShell
Prashant Rane Aug 21, 2007 7:39 PM
# re: I Knew How To Validate An Email Address Until I Read The RFC
The O'Reilly "Mastering Regular Expressions" book has a one page regular expression that matches email address. Coming from the book of regular expressions; I assume it would be quite correct. It is towards the end of the book. Sorry, I don't have it with me right now otherwise I would have given you the page number. Enjoy.
David Stone Aug 21, 2007 11:35 PM
# re: I Knew How To Validate An Email Address Until I Read The RFC
You'll find this page interesting: http://www.regular-expressions.info/email.html
Rik Hemsley Aug 22, 2007 1:49 AM
# re: I Knew How To Validate An Email Address Until I Read The RFC
Phil, local addresses don't need an @ sign.
Samus_ Aug 22, 2007 2:35 AM
# re: I Knew How To Validate An Email Address Until I Read The RFC
hey! you didn't validate the domains! perhaps between the two of us can we reach a complete solution :P

http://forums.worsethanfailure.com/forums/post/110636.aspx
Jay Kimble Aug 22, 2007 8:03 AM
# re: I Knew How To Validate An Email Address Until I Read The RFC
Thanks for getting the word out Phil...

Something that isn't common knowledge, but GMAIL has a neat little Email address hack that I like to try to use sometimes...

With Gmail you can append a tag to your email address. So let's say you have "name@gmail.com" you can give someone an email address of "name++sometag@gmail.com" and it will faithfully arrive in your inbox. The use of this for me is that I can track who's selling my email address or at least who I gave my email to that is now abusing it.

BTW, this site's email address validator (for comments) is wrong. I just tried adding "++haacked" to my email address... you should fix that <grin />
Travis Illig Aug 22, 2007 9:16 AM
# re: I Knew How To Validate An Email Address Until I Read The RFC
The comment re: the O'Reilly book is key - Mastering Regular Expressions by Jeffrey Friedl uses matching an email address as an example in chapter 7 ("Perl Regular Expressions") of how to craft complex regular expressions. The Perl module that was mentioned, Mail::RFC822::Address includes the expression that is arrived at.

Email addresses aren't the only places you'll find out there containing ridiculous simplifications that lead to incorrect validations.

From a larger perspective, it also gets sort of interesting when you encounter someone from a non math/CS background who is a self-taught programmer that you try to explain regular languages to (so they understand why their regexes aren't working right). Sometimes I feel like regular expressions are sort of a "lost art."
bofe Aug 22, 2007 9:36 AM
# re: I Knew How To Validate An Email Address Until I Read The RFC
PHP: http://www.iamcal.com/publish/articles/php/parsing_email/
Bill Weiss Aug 22, 2007 9:54 AM
# re: I Knew How To Validate An Email Address Until I Read The RFC
First: Your blog software fails at validating my email address. The only strange character in it is a plus (+).

That leads me to my first real point. You said "I think I’ll sign up for an email address like phil.h\@\@ck@haacked.com and start bitching at sites that require emails but don’t let me create an account with this new email address. Ooooooh I’m such a troublemaker." Guess what? You'll get plenty of bitching in if you just use plus addresses. Your SMTP server probably copes with them out of the box. Some asshole's validation software probably doesn't.

I've been on that path for a few years. Out of hundreds of complaints, guess what I've gotten? A large number of no response, a large (but smaller) number of indignant "that doesn't work! Fix your address!", some "wow, that's strange. Oh well, our software doesn't cope.", and a single digit number of "hey! I'll fix that to make my software correct!". It's not a fun pastime.

Second, here's the regex I use to validate email addresses: .+@[^@]+\.[^@]+
Looks simplistic? It works ok. It's not actually correct, as it doesn't take local addresses correctly (me@foo is valid, assuming there's a machine named foo around). That's ok, it's a strange edge case.

The real test, of course, is handing the address off to my SMTP server and seeing if you get the email. If you do, congratulations, it's a real address. If not, guess it wasn't. You're verifying those addresses first, right?
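For what it's worth, that pattern drops into .NET unchanged. A minimal sketch, assuming the expression is meant to match the whole input (the ^ and $ anchors are an addition, as are the names LiberalCheckSketch and LooksLikeEmail):

```csharp
using System.Text.RegularExpressions;

public static class LiberalCheckSketch
{
    // Something, an @, then a domain with at least one dot and no further @ signs.
    private static readonly Regex Liberal = new Regex(@"^.+@[^@]+\.[^@]+$");

    public static bool LooksLikeEmail(string input)
    {
        return input != null && Liberal.IsMatch(input);
    }
}
```

It accepts plus addresses and quoted local parts alike, and it rejects me@foo, which is the edge case already mentioned.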
Haacked Aug 22, 2007 10:16 AM
# re: I Knew How To Validate An Email Address Until I Read The RFC
@Bill Thanks for pointing that out. I need to make sure that fix gets into the next version of Subtext.

I changed my email validation to be very liberal. .*?@.*
Aaron Jensen Aug 22, 2007 10:40 AM
# re: I Knew How To Validate An Email Address Until I Read The RFC
I found this attempt at doing the same thing a month or so ago: http://www.iamcal.com/publish/articles/php/parsing_email/ . It is written in PHP, but the regular expression is easily converted to .NET. I haven't tested it, but it looks pretty thorough.

I repeat the comment I made on my web site: how come frameworks don't come with a common set of regular expressions? How many times have developers had to write regular expressions to validate domains, e-mail addresses, phone numbers, zip codes, etc?
Steven Aug 22, 2007 11:31 AM
# re: I Knew How To Validate An Email Address Until I Read The RFC
Please repeat: THE ONLY WAY TO VALIDATE AN EMAIL ADDRESS IS TO DELIVER A MESSAGE TO IT!

Warning to all online shopping developers: I am currently in the practice of abandoning a full shopping cart and sending a complaint to your client when your stupid code disallows + in an email address.
Steve Aug 22, 2007 8:22 PM
# re: I Knew How To Validate An Email Address Until I Read The RFC
You forget one:

[space]@domain.tld is valid. GO SPACEBAR!

SeanG Aug 23, 2007 12:20 AM
# re: I Knew How To Validate An Email Address Until I Read The RFC
I ran across a website with the full regex built out of the BNF from the RFC and built a C# method to do this:

/// <summary>
/// RFC822 compliant email address validation.
/// see http://iamcal.com/publish/articles/php/parsing_email/ for explanation
/// </summary>
/// <param name="emailAddress">the email address to check</param>
/// <returns>false if not valid email address, true otherwise</returns>
// Requires: using System.Text.RegularExpressions;
private bool ValidEmailAddress(string emailAddress)
{
    string qtext = "[^\\x0d\\x22\\x5c\\x80-\\xff]"; // <any CHAR excepting <">, "\" & CR, and including linear-white-space>
    string dtext = "[^\\x0d\\x5b-\\x5d\\x80-\\xff]"; // <any CHAR excluding "[", "]", "\" & CR, & including linear-white-space>
    string atom = "[^\\x00-\\x20\\x22\\x28\\x29\\x2c\\x2e\\x3a-\\x3c\\x3e\\x40\\x5b-\\x5d\\x7f-\\xff]+"; // *<any CHAR except specials, SPACE and CTLs>
    string quoted_pair = "\\x5c[\\x00-\\x7f]"; // "\" CHAR
    string quoted_string = string.Format("\\x22({0}|{1})*\\x22", qtext, quoted_pair); // <"> *(qtext/quoted-pair) <">
    string word = string.Format("({0}|{1})", atom, quoted_string); // atom / quoted-string
    string domain_literal = string.Format("\\x5b({0}|{1})*\\x5d", dtext, quoted_pair); // "[" *(dtext / quoted-pair) "]"

    string domain_ref = atom; // atom
    string sub_domain = string.Format("({0}|{1})", domain_ref, domain_literal); // domain-ref / domain-literal
    string domain = string.Format("{0}(\\x2e{0})*", sub_domain); // sub-domain *("." sub-domain)
    string local_part = string.Format("{0}(\\x2e{0})*", word); // word *("." word)
    string addr_spec = string.Format("{0}\\x40{1}", local_part, domain); // local-part "@" domain

    Regex re = new Regex(string.Format("^{0}$", addr_spec));

    return re.IsMatch(emailAddress);
}
Mark Aug 23, 2007 12:58 AM
# re: I Knew How To Validate An Email Address Until I Read The RFC
I completely agree, ignoring the wider picture of what should be validated, how and when. In the straight forward case most regexes are NOT too simplistic. The specifications typically suggest that software generating email addresses must be as rigid (or more) than the spec, but ... ... software reading email addresses should be less rigid to make up for any crap software that fails to meet the spec. Mind you this leads to the exact same problem that Browsers cause (i.e. any HTML will do)!
Charles Curran Aug 23, 2007 3:55 AM
# re: I Knew How To Validate An Email Address Until I Read The RFC
hot air!

Why haven't you updated your comment form to accept any valid e'mail addresses?

--

We need to root out all the off-the-shelf modules that fail to recognize
the local-part of an e'mail address can contain:
digits, letters [a-zA-Z] (yes, you have to preserve case!),
! # $ % & ' * + - / = ? ^ _ ` { | } ~, and _internal_ "."s.

--
Later we can deal with utf8-local-part "@" utf8-domain.
Jason Haley Aug 23, 2007 7:13 AM
# Interesting Finds: August 22, 2007
neliason Aug 24, 2007 12:27 PM
# re: I Knew How To Validate An Email Address Until I Read The RFC
This would all be a lot easier if we could still use the VRFY command of SMTP. Email spammers made verification a lot harder.
Randy Aldrich Aug 25, 2007 10:42 AM
# re: I Knew How To Validate An Email Address Until I Read The RFC
The '+' character in E-mail addresses is a vital necessity. It's useful for filtering emails and finding out who's selling your address and probably has 100 other uses.

A previous commenter mentioned GMAIL as having the 'auto-tagging' feature by adding ++yourtaghere at the end of your username within your email address.

For example, johndoe++haacked@gmail.com. This will automatically go to johndoe@gmail.com.

In addition to this, johndoe+haacked@gmail.com will also be sent to johndoe@gmail.com. However, the single + sign is built into almost all mail servers. I know it works with my company's Exchange server, Gmail, Yahoo, my local cable company's E-mail server and I'm sure most any web mail provider allows this. I've written about the + operator in your email address in the past. However I get very frustrated with websites which don't allow it as I'm sure yours is about to do...
JD Aug 25, 2007 2:53 PM
# re: I Knew How To Validate An Email Address Until I Read The RFC
Oh, really intrigued to learn that the prefix to the @ was validated by the sending function. Hopefully I got the picture. Admittedly I skipped a few lines in between beer and TV.
Sara Aug 26, 2007 4:24 AM
# re: I Knew How To Validate An Email Address Until I Read The RFC
Good job making a billion programmers look stupid...myself included :D
Boris Yeltsin Aug 31, 2007 8:18 AM
# re: I Knew How To Validate An Email Address Until I Read The RFC
You missed another one...

"name@tld" is a valid address.

I was very jealous of a friend in the early 90s who had an address which was something like "joe@uk".

Perfectly valid, but 99.9% of all e-mail validation routines check for a "." in the hostname and refuse it.
János Pásztor Sep 01, 2007 11:51 PM
# What's an email good for if you can't type it...
One small side thought here: what's an e-mail address good for if you can't type it on a common keyboard?

Anyway, you are validating e-mail addresses for use not for the RFC. So if the e-mail you got is perfectly valid but your MTA rejects it because of whatsoever spam filtering and stuff, your validation isn't worth the bytes it takes up.

The domain part is interesting, because accented characters are allowed, however most people register the domain names unaccented version as well, so it is safe to say that US-ASCII does the job.

In short, you have to test against your mail delivery system and your target audience's common mail domains, not the RFC.
Raisor Sep 04, 2007 8:51 AM
# re: I Knew How To Validate An Email Address Until I Read The RFC
Hi,

I found this a "nice-to-read" article ... I'm not surprised at all after reading it to end.

I once built an online "Email Check" service to validate mail addresses against their domains ... what can I say ... I studied all the applicable RFCs and made it work by contacting the server concerned using “HELO” … my service only works with a few servers … most providers like Microsoft, Yahoo and Freenet do not even respond to a request.

My service still exists … but, as everyone can imagine, it’s worth nothing.

My conclusion is that RFCs mostly serve big industry … but they do not serve a scrubby programmer like me!

Best regards,
Raisor
Joe Cheng [MSFT] Sep 05, 2007 12:57 PM
# re: I Knew How To Validate An Email Address Until I Read The RFC
Don't forget comments (which can nest). RFC2822 section A.5 has this lovely (valid) example:

Pete(A wonderful \) chap) <pete(his account)@silly.test(his host)>

I don't think it's possible to fully parse RFC2822 addresses with regex (at least not without some nesting mechanism, like [only?] .NET has). I personally had to use JavaCC/jjtree.

http://svn.apache.org/viewvc/james/mime4j/trunk/src/main/jjtree/org/apache/james/mime4j/field/address/AddressListParser.jjt?view=markup
Simon Slick Sep 17, 2007 2:05 PM
# re: I Knew How To Validate An Email Address Until I Read The RFC
To those of you who say things like <space>@domain.tld are valid email addresses. There is a difference between an address being valid (existing) and the format conforming to RFC spec. Existence does not constitute RFC conformance and non-existence does not constitute RFC non-conformance. RFC conformance and actual existence are two different things and validation of each is useful in their own right.

For example: if you run a mail system and wish to allow customers to create email addresses across the full range of the RFC spec (or as close as you can reasonably get), how is the system supposed to verify an address that doesn’t exist yet? It can't be done by checking for existence. Sometimes it is necessary to know that a not-yet-created email address meets the RFC spec.

http://SimonSlick.com/VEAF/ValidateEmailAddressFormat.html
GMW Sep 25, 2007 6:59 PM
# re: I Knew How To Validate An Email Address Until I Read The RFC
So many here seem to forget that there is more to email validation than just wanting to correct user input... there is also the issue of parsing a list of emails (as just one example). If I cannot reliably recognise/validate a single email address (including all special characters and quoting rules), how can I possibly parse a string that contains a list of addresses?
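
To make the list problem concrete, here is a minimal Python sketch. The function name `split_address_list` is made up for illustration, and it only understands double-quoted strings and backslash escapes, not comments or group syntax, so treat it as a rough illustration rather than a full RFC 2822 list parser. It shows why a naive split on commas breaks as soon as quoted local parts are allowed.

```python
def split_address_list(text):
    """Split a comma-separated address list, ignoring commas inside double quotes.

    Hypothetical helper: handles quoted strings and backslash escapes only,
    not RFC 2822 comments or group syntax.
    """
    addresses = []
    current = []
    in_quotes = False
    escaped = False

    def flush():
        piece = "".join(current).strip()
        if piece:
            addresses.append(piece)
        del current[:]

    for ch in text:
        if escaped:
            # The character after a backslash is taken literally.
            current.append(ch)
            escaped = False
        elif ch == "\\":
            current.append(ch)
            escaped = True
        elif ch == '"':
            current.append(ch)
            in_quotes = not in_quotes
        elif ch == "," and not in_quotes:
            flush()  # a comma outside quotes ends the current address
        else:
            current.append(ch)
    flush()
    return addresses
```

A plain `text.split(",")` would cut `"a,b"@example.com` in half, while the sketch keeps it as one address.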
Bear Oct 17, 2007 11:50 AM
# re: I Knew How To Validate An Email Address Until I Read The RFC
Unless I've made any mistakes the following should cover everything necessary from RFC2822/2821.

A few things are deliberately left out: TLD-specific restrictions (which would require constant checking); provision for a display-name (I've restricted the test to the addr-spec); General-address-literal support beyond IPv6 (unless anyone has examples they'd expect to see supported); and comments or folding-white-space between parts (as they break every system I've tested on). Beyond those, the only things lacking are length restrictions: a maximum of 64 characters in total for the local-part and a maximum of 255 characters in total for the domain. These are only limits in respect of guaranteed support within SMTP implementations, and enforcing them would require lookarounds or a second test (you cannot both allow multiple word sections in the local-part or domain and simultaneously restrict their overall lengths without lookarounds).

I'm still looking into the validity of characters beyond the US-ASCII set.

^(?:[^()<>@,;:\\".[\]\x00-\x20\u007F-\uFFFF]+(?:\.[^()<>@,;:\\".[\]\x00-\x20\u007F-\uFFFF]+)*|\"(?:(?:(?:[\t ]*\r\n)?[\t ]+)?(?:[\x01-\x08\x0B\x0C\x0E-\x1F\x21\x23-\x5B\x5D-\x7F]|\\[\x01-\x09\x0B\x0C\x0D-\x7F]))*(?:(?:[\t ]*\r\n)?[\t ]+)?\")@(?:(?:[a-zA-Z](?:[a-zA-Z\d-]{0,61}[a-zA-Z\d])?\.)+[a-zA-Z]{2,}|\[(?:IPv6:(?:[1-9a-fA-F][\da-fA-F]{1,3}|[\da-fA-F][1-9a-fA-F][\da-fA-F]{0,2}|[\da-fA-F]{0,2}[1-9a-fA-F][\da-fA-F]|[\da-fA-F]{0,3}[1-9a-fA-F]):(?:(?:[\da-fA-F]{1,4}:){6}|(?:[\da-fA-F]{1,4}:){4}:|(?:[\da-fA-F]{1,4}:){3}:(?:[\da-fA-F]{1,4}:)?|(?:[\da-fA-F]{1,4}:){2}:(?:[\da-fA-F]{1,4}:){0,2}|(?:[\da-fA-F]{1,4}:):(?:[\da-fA-F]{1,4}:){0,3}|:(?:[\da-fA-F]{1,4}:){0,4})(?:[1-9a-fA-F][\da-fA-F]{1,3}|[\da-fA-F][1-9a-fA-F][\da-fA-F]{0,2}|[\da-fA-F]{0,2}[1-9a-fA-F][\da-fA-F]|[\da-fA-F]{0,3}[1-9a-fA-F])|(?:(?:(?:0{1,4}:){5}(?:0{1,4}|[fF]{4}):)|(?:IPv6:::(?:[fF]{4}:)?))?(?:25[0-4]|2[0-4]\d|1\d{2}|[1-9]\d?)(?:\.(?:25[0-4]|2[0-4]\d|1\d{2}|[1-9]?\d)){2}\.(?:25[0-4]|2[0-4]\d|1\d{2}|[1-9]\d?))\])$

Hopefully I've copied that in ok.
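
The "second test" for lengths could look something like this minimal Python sketch. The limits are the ones quoted in this comment (64 for the local-part, 255 for the domain); the function name is hypothetical, and the sketch assumes the address has already passed a syntax check and that the domain itself contains no @.

```python
MAX_LOCAL_PART = 64   # limit quoted in the comment above
MAX_DOMAIN = 255      # limit quoted in the comment above

def within_length_limits(address):
    """Return True if both halves of an already syntax-checked address fit the limits."""
    # Split on the LAST @, because a quoted local part may itself contain @.
    local_part, at_sign, domain = address.rpartition("@")
    if not at_sign:
        return False  # no @ at all
    return 0 < len(local_part) <= MAX_LOCAL_PART and 0 < len(domain) <= MAX_DOMAIN
```

Running this after the regex keeps the expression itself free of lookarounds.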
Bear Oct 17, 2007 12:29 PM
# re: I Knew How To Validate An Email Address Until I Read The RFC
Oh, and I didn't bother with any of the obsolete tokens, as these only have to be honored when interpreting messages and must not be used when generating messages, so they don't matter when it comes to making use of the validated email addresses.
Bear Oct 18, 2007 4:25 AM
# re: I Knew How To Validate An Email Address Until I Read The RFC
The local-part in that last one was a mix of characters not allowed (the atom class) and characters allowed (the qtext and quoted-pair classes) — the following rewrites the qtext and quoted-pair classes as disallowed character ranges which is perhaps easier to read:
^(?:[^()<>@,;:\\".[\]\x00-\x20\u007F-\uFFFF]+(?:\.[^()<>@,;:\\".[\]\x00-\x20\u007F-\uFFFF]+)*|\"(?:(?:(?:[\t ]*\r\n)?[\t ]+)?(?:[^\x00\t\n\r "\\\u0080-\uFFFF]|\\[^\n\r\u0080-\uFFFF]))*(?:(?:[\t ]*\r\n)?[\t ]+)?\")@(?:(?:[a-zA-Z](?:[a-zA-Z\d-]{0,61}[a-zA-Z\d])?\.)+[a-zA-Z]{2,}|\[(?:IPv6:(?:[1-9a-fA-F][\da-fA-F]{1,3}|[\da-fA-F][1-9a-fA-F][\da-fA-F]{0,2}|[\da-fA-F]{0,2}[1-9a-fA-F][\da-fA-F]|[\da-fA-F]{0,3}[1-9a-fA-F]):(?:(?:[\da-fA-F]{1,4}:){6}|(?:[\da-fA-F]{1,4}:){4}:|(?:[\da-fA-F]{1,4}:){3}:(?:[\da-fA-F]{1,4}:)?|(?:[\da-fA-F]{1,4}:){2}:(?:[\da-fA-F]{1,4}:){0,2}|(?:[\da-fA-F]{1,4}:):(?:[\da-fA-F]{1,4}:){0,3}|:(?:[\da-fA-F]{1,4}:){0,4})(?:[1-9a-fA-F][\da-fA-F]{1,3}|[\da-fA-F][1-9a-fA-F][\da-fA-F]{0,2}|[\da-fA-F]{0,2}[1-9a-fA-F][\da-fA-F]|[\da-fA-F]{0,3}[1-9a-fA-F])|(?:(?:(?:0{1,4}:){5}(?:0{1,4}|[fF]{4}):)|(?:IPv6:::(?:[fF]{4}:)?))?(?:25[0-4]|2[0-4]\d|1\d{2}|[1-9]\d?)(?:\.(?:25[0-4]|2[0-4]\d|1\d{2}|[1-9]?\d)){2}\.(?:25[0-4]|2[0-4]\d|1\d{2}|[1-9]\d?))\])$
Bear Oct 18, 2007 5:20 AM
# re: I Knew How To Validate An Email Address Until I Read The RFC
oops... starting to look like spam but in the first post the quoted-pair rule should read:
\\[\x01-\x09\x0B\x0C\x0E-\x7F]
instead of
\\[\x01-\x09\x0B\x0C\x0D-\x7F]
Just a simple change from \x0D to \x0E to discount the CR.
BradVin's .Net Blog Oct 23, 2007 8:25 AM
# ÜberUtils - Part 3 : Strings
So every developer has (or should have) a utilities class for strings. It seems the built-in string class
Nuno Gomes Jan 05, 2008 9:43 AM
#
Email Regular Expression
wwheeler Feb 15, 2008 11:40 PM
# re: I Knew How To Validate An Email Address Until I Read The RFC
Guys,

http://www.ex-parrot.com/~pdw/Mail-RFC822-Address.html

:-D
Professional Website Design Apr 15, 2008 5:57 AM
# re: I Knew How To Validate An Email Address Until I Read The RFC
Very interesting and amusing.

Can anyone say why the hell they made the RFCs so damned complicated?

Two points:
1. One can do lots of really complicated things with Regular Expressions but sometimes they are just not appropriate and perhaps a state machine would work better. The problem I find with big regular expressions is that they are not exactly self documenting. In my view a regex longer than a single line might as well be line noise. But perhaps that's just me getting past it.

2. When I was still a programmer I found that my colleagues, on the whole, didn't have a clue about how to use regular expressions. One result of this was that they often did lots of repetitious editing while I would create a regex to do it all in one step.

Nick
ken May 01, 2008 5:09 PM
# re: I Knew How To Validate An Email Address Until I Read The RFC
Reading these RFCs has only made me more confused. RFC 822 says: "... a specification such as:
Full\ Name@Domain
is not legal and must be specified as:
"Full Name"@Domain
..." but RFC 3696 says "Blank spaces may also appear, as in
Fred\ Bloggs@example.com
..."

Who's right? :%
Tagesgeld May 19, 2008 5:36 AM
# re: I Knew How To Validate An Email Address Until I Read The RFC
Well, I'm quite confident with this one:
"^((([a-z]|[0-9]|!|#|$|%|&|'|\*|\+|\-|/|=|\?|\^|_|`|\{|\||\}|~)+(\.([a-z]|[0-9]|!|#|$|%|&|'|\*|\+|\-|/|=|\?|\^|_|`|\{|\||\}|~)+)*)@((((([a-z]|[0-9])([a-z]|[0-9]|\-){0,61}([a-z]|[0-9])\.))*([a-z]|[0-9])([a-z]|[0-9]|\-){0,61}([a-z]|[0-9])\.(af|ax|al|dz|as|ad|ao|ai|aq|ag|ar|am|aw|au|at|az|bs|bh|bd|bb|by|be|bz|bj|bm|bt|bo|ba|bw|bv|br|io|bn|bg|bf|bi|kh|cm|ca|cv|ky|cf|td|cl|cn|cx|cc|co|km|cg|cd|ck|cr|ci|hr|cu|cy|cz|dk|dj|dm|do|ec|eg|eu|sv|gq|er|ee|et|fk|fo|fj|fi|fr|gf|pf|tf|ga|gm|ge|de|gh|gi|gr|gl|gd|gp|gu|gt|gg|gn|gw|gy|ht|hm|va|hn|hk|hu|is|in|id|ir|iq|ie|im|il|it|jm|jp|je|jo|kz|ke|ki|kp|kr|kw|kg|la|lv|lb|ls|lr|ly|li|lt|lu|mo|mk|mg|mw|my|mv|ml|mt|mh|mq|mr|mu|yt|mx|fm|md|mc|mn|ms|ma|mz|mm|na|nr|np|nl|an|nc|nz|ni|ne|ng|nu|nf|mp|no|om|pk|pw|ps|pa|pg|py|pe|ph|pn|pl|pt|pr|qa|re|ro|ru|rw|sh|kn|lc|pm|vc|ws|sm|st|sa|sn|cs|sc|sl|sg|sk|si|sb|so|za|gs|es|lk|sd|sr|sj|sz|se|ch|sy|tw|tj|tz|th|tl|tg|tk|to|tt|tn|tr|tm|tc|tv|ug|ua|ae|gb|us|um|uy|uz|vu|ve|vn|vg|vi|wf|eh|ye|zm|zw|com|edu|gov|int|mil|net|org|biz|info|name|pro|aero|coop|museum|arpa))|(((([0-9]){1,3}\.){3}([0-9]){1,3}))|(\[((([0-9]){1,3}\.){3}([0-9]){1,3})\])))$"

It's not perfect, but works fine.
PG May 29, 2008 3:55 AM
# re: I Knew How To Validate An Email Address Until I Read The RFC
Using Regex for email syntax/format validation is something that I think shouldn't be attempted. It's the "holy grail" of regex writers and none have truly achieved it. The other issue I see, even looking at the examples above, is that the patterns very quickly become so convoluted as to become unmaintainable.

A few years ago I wrote a regex compare, and I've just updated it with your examples from the top - caused me to have to change my procedural code (so yes, I'm not perfect either :) but my point is fixing my procedure/function was done in seconds and 3rd party peer review is able to look at the code and say yes/no on the logic.

http://www.pgregg.com/projects/php/code/showvalidemail.php

IMHO, the search for (and time spent on) an email regex simply isn't worth it.
HM2K Jun 11, 2008 10:58 AM
# re: I Knew How To Validate An Email Address Until I Read The RFC
I had a crack at this myself. I wasn't satisfied by what I'd seen here, so I went on to investigate it myself...

The results are found here ->

www.hm2k.com/posts/what-is-a-valid-email-address

Hope somebody finds this useful.
Ace Jul 04, 2008 10:16 PM
# re: I Knew How To Validate An Email Address Until I Read The RFC
Email normalization presents similar challenges. Let's say you have a system where you'd like to prevent a user from registering twice, based on unique email addresses. Granted, it's easy enough to create a new one with the free services like Hotmail, GMail or Yahoo -- but let's say you'd like people to have to at least go to that level of effort before they're allowed to create a new account.

So addresses like "user"@domain.tld, user@domain.tld, (comment)user@domain.tld and user@[ip addr] are all the same, and if the system supports tags, like GMail, then user+tag@domain.tld might also be the same.
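
A rough Python sketch of that kind of normalization follows. Everything in it is an assumption made for illustration: the function name, the choice to unquote only local parts that would also be legal unquoted, and the optional +tag stripping. Comments and IP-literal domains are not handled at all.

```python
import string

# Characters allowed in an unquoted (dot-atom) local part, besides the dots.
ATEXT = set(string.ascii_letters + string.digits + "!#$%&'*+-/=?^_`{|}~")

def normalize_address(address, strip_tags=False):
    """Reduce some equivalent spellings of an address to one form for duplicate checks."""
    local_part, _, domain = address.rpartition("@")

    # "user"@domain.tld and user@domain.tld name the same mailbox, so drop
    # the quotes when the content would also be a legal dot-atom.
    if len(local_part) >= 2 and local_part.startswith('"') and local_part.endswith('"'):
        inner = local_part[1:-1]
        if all(piece and set(piece) <= ATEXT for piece in inner.split(".")):
            local_part = inner

    # Optional, and only right for hosts that treat +tag as an alias.
    if strip_tags and not local_part.startswith('"'):
        base = local_part.split("+", 1)[0]
        if base:
            local_part = base

    # Domain names are case-insensitive; the local part is left as typed.
    return local_part + "@" + domain.lower()
```

Whether tags should be stripped depends on the receiving host, which is why it is off by default here.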
Webdesign Hamburg Sep 19, 2008 7:41 AM
# re: I Knew How To Validate An Email Address Until I Read The RFC
Very interesting stuff! Looks like I have to re-design all the contact forms on all my websites! Gosh! I'm not sure whether to love you or hate you...
Bear Oct 18, 2008 6:52 AM
# re: I Knew How To Validate An Email Address Until I Read The RFC
Oh dear, oh dear. The question of whether the effort is worthwhile is a valid one; after all, mickey@disney.com is valid in terms of the RFC or STD but is almost certainly not the address of someone completing your form! But why do so many people feel it necessary to go on about how impossible, rather than how pointless, this is, and then cite other people's failed attempts as proof rather than get on with a serious attempt of their own?

As for hard coding TLDs into the expression: personally, I don't fancy having to keep track of the creation of new TLDs and updating my expressions when I know the rules that TLDs will have to follow. The only possible reason to go as far as accounting for individual TLDs is if you then branch your code and include expressions for each TLD's domain name restrictions.

I'd be interested to hear any faults with my expression beyond the limitations I already admitted and the only other remaining hurdle: internationalisation, which is poorly documented but important and will become even more important once ICANN complete their evaluation period for IDN TLDs (http://idn.icann.org/).

The following may be easier to follow, as it leaves out the alternation on the domain side that accounts for Address-Literals, which makes up the bulk of the expression:
([^()<>@,;:\\".[\]\x00-\x20\u007F-\uFFFF]+(?:\.[^()<>@,;:\\".[\]\x00-\x20\u007F-\uFFFF]+)*|\"(?:(?:(?:[\t ]*\r\n)?[\t ]+)?(?:[^\x00\t\n\r "\\\u0080-\uFFFF]|\\[^\n\r\u0080-\uFFFF]))*(?:(?:[\t ]*\r\n)?[\t ]+)?\")@((?:[a-zA-Z](?:[a-zA-Z\d-]{0,61}[a-zA-Z\d])?\.)+[a-zA-Z]{2,})
The Silverback Programmer Oct 23, 2008 4:44 PM
# re: I Knew How To Validate An Email Address Until I Read The RFC
(sigh) I must be showing my age...guess I can't get a job anymore.

FizzBuzz solution (proven at runtime to be correct) written in python:

for myCounter in range(1, 100):
    # Check the combined case first so 15, 30, ... print only "fizzbuzz".
    if (myCounter % 3) == 0 and (myCounter % 5) == 0:
        print("fizzbuzz")
    elif (myCounter % 3) == 0:
        print("fizz")
    elif (myCounter % 5) == 0:
        print("buzz")
    else:
        print(myCounter)

Total time to write it: about 7 minutes (5 minutes initial, 2 more to remove a stupid format problem in the program that printed a space).

Like I said, it's proven at runtime. I've already observed the output and it is correct. If you think it isn't, go back and re-read the damn request. It's simple. fizz at 3s, buzz at 5s, fizzbuzz at 3s and 5s, number otherwise.

Jesus on a f'n pogo stick, if a mediocre programmer like myself can write it, then what the hell are the other "expert programmers" doing? And I can't get a job elsewhere? What the hell?
alexoss Nov 04, 2008 12:21 PM
# re: I Knew How To Validate An Email Address Until I Read The RFC
This was a VERY valuable chunk of info for me! Thank you! I especially liked the C# code above, but I wrote my own version using RFC 2822 instead of 822, with the exceptions that I don't allow comments, folding (multiline) white space, white space generally (except within quoted-string and domain-literal), and obsolete syntax. This is a singleton static method.


public static Regex GetInstance()
{
    lock (syncObject)
    {
        if (theRegex == null)
        {
            string ALPHA = "([\\x41-\\x5A]|[\\x61-\\x7A])";
            string DIGIT = "[\\x30-\\x39]";
            string DQUOTE = "\\x22";
            string WSP = "[\\x09\\x20]";
            string NO_WS_CTL = "[\\x01-\\x08\\x0B\\x0C\\x0E-\\x1F\\x7F]";
            string text = "[\\x01-\\x09\\x0B\\x0C\\x0E-\\x7F]";

            string atext = "(" + ALPHA + "|" + DIGIT + "|[\\!\\#\\$\\%\\&\\'\\*\\+\\-\\/\\=\\?\\^_\\`\\{\\|\\}\\~])";
            string qtext = "(" + NO_WS_CTL + "|[\\x21\\x23-\\x5B\\x5D-\\x7E])";
            string dtext = "(" + NO_WS_CTL + "|[\\x21-\\x5A\\x5E-\\x7E])";
            string dot_atom_text = atext + "+(\\." + atext + "+)*";
            string dot_atom = dot_atom_text;
            string quoted_pair = "\\\\" + text;
            string qcontent = "(" + qtext + "|" + quoted_pair + ")";
            string dcontent = "(" + dtext + "|" + quoted_pair + ")";
            string quoted_string = DQUOTE + "(" + WSP + "?" + qcontent + ")*" + WSP + "?" + DQUOTE;
            string local_part = "(" + dot_atom + "|" + quoted_string + ")";
            string domain_literal = "\\[(" + WSP + "?" + dcontent + ")*" + WSP + "?" + "\\]";
            string domain = "(" + dot_atom + "|" + domain_literal + ")";
            string addr_spec = "^" + local_part + "\\@" + domain + "$";

            theRegex = new Regex(addr_spec);
        }
    }

    return theRegex;
}
Mark Nov 10, 2008 8:21 PM
# re: I Knew How To Validate An Email Address Until I Read The RFC
in php:
function verifyEmail($Email) { // returns 1 for valid, else 0
    $domain = substr(strrchr($Email,'@'),1);
    $normal = (preg_match('/^([^@]{1,64}|\".{0,62}\")@[^@]{1,255}$/',$number) == 1) ? true: false;
    $normal = (preg_match('/^([^\.]{1,63}\.)*[^\.]{2,6}$/',$domain) == 1) ? $normal : false;
    $pat = "/^(([\w!#$%&'*+\-\/=?^`{|}~]+\.)*([\w!#$%&'*+\-\/=?^`{|}~]|\\.)+";
    $pat .= "|\"([^\"]|\\.)*\")";
    $pat .= "@";
    $pat .= "((((\w+\-)*\w)+\.)+\w{2,6}";
    $pat .= "|\[(([1-9]\d?|1\d{2}|2([0-4]\d|5[0-5]))\.){3}([1-9]\d?|1\d{2}|2([0-4]\d|5[0-5]))\])$/";
    return ($normal === true) ? preg_match($pat,$Email) : 0;
}

In this incarnation, no support for local or tld Email addresses [johndoe@com, johndoe@machine3] although it would be simple to add

does the rfc support johndoe@123.123.123.123 or must it be johndoe@[123.123.123.123] ?
Mark Nov 10, 2008 8:24 PM
# re: I Knew How To Validate An Email Address Until I Read The RFC
that first preg_match should be a match with $Email, not $number, incidentally..

:)
Mark Nov 11, 2008 12:25 AM
# re: I Knew How To Validate An Email Address Until I Read The RFC
one more correction:

$pat = "/^(([\w!#$%&'*+\-\/=?^`{|}~]+\.)*([\w!#$%&'*+\-\/=?^`{|}~]|\\.)+";
$pat .= "|\"([^\"]|\\.)*\")";


should be [supposing "johndoe@machine3\"@domain.com is not valid]

$pat = "/^(([\w!#$%&'*+\-\/=?^`{|}~]+\.)*([\w!#$%&'*+\-\/=?^`{|}~])+";
$pat .= "|\"([^\"\\]|\\.)*\")";


then it'll need full utf-8 support..

mcv Dec 02, 2008 8:02 AM
# re: I Knew How To Validate An Email Address Until I Read The RFC
I truly don't see the point of extremely detailed email validation. It just won't work, and you run the risk of refusing valid addresses. The only way to be really sure that the email belongs to the person filling in your web form is by sending a confirmation email. All you can do before that is just make sure there is an address in the first place, and that it's not complete garbage.

So here are the steps of the best kind of email validation for a web form:

1. Does it contain at least one @ character? (More than one is allowed in certain circumstances)

2. Is there at least one character before the last @?

3. Is the string after the @ a legal domain name?

4. Does the domain name exist (look it up in DNS)?

5. Send the confirmation email. Does somebody respond?

Anything beyond that just means you're probably denying somebody access to your service. And you're doing extra work for it too.
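
Steps 1 to 3 above can be sketched in a few lines of Python. This is a rough illustration, not a complete validator: the function name is hypothetical, the hostname rule is a simplified letters-digits-hyphens check assumed for the example (so bracketed IP literals are rejected), and steps 4 and 5 need network access and are left out.

```python
import re

# Simplified hostname label: letters, digits and hyphens, no leading or
# trailing hyphen, at most 63 characters. An assumption for this sketch.
_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")

def plausible_address(address):
    """Apply steps 1-3: has an @, something before the last @, legal-looking domain."""
    local_part, at_sign, domain = address.rpartition("@")
    if not at_sign:        # step 1: at least one @
        return False
    if not local_part:     # step 2: at least one character before the last @
        return False
    if not domain or len(domain) > 255:
        return False
    # step 3: every dot-separated label must look like a hostname label
    return all(_LABEL.fullmatch(label) is not None for label in domain.split("."))
```

Because it splits on the last @, an address such as `"a@b"@example.com` passes, and so does a bare `joe@uk`.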
gjel1 Dec 12, 2008 7:39 AM
# re: I Knew How To Validate An Email Address Until I Read The RFC
We have a "customer" database that they want us to spin through, validating the email addresses in it. We have a tool that performs a DNS lookup on the email domain and then opens an SMTP session to each of the mail servers returned by the lookup. The SMTP session does a VRFY and, if that does not work, a MAIL FROM and RCPT TO. The process stops at the first mail server that succeeds. The database contains thousands of emails for "hotmail", "yahoo" and "aol". Do you think we would be listed as spammers on "hotmail", "yahoo" or "aol" for doing this?

Now for the big one ... the list contains almost 3 million emails. What would you suggest we do as far as throttling and batching? We certainly do not want to end up on a spam list.
Mark Dec 17, 2008 3:08 AM
# re: I Knew How To Validate An Email Address Until I Read The RFC
I really see no need for world of warcraft, 2nd life, etcetera; however, there are clearly people who do see a need :) As a for instance, say your website has an 'enter your Email address to subscribe to a newsletter' input, people may not want to bother validating it as well. As long as it's well written, it in fact does work, it in fact validates all valid Email addresses, plus some that less effective parsers would ignore - all potential customers! :)
Mark Dec 17, 2008 4:05 AM
# re: I Knew How To Validate An Email Address Until I Read The RFC
Incidentally, it looks as though the code possibly got mangled there; this is a paste direct from the code that is definitely working, so unless the webpage modifies it, should work:

$pat = "/^";
$pat .= "(";
$pat .= "([\w!#$%&'*+\-\/=?^`{|}~]+\.)*([\w!#$%&'*+\-\/=?^`{|}~]|\\.)+";
$pat .= "|";
$pat .= "\"([^\"]|\\.)*\"";
$pat .= ")";
$pat .= "@";
$pat .= "(";
$pat .= "(((\w+\-)*\w)+\.)+\w{2,6}";
$pat .= "|";
$pat .= "\[(([1-9]\d?|1\d{2}|2([0-4]\d|5[0-5]))\.){3}([1-9]\d?|1\d{2}|2([0-4]\d|5[0-5]))\]";
$pat .= ")";
$pat .= "$/";
manly Dec 22, 2008 2:10 PM
# re: I Knew How To Validate An Email Address Until I Read The RFC
return System.Uri.IsWellFormedUriString("mailto:" + emailAddress, System.UriKind.Absolute);

seemed like a much simpler approach to the problem
adderek Dec 30, 2008 6:46 AM
# re: I Knew How To Validate An Email Address Until I Read The RFC
I knew that one already. Several years ago I did not, though. But it would not have been a problem for me, since my way of working is "do it as the specification says".
Another thing that might be interesting for you is ISO-8601. I have found that people do not know the INTERNATIONAL date/time format.
If you see a date like 01/02/03 or 01-02-03 or 01.02.03 or 123456 or ..., what date is it?!? Is it the silly American format where the less significant unit (months) comes first and the most significant (years) last? Or maybe the stupid date format used in, for example, Poland: DD.MM.YYYY (sometimes YY is used for the year)?

In brief:
1) Always use YYYY-MM-DD (YYYYMMDD if you cannot use dashes)
2) Never use YYYYMM or YYMMDD
3) If possible, try using YYYY-MM-DD"T"HH:MM:SS
Some points for you to consider:
- When a date is written from most significant unit to least, it is ordered the same way as numbers (most significant digit on the left) in "commonly used number systems".
- When a date is written from most significant unit to least, you can sort it alphabetically, and the result is the same as sorting by date.
- This is an international form. A standard. Use it.
- Having thousands of date formats makes it easy to misunderstand dates.
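
The sorting point is easy to check with a short Python sketch. The sample dates are made up, and the helper name `iso_strings` is only for illustration; `isoformat()` is the standard library's ISO 8601 rendering.

```python
from datetime import date, datetime

def iso_strings(dates):
    """Format date objects as ISO 8601 text (YYYY-MM-DD)."""
    return [d.isoformat() for d in dates]

dates = [date(2009, 1, 8), date(2008, 12, 30), date(2008, 2, 1)]

# Plain alphabetical sorting of the ISO text matches chronological order.
sorted_text = sorted(iso_strings(dates))

# A DD.MM.YYYY rendering does not have that property.
dmy_text = sorted(d.strftime("%d.%m.%Y") for d in dates)

# Point 3: date and time joined by a literal "T".
timestamp = datetime(2008, 12, 30, 6, 46, 0).isoformat()
```

Here `sorted_text` comes out in date order, while `dmy_text` puts January 2009 ahead of December 2008.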
Domy Ferraro Jan 08, 2009 9:32 AM
# re: I Knew How To Validate An Email Address Until I Read The RFC
Hello,

maybe it's little bit off topic, but I wrote a .Net component that allows a good validation: it's named df_mailstuff and it's given for free.

Regards
Moderator Jan 12, 2009 2:00 AM
# re: I Knew How To Validate An Email Address Until I Read The RFC
Hi Domy,
Could you tell us where to find that component? I'm a .NET developer too and would be quite interested to know. Thanks...
Shawn Martin Jan 19, 2009 6:38 PM
# re: I Knew How To Validate An Email Address Until I Read The RFC
Thanks for this post. It's a topic of interest to me lately since I've been working on an SMTP server.

Here's my 2 cents...
I'd like to emphasize a point that was made earlier: context is very important. The exact format of an SMTP address is different in a message envelope than it is in a message header, and it's different yet again in an HTML file. What's allowed in a forward path is even different from what's allowed in a reverse path.

Also, there are new RFCs (5321, http://tools.ietf.org/html/rfc5321, and 5322, http://tools.ietf.org/html/rfc5322) as of October 2008 that replace 2821 and 2822. They're a bit cleaner. For example, the definition of quoted-string was a broken mess in 2821 and it's now fixed.
Shawn Martin Jan 19, 2009 6:52 PM
# re: I Knew How To Validate An Email Address Until I Read The RFC
One more try:
5321
5322
Dominic Sayers Feb 10, 2009 8:34 AM
# re: I Knew How To Validate An Email Address Until I Read The RFC
Thanks for releasing this work under a commons license. I'd like to add your test cases to my test suite.

Yes, I've got an email address validator too :-)

Here's a head-to-head comparison of various public-domain validators: http://www.dominicsayers.com/isemail/
Dominic Sayers Feb 13, 2009 5:22 AM
# re: I Knew How To Validate An Email Address Until I Read The RFC
OK, I'm now adding your unit tests into my test suite described here: http://www.dominicsayers.com/isemail/

I'm having trouble interpreting this one:

[Row("\"test\\\rblah\"@example.com", true)]

Is that a Carriage Return in the middle there? I'm not too hot on .NET string literals. Anyway, I agree that Folding White Space is OK in a quoted-string (see RFC2822 Section 3.2.5). But I don't agree that you can have an unquoted backslash in there. It looks like this test is semantically equivalent to this one:

[Row(@"""test\blah""@example.com", false)]

which you correctly mark as false.

What am I missing, Phil?
haacked Feb 13, 2009 5:12 PM
# re: I Knew How To Validate An Email Address Until I Read The RFC
@Dominic hmmm... I think you're right. Worth removing that one.
Alex Holland Feb 19, 2009 10:34 PM
# re: I Knew How To Validate An Email Address Until I Read The RFC
I've been using your regex for some time now to validate email addresses. It is the best I have encountered. Thank you for putting it out there.

However, I just discovered that it seems to reject abc@q.com as invalid. Is this intentional? I ask because there do appear to be email addresses in use where the first part of the domain after the @ is a single letter.
Dominic Sayers Feb 20, 2009 2:46 AM
# re: I Knew How To Validate An Email Address Until I Read The RFC
OK, I've now released version 1.0 of my PHP address validator. I'm sure somebody could transcribe it into C# quite easily. Thanks again to Phil Haack for putting his test cases in the public domain.

blog.dominicsayers.com/.../email-address-valida...

haacked Feb 20, 2009 5:05 PM
# re: I Knew How To Validate An Email Address Until I Read The RFC
@Alex Not sure if that was intentional. I've never seen a one letter domain before. Wow.
Dominic Sayers Feb 22, 2009 8:35 AM
# re: I Knew How To Validate An Email Address Until I Read The RFC
How about a 180-degree change of opinion from last week? I've now discussed the issue of backslashes in the local part with Dave Child and Cal Henderson and the consensus is that they are fine in a quoted string whatever they are escaping. The only proviso is that they have to be escaping something, so a backslash can't be the last character before the closing double quote.
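
That proviso can be written down as a small Python sketch. The function name is hypothetical, and it checks only this one rule (every backslash inside the quotes must be followed by a character to escape); nothing else about the quoted string is validated.

```python
def backslashes_ok(quoted_local_part):
    """Check that every backslash inside a quoted local part escapes a character.

    Expects the surrounding double quotes to be included in the argument.
    """
    if (len(quoted_local_part) < 2
            or quoted_local_part[0] != '"'
            or quoted_local_part[-1] != '"'):
        return False
    inner = quoted_local_part[1:-1]
    i = 0
    while i < len(inner):
        if inner[i] == "\\":
            if i + 1 >= len(inner):
                # The backslash would escape the closing quote itself.
                return False
            i += 2  # skip the backslash and the character it escapes
        else:
            i += 1
    return True
```

Under this rule `"test\blah"` passes, while a quoted string whose last inner character is a lone backslash does not.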

Sorry for any confusion :-)

More here: http://www.dominicsayers.com/isemail
Richard Smith Mar 18, 2009 9:26 AM
# re: I Knew How To Validate An Email Address Until I Read The RFC
I wrote an email validation with a grouping expression a while ago and I have never needed anything more than this:
/(\w+[\w-.]*)\@(\w+[\.]{1}(com$|gov$|mil$|edu$|arpa$|biz$|eu$|info$|int$|name$|nato$|net$|org$|co\.\w{2}))/i

It seems a little... simple and strict now that I have read this, but I doubt I will change this validation method as it hasn't failed me yet... that I know of <.<
oliver khoury Dec 15, 2009 7:55 AM
# re: I Knew How To Validate An Email Address Until I Read The RFC
You've tried to sign in too many times with an incorrect e-mail address or password, or someone else is trying to sign in to the account
Joel Jan 28, 2010 2:23 PM
# re: I Knew How To Validate An Email Address Until I Read The RFC
This fails on single-character domain names, e.g. me@e.com

It isn't tested but is probably worth adding to your tests.
chris Feb 08, 2010 11:43 PM
# re: I Knew How To Validate An Email Address Until I Read The RFC
"I believe erratum ID 1003 is slightly wrong. RFC 2821 places a 256 character limit on the forward-path. But a path is defined as

Path = "<" [ A-d-l ":" ] Mailbox ">"

So the forward-path will contain at least a pair of angle brackets in addition to the Mailbox. This limits the Mailbox (i.e. the email address) to 254 characters."
www.rfc-editor.org/errata_search.php?rfc=3696
emmanuel LEVY Feb 19, 2010 11:37 AM
# yes but in fact...
This is excellent work, and it is very clearly and very honestly presented. Thanks much!

In my experience (our typical client is frankly non-geek), an email address which looks weird - and that would be accepted by a thoroughly RFC-compliant algorithm - is actually a typo.

Ideally, what we would need would be an algorithm designed to exclude typos.

For instance, our algorithm rejects addresses ending with @homail.com...

(in France, we have a big ISP named wanadoo.fr - 10% of our users spell it wanado.fr, or wabadoo.fr or wannadoo.fr or various sometimes exhilarating variations)
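
One way to catch that kind of typo is to compare the typed domain against a list of domains your users commonly have. The Python sketch below is only an illustration under stated assumptions: the domain list, the function name and the 0.8 cutoff are all made up, and it suggests a correction rather than rejecting the address outright. It relies on `difflib.get_close_matches` from the standard library.

```python
import difflib

# Hypothetical list; a real site would use the domains its own users actually have.
KNOWN_DOMAINS = ("hotmail.com", "wanadoo.fr", "gmail.com", "yahoo.com")

def suggest_domain(address, known=KNOWN_DOMAINS, cutoff=0.8):
    """Return a likely intended domain if the typed one looks like a typo, else None."""
    domain = address.rpartition("@")[2].lower()
    if not domain or domain in known:
        return None  # nothing typed after the @, or already a known domain
    matches = difflib.get_close_matches(domain, known, n=1, cutoff=cutoff)
    return matches[0] if matches else None
```

A form could then ask "did you mean hotmail.com?" instead of refusing the address, which avoids locking out people whose unusual domain is genuine.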
