How to Harvest Emails With Google And Protect Yours From Spammers

Just something I noticed today. A lot of people (I may even be guilty of this) publish their emails on the web using the following format:

name at gmail dot com

Substitute gmail dot com with your favorite email domain.

The problem with this approach is that it is trivially easy to harvest email addresses in this format with Google.

Harvest

First, do a search for the following text (include the quotes):

"* at * dot com"

Now, all you need to do is run a regular expression over the results. For example, using your favorite regular expression tool, search for this:

(\w+)\s+at\s+(\w+)\s+dot\s+com

and replace with this:

$1@$2.com
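To make the point concrete, here is a minimal Python sketch of that same search-and-replace. The helper name `deobfuscate` and the sample text are made up for illustration. It only shows how little effort the "at/dot" format demands of a harvester; it is not a description of any real tool.

```python
import re

# The pattern from the post: a word, "at", a word, then "dot com".
PATTERN = re.compile(r"(\w+)\s+at\s+(\w+)\s+dot\s+com")

def deobfuscate(text):
    """Return each 'name at domain dot com' match rewritten as name@domain.com."""
    # Equivalent to replacing each match with $1@$2.com.
    return [f"{m.group(1)}@{m.group(2)}.com" for m in PATTERN.finditer(text)]

# Hypothetical page text, for illustration only.
sample = "Reach me: name at gmail dot com, or try other at example dot com."
print(deobfuscate(sample))  # ['name@gmail.com', 'other@example.com']
```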

Now before you blame me for giving the spammers another tool in their arsenal, I would be very surprised if spammers aren’t already doing this. I highly doubt I’m the first to think of it.

So what is a better way to communicate your email address without making it susceptible to harvesting? You could try mish-mashing your email with HTML entity codes. For example, when viewed in a browser, the following looks exactly the same as name at gmail dot com.

name at gmail dot com

The key is to somewhat randomly replace characters with entity codes, so that we don’t all use the exact same sequence. If we all replaced every letter with its corresponding entity code, it would be trivially easy to farm.

But by introducing some randomness, it becomes a lot more difficult to farm these emails. It’s possible, but would take more technical chops and computing power than the technique I just demonstrated.
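One way the random replacement could be done is sketched below in Python. The function name `entity_encode` and its `probability` parameter are assumptions made for this example, not something from an existing library. The seed is fixed only so the demo is repeatable; in real use you would leave it out so every page gets a different mix.

```python
import html
import random

def entity_encode(text, probability=0.5, rng=None):
    """Replace a random subset of characters with decimal HTML entities."""
    rng = rng or random.Random()
    return "".join(
        f"&#{ord(ch)};" if rng.random() < probability else ch
        for ch in text
    )

address = "name at gmail dot com"
# Seeded only to make this demo repeatable.
encoded = entity_encode(address, rng=random.Random(42))
print(encoded)

# A browser (or html.unescape) turns the entities back into the original text.
assert html.unescape(encoded) == address
```

The markup that lands in the page source is `encoded`. Because roughly half the characters are replaced at random, two sites encoding the same address will rarely produce the same string.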

What others have said

Joe Brinkman Apr 02, 2007 8:01 PM
# re: How to Harvest Emails With Google And Protect Yours From Spammers
DotNetNuke has long had a function that takes a string and replaces it with an inline JavaScript function. This function then just decodes a series of hex numbers and injects the final result into the DOM. Works great to keep the bots at bay.
Sean Chambers Apr 02, 2007 8:16 PM
# re: How to Harvest Emails With Google And Protect Yours From Spammers
Ya know, I have long struggled with this problem as everyone else has at some point. I worked at quite a few companies where battling spammers and trying to keep one step ahead of them was a daily occurrence.

There are several good ways to combat bots, but IMO the best way (although maybe a little overkill) would be to generate an image for the user's email address. Some people may say, "Well the spammers will just make bots that decipher the characters out of the image", and this is true. The thing is, how much time will they REALLY spend doing this when they can just go somewhere else and harvest text off a page instead?

It's really an endless battle. We patch a hole, they find another; we get a bigger gun, they get a bazooka =)

On a side note, with how well spam filters are improving (like gmail's), I don't even get that much spam anymore. Maybe about 1-2 in my inbox every day. In addition, if spam bothers you that much, it's nothing a baseball bat and a plane ticket won't fix =)
Jon Limjap Apr 02, 2007 9:36 PM
# re: How to Harvest Emails With Google And Protect Yours From Spammers
What I usually do is embellish the addresses with other special characters, e.g.,

[name}at<gmail)dot*com

Thing is one must take care not to confuse the end user.
Damien Guard Apr 02, 2007 11:49 PM
# re: How to Harvest Emails With Google And Protect Yours From Spammers
I wouldn't say this encoding helps that much, as it is easily decoded, random or not, by just running it through HttpUtility.HtmlDecode.

string decodedPage = HttpUtility.HtmlDecode(encodedPage);

[)amien
engtech Apr 03, 2007 12:11 AM
# re: How to Harvest Emails With Google And Protect Yours From Spammers
My thoughts on posting email addresses on websites:

http://engtech.wordpress.com/2007/01/20/why-posting-your-email-address-in-plain-text-is-never-a-good-idea/
Gokhan Altinoren Apr 03, 2007 1:01 AM
# re: How to Harvest Emails With Google And Protect Yours From Spammers
FYI: I have seen search engine referrers like "NOSPAMhotmail dot com" in my blog lately. NOSPAM is another thing to avoid.
Haacked Apr 03, 2007 1:18 AM
# re: How to Harvest Emails With Google And Protect Yours From Spammers
@Damien - Sure, that'll work to decode a single page. But the point of this post was to point out how easy it is to search the entire internet for an email pattern.

It would be a pain to call HttpUtility.HtmlDecode on every page of the internet.

You still need to narrow down the billion web pages out there with a Google (or some other) search, and then call HtmlDecode on the page. The point of the entity encoding is to make it more difficult to narrow the search using Google.

Spammers would have to build their own spider and HtmlDecode all the contents of every page and then run a regex on each page. This would be bandwidth heavy and computationally heavy, I imagine.

So this method isn't foolproof, but it'll be interesting to see if it works well enough.
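As a rough illustration of the trade-off discussed in this thread, the Python sketch below uses `html.unescape` as a stand-in for HtmlDecode. The sample markup and the helper name `harvest_from_page` are invented for the example. The plain pattern finds nothing in the encoded source, so a text search has nothing to key on, but once a harvester has the page in hand, one decode call undoes the protection.

```python
import html
import re

PATTERN = re.compile(r"(\w+)\s+at\s+(\w+)\s+dot\s+com")

def harvest_from_page(page_source):
    """Decode HTML entities first, then apply the 'at/dot' pattern."""
    decoded = html.unescape(page_source)
    return [f"{m.group(1)}@{m.group(2)}.com" for m in PATTERN.finditer(decoded)]

# Hypothetical page source with a few characters entity-encoded.
page = "<p>n&#97;me &#97;t gm&#97;il d&#111;t c&#111;m</p>"

print(PATTERN.search(page))     # None: the raw source never contains " at "
print(harvest_from_page(page))  # ['name@gmail.com']
```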
Mads Kristensen Apr 03, 2007 1:58 AM
# re: How to Harvest Emails With Google And Protect Yours From Spammers
This is exactly what I did some time ago in an HttpModule that randomly encodes any e-mail in the rendered response stream. http://www.madskristensen.dk/blog/SpamProofYourWebsiteUsingAnHttpModule.aspx
Jan Bannister Apr 03, 2007 3:24 AM
# re: How to Harvest Emails With Google And Protect Yours From Spammers
Cool. From a web user's point of view, you can also use gmail's + operator to inject a tracer to see who sold your email.

Just wrote about it before reading this.
Sergio Apr 03, 2007 5:26 AM
# re: How to Harvest Emails With Google And Protect Yours From Spammers
In a programmer's context, one could write something like:
my email is (script)document.write(eval("['com','.','ail','@gm','myself'].reverse().join('')"));(/script)
The Other Steve Apr 03, 2007 7:16 AM
# re: How to Harvest Emails With Google And Protect Yours From Spammers
We just put up a form page which says Contact_Us. The email is then sent on the server side.
Jeff Atwood Apr 03, 2007 9:57 AM
# re: How to Harvest Emails With Google And Protect Yours From Spammers
There's lots of existing code out there already that does this:

http://www.ianr.unl.edu/email/encode/

I don't like the graphic method, personally.
Ryan Smith Apr 03, 2007 2:21 PM
# re: How to Harvest Emails With Google And Protect Yours From Spammers
I doubt there is any way to keep your email address from getting into the hands of spammers if you advertise it on the internet. Even if you don't, it still manages to leak out into the open.

I personally feel the best way to handle the spam issue is to never release your primary email on the internet, use server side contact forms on your website, and use temporary emails for everything else you need an address for.

Even then you're still going to have problems. I got the joy yesterday of waking up to a spammer that had figured out how to hack a contact form on a customer's site. The form (which I didn't build) validated the email address, but not the name field. They were creating post responses using the name field to do bounce back attacks to all sorts of unfortunate souls.

SPAMMERS MUST ALL PERISH!
Sergio Apr 03, 2007 5:37 PM
# re: How to Harvest Emails With Google And Protect Yours From Spammers
@Ryan
Even that doesn't guarantee anything. I have NEVER used my gmail address anywhere in the world, not even my friends know about it, but I still started getting spam there after less than a week. I think it just comes with the service or the spammers are really good at guessing names and abbreviations.
Spunkmeyer Apr 07, 2007 7:53 AM
# re: How to Harvest Emails With Google And Protect Yours From Spammers
Just use a picture...
maht Apr 09, 2007 2:36 AM
# suck it up, losers :)
> This would be bandwidth heavy and computationally heavy, I imagine.

If you're using a botnet to send spam, you'll be using it to harvest emails.

Besides, the HTML alone isn't that bandwidth heavy.

My advice: suck it up. If you want people to contact you, make it as easy as possible and deal with the resulting spam yourself.

No javascript tricks, fucking web forms with captchas and all that bollocks.

Use a dedicated obscure public address w99@domain and you'll cope.

When I arrived at my current company, they were getting 500 spams a day on their account. I got them to pay $40 to their ISP for spam protection. That reduced it to 10 a day. They are easily filtered out with a couple of scripts so now we get 2 a week through those filters.

Levon Jun 04, 2008 11:20 AM
# re: How to Harvest Emails With Google And Protect Yours From Spammers
Nice post. You can also harvest those HTML-entity-encoded emails by running an HTML page through PHP's html_entity_decode function.
Bear Jul 23, 2008 3:01 AM
# re: How to Harvest Emails With Google And Protect Yours From Spammers
I started receiving spam on an account before I had ever used it. Nobody knew it and I hadn't set up the site yet... they must have picked up on either the domain registration or been filtering DNS updates to find new MX records and then tried various local-parts.
