How to Harvest Emails With Google And Protect Yours From Spammers

Just something I noticed today. A lot of people (I may even be guilty of this) publish their email addresses on the web in the following format:

name at gmail dot com

Substitute gmail dot com with your favorite email domain.

The problem with this approach is that it is trivially easy to harvest email addresses in this format with Google.

Harvest

First, do a search for the following text (include the quotes):

"* at * dot com"

Now, all you need to do is run a regular expression over the results. For example, using your favorite regular expression tool, search for this:

(\w+)\s+at\s+(\w+)\s+dot\s+com

and replace with this:

$1@$2.com
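
Here is a minimal Python sketch of that search-and-replace. It assumes the search results have already been saved as plain text; fetching them from Google is not shown. Python's `re` module writes the backreferences as `\1` and `\2` where other tools use `$1` and `$2`. The function names and the sample text are made up for illustration.

```python
import re

# "name at domain dot com" with any run of whitespace between the words.
PATTERN = re.compile(r"(\w+)\s+at\s+(\w+)\s+dot\s+com")

def rewrite(text: str) -> str:
    """Rewrite every obfuscated address in place as name@domain.com."""
    return PATTERN.sub(r"\1@\2.com", text)

def harvest(text: str) -> list[str]:
    """Collect only the rewritten addresses, in the order they appear."""
    return [m.expand(r"\1@\2.com") for m in PATTERN.finditer(text)]

snippet = "Contact me: jdoe at gmail dot com, or sales at example dot com."
print(rewrite(snippet))
print(harvest(snippet))
```

Like the original expression, this only catches `dot com` addresses; covering other top-level domains is a one-word change to the pattern.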

Now before you blame me for giving the spammers another tool in their arsenal, I would be very surprised if spammers aren’t already doing this. I highly doubt I’m the first to think of it.

So what is a better way to communicate your email address without making it susceptible to harvesting? You could try mish-mashing your email address with HTML entity codes. For example, when viewed in a browser, the following looks exactly the same as name at gmail dot com.

&#110;a&#109;e &#97;t gm&#97;&#105;l d&#111;t c&#111;m

The key is to replace characters with entity codes somewhat randomly, so that we don't all use the exact same sequence. If we all replaced every letter with its corresponding entity code, the addresses would be trivially easy to farm.

But by introducing some randomness, it becomes a lot more difficult to farm these emails. It’s possible, but would take more technical chops and computing power than the technique I just demonstrated.
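
A rough Python sketch of the randomized encoding. `obfuscate` and its `probability` parameter are hypothetical names, not an existing library. Each character is independently swapped for its decimal entity code (`&#97;` for `a`), so two people encoding the same address will usually end up with different markup. Characters that are special in HTML are always encoded so the output stays valid.

```python
import html
import random

def obfuscate(address: str, probability: float = 0.5, seed=None) -> str:
    """Swap a random subset of characters for decimal HTML entity codes."""
    rng = random.Random(seed)  # a fixed seed only makes output reproducible
    out = []
    for ch in address:
        # Characters with special meaning in HTML are always encoded;
        # everything else is encoded with the given probability.
        if ch in "&<>\"'" or rng.random() < probability:
            out.append(f"&#{ord(ch)};")  # e.g. "a" -> "&#97;"
        else:
            out.append(ch)
    return "".join(out)

encoded = obfuscate("name at gmail dot com")
print(encoded)                 # different on every run
print(html.unescape(encoded))  # name at gmail dot com
```

This only hides the address from plain-text search; anything that decodes entities first, as a browser does, recovers it.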

Comments

20 responses

  1. Joe Brinkman, April 2nd, 2007

    DotNetNuke has long had a function that takes a string and replaces it with an inline JavaScript function. This function then just decodes a series of hex numbers and injects the final result into the DOM. Works great to keep the bots at bay.

  2. Sean Chambers, April 2nd, 2007

    Ya know, I have long struggled with this problem as everyone else has at some point. I worked at quite a few companies where battling spammers and trying to keep one step ahead of them was a daily occurrence.
    There are several good ways to combat bots, but IMO the best way (although maybe a little overkill) would be to generate an image for the user's email address. Some people may say, "Well the spammers will just make bots that decipher the characters out of the image", and this is true. The thing is, how much time will they REALLY spend doing this when they can just go somewhere else and harvest text off a page instead?
    It's really an endless battle. We patch a hole, they find another; we get a bigger gun, they get a bazooka =)
    On a side note, with how well spam filters are improving (like Gmail's), I don't even get that much spam anymore. Maybe about 1-2 in my inbox every day. In addition, if spam bothers you that much, it's nothing a baseball bat and a plane ticket won't fix =)

  3. Jon Limjap, April 2nd, 2007

    What I usually do is embellish the addresses with other special characters, e.g.,
    [name}at<gmail)dot*com
    The thing is, one must take care not to confuse the end user.

  4. Damien Guard, April 2nd, 2007

    I wouldn't say this encoding helps that much, as it is easily decoded, random or not, by just running it through HttpUtility.HtmlDecode:
    string decodedPage = HttpUtility.HtmlDecode(encodedPage);
    [)amien

  5. engtech, April 2nd, 2007

    My thoughts on posting email address on websites:
    http://engtech.wordpress.co...

  6. Gokhan Altinoren, April 2nd, 2007

    FYI: I have seen search engine referrers like "NOSPAMhotmail dot com" in my blog lately. NOSPAM is another thing to avoid.

  7. Haacked, April 2nd, 2007

    @Damien - Sure, that'll work to decode a single page. But the point of this post was to show how easy it is to search the entire internet for an email pattern.
    It would be a pain to call HttpUtility.HtmlDecode on every page of the internet.
    You still need to narrow down the billions of web pages out there with a Google (or some other) search, and then call HtmlDecode on the page. The point of the entity encoding is to make it harder to narrow the search using Google.
    Spammers would have to build their own spider and HtmlDecode all the contents of every page and then run a regex on each page. This would be bandwidth heavy and computationally heavy, I imagine.
    So this method isn't foolproof, but it'll be interesting to see if it works well enough.

  8. Mads Kristensen, April 2nd, 2007

    This is exactly what I did some time ago in an HttpModule that randomly encodes any e-mail address in the rendered response stream. http://www.madskristensen.d...

  9. Jan Bannister, April 2nd, 2007

    Cool. From a web user's point of view, you can also use Gmail's + operator to inject a tracer to see who sold your email address.
    Just wrote about it before reading this.

  10. Sergio, April 2nd, 2007

    In a programmer's context, one could write something like:
    my email is (script)document.write(['com','.','ail','@gm','myself'].reverse().join(''));(/script)

  11. The Other Steve, April 2nd, 2007

    We just put up a form page which says Contact_Us. The email is then sent on the server side.

  12. Jeff Atwood, April 2nd, 2007

    There's lots of existing code out there that already does this:
    http://www.ianr.unl.edu/ema...
    I don't like the graphic method, personally.

  13. Ryan Smith, April 3rd, 2007

    I doubt there is any way to keep your email address from getting into the hands of spammers if you advertise it on the internet. Even if you don't, it still manages to leak out into the open.
    I personally feel the best way to handle the spam issue is to never release your primary email on the internet, use server side contact forms on your website, and use temporary emails for everything else you need an address for.
    Even then you're still going to have problems. I got the joy yesterday of waking up to a spammer that had figured out how to hack a contact form on a customer's site. The form (which I didn't build) validated the email address, but not the name field. They were creating post responses using the name field to do bounce-back attacks on all sorts of unfortunate souls.
    SPAMMERS MUST ALL PERISH!

  14. Sergio, April 3rd, 2007

    @Ryan
    Even that doesn't guarantee anything. I have NEVER used my gmail address anywhere in the world, not even my friends know about it, but I still started getting spam there after less than a week. I think it just comes with the service or the spammers are really good at guessing names and abbreviations.

  15. Spunkmeyer, April 6th, 2007

    Just use a picture...

  16. maht, April 8th, 2007

    > This would be bandwidth heavy and computationally heavy, I imagine.
    If you're using a botnet to send spam, you'll be using it to harvest emails.
    Besides, the HTML alone isn't that bandwidth heavy.
    My advice: suck it up. If you want people to contact you, make it as easy as possible and deal with the resulting spam yourself.
    No javascript tricks, fucking web forms with captchas and all that bollocks.
    Use a dedicated obscure public address w99@domain and you'll cope.
    When I arrived at my current company, they were getting 500 spams a day on their account. I got them to pay $40 to their ISP for spam protection. That reduced it to 10 a day. They are easily filtered out with a couple of scripts so now we get 2 a week through those filters.

  17. Levon, June 4th, 2008

    Nice post. You can also harvest those HTML-entity-encoded emails by running an HTML page through PHP's html_entity_decode function.

  18. Bear, July 22nd, 2008

    I started receiving spam on an account before I had ever used it. Nobody knew it and I hadn't set up the site yet... they must have picked up on either the domain registration or been filtering DNS updates to find new MX records and then tried various local-parts.

  19. fdas, December 28th, 2009

    hi
    i am a spammer

  20. stive, March 12th, 2011

    Hi, you should learn that people love digging through the trash, you ignorant lot. People send spam because it works and people buy.
    It will never end because people like it.
    Ignorant amateurs, cheers.
    Stive
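
Comments 4 and 17 point out that entity encoding comes undone as soon as the page itself is decoded. As a rough Python sketch of that step, the standard library's `html.unescape` plays the role of .NET's `HttpUtility.HtmlDecode` or PHP's `html_entity_decode`, after which the regex from the post works unchanged. `harvest_page` and the sample markup are made up for illustration.

```python
import html
import re

# Same pattern as in the post: "name at domain dot com".
PATTERN = re.compile(r"(\w+)\s+at\s+(\w+)\s+dot\s+com")

def harvest_page(page: str) -> list[str]:
    """Decode HTML entities first (as a browser would), then run the regex."""
    decoded = html.unescape(page)
    return [f"{m.group(1)}@{m.group(2)}.com" for m in PATTERN.finditer(decoded)]

# Hypothetical page fragment containing a randomly entity-encoded address.
page = "<p>Mail &#110;a&#109;e &#97;t gm&#97;&#105;l d&#111;t c&#111;m</p>"
print(harvest_page(page))
```

This is consistent with the post's argument: the entity trick mainly defeats cheap search-engine queries, because once a harvester has fetched a page, decoding it costs one function call.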