Another Attempt To Reduce Comment Spam


A little while back, I had a few ideas about how to combat comment spam. Those ideas were geared more towards a trust-based approach to stopping comment graffiti than spam, and they were a bit naive in some ways.

Lately, I’ve been following conversations on various blogs that attempt to address this problem. Dave Winer suggests that comments expire unless the blog owner does something about them.

Phil Ringnalda responds that he doesn’t want the comments ever to get indexed. That concern seems likely to be solved by a suggestion in From The Orient: simply stripping the links out of the comment text itself keeps Google from indexing those links.
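A minimal sketch of how the link-stripping suggestion might be implemented, written in Python since the post has no code of its own. It assumes comments arrive as HTML: the comment is flattened to text, and each link's URL survives only as visible plain text in parentheses, so readers can still see it but a crawler finds no `<a>` element to follow. `LinkStripper` and `strip_links` are hypothetical names, and the output is HTML-escaped on the assumption that it is redisplayed inside a page.

```python
from html import escape
from html.parser import HTMLParser


class LinkStripper(HTMLParser):
    """Flattens comment HTML to text; each link's URL is kept only as plain text."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.parts: list[str] = []
        self.pending_href: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # Remember the href so it can be shown after the anchor text.
            self.pending_href.append(dict(attrs).get("href") or "")

    def handle_endtag(self, tag):
        if tag == "a" and self.pending_href:
            href = self.pending_href.pop()
            if href:
                self.parts.append(f" ({href})")

    def handle_data(self, data):
        self.parts.append(data)


def strip_links(comment_html: str) -> str:
    """Return the comment text with no <a> elements, HTML-escaped for redisplay."""
    parser = LinkStripper()
    parser.feed(comment_html)
    parser.close()
    return escape("".join(parser.parts))
```

For example, `strip_links('Buy <a href="http://spam.example/">cheap stuff</a> now')` yields `Buy cheap stuff (http://spam.example/) now`: the spammer's URL is still on the page, but it no longer counts as a link.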

As Derek Powazek points out, Google’s voracious appetite for indexing pages is the root motivation for people to spam a blog’s comments. One question I have about all this: doesn’t Google honor the robots.txt file or the META tag standard for excluding robots? Adding the following tag:

<META NAME="ROBOTS" CONTENT="NOFOLLOW">

tells Google not to follow the links on the given page. Another option is to add a robots.txt file that tells Google not to index your archives. Personally, I think this second option is too draconian. I think it’s great that people find my blog when they search for how to select random records from SQL Server.
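To see why the robots.txt option is so blunt, Python's standard `urllib.robotparser` can evaluate a robots.txt file the way a well-behaved crawler would. The sketch below assumes a hypothetical blog at `blog.example.com` whose archived posts live under `/archive/`; the host and paths are illustrative, not a real site layout.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a blog whose archive pages live under /archive/.
# The host and paths below are illustrative assumptions, not a real site layout.
ROBOTS_TXT = """\
User-agent: *
Disallow: /archive/
"""

robots = RobotFileParser()
robots.parse(ROBOTS_TXT.splitlines())

HOME = "https://blog.example.com/"
ARCHIVED_POST = "https://blog.example.com/archive/2004/07/random-sql-records.aspx"

for url in (HOME, ARCHIVED_POST):
    verdict = "allowed" if robots.can_fetch("Googlebot", url) else "blocked"
    print(f"{verdict}: {url}")
```

The home page stays crawlable, but every archived post is blocked along with its spammed comments, including the posts people actually find through search.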

Perhaps what is needed is for us to get together, extend the robots.txt standard, and then push for Google to honor it. Now, I don’t know exactly how Google indexes a website, or whether it parses each page as an HTML tree, but supposing it does, it’d be great to have this ability:

<DIV noindex="false" nofollow="true">
Welcome to the comments section of this page. The content here will be indexed, but the links will not. Your spam's no good here.
</DIV>
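Supposing a crawler did parse pages as an element tree, honoring the proposed attribute could look roughly like the sketch below. It is hypothetical throughout: `nofollow="true"` on a `DIV` is the extension proposed here, not something Google supports, and `LinkCollector` and `followable_links` are illustrative names. It also treats the page-level `<META NAME="ROBOTS" CONTENT="NOFOLLOW">` tag as switching off every link on the page.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the hrefs a crawler would follow on one page (hypothetical rules)."""

    # Void elements never get an end tag, so they are not pushed on the stack.
    VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
            "link", "meta", "source", "track", "wbr"}

    def __init__(self) -> None:
        super().__init__()
        self.stack: list[tuple[str, bool]] = []  # (tag, marked nofollow?) per open element
        self.links: list[str] = []
        self.page_nofollow = False

    def handle_starttag(self, tag, attrs):
        attributes = {name: (value or "") for name, value in attrs}
        # Page-level switch: <META NAME="ROBOTS" CONTENT="NOFOLLOW">.
        if (tag == "meta" and attributes.get("name", "").lower() == "robots"
                and "nofollow" in attributes.get("content", "").lower()):
            self.page_nofollow = True
        # Hypothetical element-level switch: nofollow="true" on any element.
        marked = attributes.get("nofollow", "").lower() == "true"
        inside_nofollow = marked or any(flag for _, flag in self.stack)
        if tag == "a" and not inside_nofollow and attributes.get("href"):
            self.links.append(attributes["href"])
        if tag not in self.VOID:
            self.stack.append((tag, marked))

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed children such as <p>.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break


def followable_links(page_html: str) -> list[str]:
    """Return the links a crawler honoring these hypothetical rules would follow."""
    collector = LinkCollector()
    collector.feed(page_html)
    collector.close()
    return [] if collector.page_nofollow else collector.links
```

Tracking every open element, rather than only `DIV`s, means a `nofollow` region also covers links nested several levels deep, at the cost of depending on reasonably well-formed markup.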

Another option is to just have a comment that indicates everything AFTER the comment should not be indexed:

This is easier for a web crawler to parse.
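As a rough illustration of why a marker comment is easier to handle, assume a hypothetical marker such as `<!-- noindex-below -->` (an invented string, not part of any robots standard). A crawler would need no HTML parsing at all, only a substring search for the cut point. `indexable_portion` is likewise a hypothetical name.

```python
# Hypothetical marker: an invented string, not part of any robots standard.
NOINDEX_MARKER = "<!-- noindex-below -->"


def indexable_portion(page_html: str, marker: str = NOINDEX_MARKER) -> str:
    """Return only the part of the page that precedes the marker comment.

    A single substring search finds the cut point, so no element tree is
    needed. Pages without the marker are returned unchanged.
    """
    cut = page_html.find(marker)
    return page_html if cut == -1 else page_html[:cut]
```

A blog engine would emit the marker just before the comments section, so the post body stays indexable while everything visitors typed below it is ignored.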

Combining this with an image verification system (like the one that comes with the ASP.NET resource kit from SAX) should lower the real motivation to spam a site’s comments. If it doesn’t increase their page rank AND they can’t automate posting it, why bother?

Another crazy idea I’ll mention (and I know this will bog down the server a bit) is to use a component that converts text to an image. That way, by default, none of the comment will be indexed. Just thought I’d throw that out there.


Comments


One response

  1. Phil Ringnalda, July 6th, 2004

    It's not so much that I don't want comments indexed (that's easy, we had it for years when we did comments only in a Javascript-spawned popup), but rather that I don't want links followed until I say they can be followed. There have been suggestions for what amounts to a "nofollow='true'" attribute for links, but between the fact that Google has never mentioned liking the idea (in fact, they hate anything that smells like them getting a different view of things than readers) and the fact that HTML (and even XHTML 1.1) is a dead language, which will never get anything new added to it, it's not likely to happen. Which leaves us with the slightly annoying option of having blogging software accept comments with links, but only display the URIs as text until the comment is approved.