Cleanup The Crap That Windows Live Writer Injects With This HttpModule

First, let me start off with some praise. I really, really like Windows Live Writer. I’ve praised it many times on my blog. However, there is one thing that really annoys me about WLW: its utter disregard for web standards and the fact that it injects crap I don’t want or need into my content.

Of particular annoyance is the way that WLW adds attributes that are not XHTML compliant. For example, when you use the Insert Tags feature, it creates a div that looks something like:

<div class="wlWriterEditableSmartContent" 
  id="guid1:guid2" 
  contenteditable="false" 
  style="padding-right: 0px; display: inline; padding-left: 0px; 
  padding-bottom: 0px; margin: 0px; padding-top: 0px">

What’s the problem? Let me explain. 

  1. First of all, the ID is a GUID that starts with a number. Unfortunately XHTML doesn’t allow the id of an element to start with a number.
  2. The contenteditable attribute is not recognized in XHTML.
  3. The style attribute is superfluous. At the very least, it should have been reduced to style="padding: 0; margin: 0; display: inline;"

The purpose of the special class and the contenteditable attribute is to inform WLW that the html tag is editable. In the Web Layout view (F11), you can see a hashed box around the tags like so.

image

Clicking on the box changes the right menu to let you enter tags.

image

Because I actually care about web standards and being XHTML compliant and I’m totally anal, I’ve always gone in and manually changed the HTML after the fact.

Today, out of pure laziness and fed up with this extra work, I decided to write an HttpModule to do this repetitive task for me via a Request Filter. A Request Filter modifies the incoming request.

But to make things interesting, I made sure that the HttpModule makes the changes in an intelligent manner so that no information is lost. Rather than simply removing the cruft, I moved the cruft into the class attribute. Thus the HTML I showed above would be transformed into this:

<div class="wlWriterEditableSmartContent id-guid1:guid2 
  contenteditable-false">

Notice that I simply removed the style attribute because I don’t need it.
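The transformation above can be sketched in a few lines of Python. This is not the module's source (the module is a .NET assembly); the function name `fold_attributes`, the regular expressions, and the decision to keep any other attributes untouched are assumptions made for illustration.

```python
import re

# Opening tags that carry WLW's smart-content class.
SMART_TAG = re.compile(
    r'<(?P<name>\w+)(?P<attrs>[^>]*\bclass="wlWriterEditableSmartContent"[^>]*)>'
)
# A single double-quoted attribute, e.g. id="guid1:guid2".
ATTR = re.compile(r'([\w-]+)="([^"]*)"')
# Attributes that get folded into the class value (hypothetical list).
FOLDED = ("id", "contenteditable")


def fold_attributes(html):
    """Move id and contenteditable into the class attribute and drop style."""

    def rewrite(match):
        attrs = dict(ATTR.findall(match.group("attrs")))
        classes = [attrs.pop("class")]
        for key in FOLDED:
            if key in attrs:
                classes.append(f"{key}-{attrs.pop(key)}")
        attrs.pop("style", None)  # the inline style is simply discarded
        rest = "".join(f' {k}="{v}"' for k, v in attrs.items())
        class_value = " ".join(classes)
        return f'<{match.group("name")} class="{class_value}"{rest}>'

    return SMART_TAG.sub(rewrite, html)


if __name__ == "__main__":
    sample = (
        '<div class="wlWriterEditableSmartContent" id="guid1:guid2" '
        'contenteditable="false" style="margin: 0px">'
    )
    print(fold_attributes(sample))
```

Because the id and contenteditable values survive inside the class attribute, nothing WLW needs is lost; only the style is thrown away.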

I also created a Response Filter to modify the outgoing response when the client is Windows Live Writer. That allows the module to convert the above HTML back into the format that WLW expects. That way, I don’t break any WLW functionality.
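The reverse direction might look roughly like the following Python sketch. Again, this is an illustration rather than the module's code: `unfold_attributes` and its regular expression are hypothetical, and the sketch assumes the discarded style attribute does not need to be restored.

```python
import re

# The smart-content class followed by one or more folded "key-value" tokens.
SMART_CLASS = re.compile(
    r'class="(wlWriterEditableSmartContent)'
    r'((?:\s+(?:id|contenteditable)-[^\s"]+)+)"'
)


def unfold_attributes(html):
    """Turn folded class tokens back into real id/contenteditable attributes."""

    def rewrite(match):
        parts = [f'class="{match.group(1)}"']
        for token in match.group(2).split():
            # Split on the first hyphen only, since GUIDs contain hyphens too.
            key, _, value = token.partition("-")
            parts.append(f'{key}="{value}"')
        return " ".join(parts)

    return SMART_CLASS.sub(rewrite, html)


if __name__ == "__main__":
    stored = (
        '<div class="wlWriterEditableSmartContent id-guid1:guid2 '
        'contenteditable-false">'
    )
    print(unfold_attributes(stored))
```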

Other Cool Cleanups

Since I was already writing this module, I decided to make it clean up a few other annoyances.

  • Replaces a single &nbsp; between two words with a space. So this&nbsp;is&nbsp;cool gets converted to this is cool.
  • Replaces <p>&nbsp;</p> with an empty string.
  • Replaces an apostrophe within a word with a typographical single quote. So you can't say that becomes you can&#8217;t say that.
  • Replaces atomicselection="true" with an empty string. I don’t re-insert this attribute back into the content yet, as I’m not sure if it is even necessary.
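Each of these cleanups comes down to a small regular-expression replacement. The Python sketch below shows one plausible set of patterns; the function name `clean_markup` and the exact expressions are assumptions, not the ones the module uses.

```python
import re


def clean_markup(html):
    """Apply the four cleanups listed above, in order."""
    # Drop paragraphs that contain nothing but a non-breaking space.
    html = re.sub(r"<p>&nbsp;</p>", "", html)
    # A single &nbsp; directly between two word characters becomes a space.
    html = re.sub(r"(?<=\w)&nbsp;(?=\w)", " ", html)
    # An apostrophe inside a word becomes a typographic single quote entity.
    html = re.sub(r"(?<=\w)'(?=\w)", "&#8217;", html)
    # Remove the atomicselection attribute along with its leading whitespace.
    html = re.sub(r'\s*atomicselection="true"', "", html)
    return html


if __name__ == "__main__":
    print(clean_markup("this&nbsp;is&nbsp;cool"))
    print(clean_markup("you can't say that"))
```

Note that patterns this naive run over the whole body, so they also rewrite matching text inside the post's prose or code samples, not just inside markup.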

Try it out!

This module should work with any ASP.NET blog engine that uses the MetaWeblog API. It only responds to requests made by Windows Live Writer, so it shouldn’t interfere with anything else you may use to post to your blog.

Using it is as easy as dropping the assembly into the bin directory and modifying your web.config to add the following to the httpModules section:

<httpModules>
  <add type="HtmlScrubber.WLWCleanupModule, HtmlScrubber" 
    name="HtmlScrubber" />
</httpModules>

I’m also including the source code and unit tests, so feel free to give it a try. Please understand that this is something I hacked together in a day, so it may be a bit rough around the edges and I give no warranty. Having said that, I’m pretty confident it won’t screw up your HTML.

I have plans to add other features and cleanups in the future. For example, it wouldn’t be hard to add a configuration section that allows one to specify other regular expressions and replacement patterns to apply.
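To make the idea concrete, here is a minimal Python sketch of a configurable rule list. Everything in it is hypothetical: the `RULES` table, the `apply_rules` function, and the example rule (turning "--" into &mdash; while leaving HTML comment delimiters alone) are illustrations, and a real version would read its patterns from a configuration section rather than from code.

```python
import re

# Hypothetical rule table: (compiled pattern, replacement) pairs, applied in order.
RULES = [
    # Replace "--" with an em dash entity, but skip "<!--" and "-->".
    (re.compile(r"(?<!<!)--(?!>)"), "&mdash;"),
]


def apply_rules(html, rules=RULES):
    """Run every configured replacement over the markup, in order."""
    for pattern, replacement in rules:
        html = pattern.sub(replacement, html)
    return html


if __name__ == "__main__":
    print(apply_rules("wait -- what"))
```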

If you have any "cleanups" I should include, please let me know. If you’re reading this post, then you know the module worked.

[Download Binaries] [Download Source]

I thought about adding this to CodePlex, but I’m hoping that the next version of Windows Live Writer makes this module irrelevant. I’m not holding my breath on that one though.


What others have said

Haacked Jul 29, 2007 11:18 PM
# re: Cleanup The Crap That Windows Live Writer Injects With This HttpModule
The fact that this page validates is a good sign the module is working. :)
Joe Cheng [MSFT] Jul 30, 2007 2:40 AM
"Replaces with an empty string. I don’t re-insert this attribute back into the content yet, as I’m not sure if it is even necessary."

Replaces what with an empty string?

We've definitely heard the complaints about generating non-validating markup, and we're hoping to clean up a lot of that stuff in a future release.

Always cool to see a programmer taking matters into his own hands though!

Jim Holmes Jul 30, 2007 6:38 AM
For every step MS takes to living in a nice standards-based world they do crap like this with WLW or the fugly output from MOSS pages. Grrrrr.
Haacked Jul 30, 2007 7:57 AM
@Joe - Ha! My module worked too well as it removed atomicselection="true" from my post. I need to give it some more smarts.
Jacob Jul 30, 2007 9:57 AM
Very nice. I always end up manually editing "--" to be &mdash; (which I much prefer). Since it's possible I might be the only one with that preference, your regular expression option would be a nice feature.
Adam Kinney Jul 30, 2007 11:42 AM
Funny, I just wrote some filtering code as well. Mine was focused on the img tag not being closed and on ensuring there is an alt attribute even if it's empty. Good idea using a module though; it seems appropriate for the situation.
Haacked Jul 30, 2007 1:01 PM
Adam, want to share your code with me so it gets in there? I didn't do the img closing because Subtext uses SGML to automatically convert HTML to XHTML as best as it can. Unfortunately, it can't handle everything you throw at it, such as unknown attributes.
Josh Twist Aug 03, 2007 5:12 AM
On the subject of modules and XHTML, have you seen this?

http://www.thejoyofcode.com/Validator_Module.aspx
Haacked Aug 03, 2007 7:57 AM
@Josh Yeah, I've used it at work.
Steve Trefethen Aug 03, 2007 7:51 PM
Sweet! Thanks much. I too am so tired of fixing WLW markup.
Steve Trefethen's Weblog Aug 13, 2007 11:57 PM
# Using Windows Live Writer on an ASP.NET blog? Check out Phil Haack's HTMLScrubbe
Nguyen Truong Tho Apr 26, 2008 8:08 AM
Haha, that's nice! From your information, I created a plugin that will help you remove that dirty code right in WLW.
You can download it here http://code.google.com/p/removecrapplugin/downloads/list
Dan Maharry Jun 26, 2008 12:16 AM
Phil, Do you know if this HTML cleanup is being included in the next version of Live Writer? I guess now you're a blue badge you might have more of an insight on that one.
