Cleanup The Crap That Windows Live Writer Injects With This HttpModule

First, let me start off with some praise. I really, really like Windows Live Writer. I’ve praised it many times on my blog. However, there is one thing that really annoys me about WLW: its utter disregard for web standards and the fact that it injects crap I don’t want or need into my content.

Of particular annoyance is the way that WLW adds attributes that are not XHTML compliant. For example, when you use the Insert Tags feature, it creates a div that looks something like:

<div class="wlWriterEditableSmartContent" 
  id="guid1:guid2" 
  contenteditable="false" 
  style="padding-right: 0px; display: inline; padding-left: 0px; 
  padding-bottom: 0px; margin: 0px; padding-top: 0px">

What’s the problem? Let me explain. 

  1. First of all, the ID is a GUID that starts with a number. Unfortunately XHTML doesn’t allow the id of an element to start with a number.
  2. The contenteditable attribute is not recognized in XHTML.
  3. The style attribute is superfluous. At the very least, it should have been reduced to style="padding:0; display: inline;"

The purpose of the special class and the contenteditable attribute is to inform WLW that the html tag is editable. In the Web Layout view (F11), you can see a hashed box around the tags like so.

image

Clicking on the box changes the right menu to let you enter tags.

image

Because I actually care about web standards and being XHTML compliant and I’m totally anal, I’ve always gone in and manually changed the HTML after the fact.

Today, out of pure laziness and getting fed up with this extra work I have to do, I decided to write an HttpModule to do this repetitive task for me via a Request Filter. A Request Filter modifies the incoming request.

But to make things interesting, I made sure that the HttpModule makes the changes in an intelligent manner so that no information is lost. Rather than simply removing the cruft, I moved the cruft into the class attribute. Thus the HTML I showed above would be transformed into this:

<div class="wlWriterEditableSmartContent id-guid1:guid2 
  contenteditable-false">

Notice that I simply removed the style attribute because I don’t need it.
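
To make that rewrite concrete, here is a minimal sketch of the request-side transformation. It is written in Python purely for illustration and is not the module's actual source. The name scrub_wlw_div and the regular expression are assumptions, and the pattern assumes the attributes arrive in the order shown in the example div.

```python
import re

# Hypothetical pattern: matches the opening tag of a WLW "smart content" div.
# Assumes the attribute order class, id, contenteditable, then optional style.
WLW_DIV = re.compile(
    r'<div\s+class="wlWriterEditableSmartContent"\s+'
    r'id="(?P<id>[^"]*)"\s+'
    r'contenteditable="(?P<editable>[^"]*)"'
    r'(?:\s+style="[^"]*")?\s*>',
    re.IGNORECASE,
)


def scrub_wlw_div(html):
    """Fold id and contenteditable into the class attribute and drop style."""
    def fold(match):
        return (
            '<div class="wlWriterEditableSmartContent '
            f'id-{match.group("id")} '
            f'contenteditable-{match.group("editable")}">'
        )
    return WLW_DIV.sub(fold, html)
```

Because the id and contenteditable values are kept as class names, nothing WLW needs is thrown away. A real implementation would probably want an HTML parser rather than a regular expression if the attribute order can vary.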

I also created a Response Filter to modify the outgoing response when the client is Windows Live Writer. That allows the module to convert the above html back into the format that WLW expects. In that manner, I don’t break any WLW functionality.
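
Here is a minimal sketch of that reverse step, again in Python for illustration only and not taken from the module's source. The name restore_wlw_div and the pattern are assumptions. Since the style attribute was simply dropped on the way in, this sketch does not try to put it back.

```python
import re

# Hypothetical pattern: matches a div whose class attribute carries the
# folded "id-..." and "contenteditable-..." tokens.
SCRUBBED_DIV = re.compile(
    r'<div\s+class="wlWriterEditableSmartContent\s+'
    r'id-(?P<id>[^\s"]+)\s+'
    r'contenteditable-(?P<editable>[^\s"]+)"\s*>'
)


def restore_wlw_div(html):
    """Expand the folded class tokens back into real attributes for WLW."""
    def unfold(match):
        return (
            '<div class="wlWriterEditableSmartContent" '
            f'id="{match.group("id")}" '
            f'contenteditable="{match.group("editable")}">'
        )
    return SCRUBBED_DIV.sub(unfold, html)
```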

Other Cool Cleanups

Since I was already writing this module, I decided to make it clean up a few other annoyances.

  • Replaces a single &nbsp; between two words with a space. So this&nbsp;is&nbsp;cool gets converted to this is cool.
  • Replaces <p>&nbsp;</p> with an empty string.
  • Replaces an apostrophe within a word with a typographical single quote. So you can't say that becomes you can’t say that.
  • Replaces atomicselection="true" with an empty string. I don’t re-insert this attribute back into the content yet, as I’m not sure if it is even necessary.
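
The cleanups in this list can be sketched as a handful of regular expression replacements. This is an illustrative Python sketch, not the module's code. The pattern names and the clean_up function are assumptions, and the empty-paragraph pattern is slightly broader than the rule described, since it also catches paragraphs containing only whitespace.

```python
import re

# A single &nbsp; with a word character on each side.
NBSP_BETWEEN_WORDS = re.compile(r'(?<=\w)&nbsp;(?=\w)')
# A paragraph holding nothing but &nbsp; entities or whitespace.
EMPTY_PARAGRAPH = re.compile(r'<p>(?:&nbsp;|\s)*</p>', re.IGNORECASE)
# A straight apostrophe with a word character on each side.
APOSTROPHE_IN_WORD = re.compile(r"(?<=\w)'(?=\w)")
# The attribute plus any whitespace in front of it.
ATOMIC_SELECTION = re.compile(r'\s*atomicselection="true"', re.IGNORECASE)


def clean_up(html):
    """Apply the four cleanups in sequence and return the result."""
    html = EMPTY_PARAGRAPH.sub('', html)
    html = NBSP_BETWEEN_WORDS.sub(' ', html)
    html = APOSTROPHE_IN_WORD.sub('\u2019', html)  # right single quote
    html = ATOMIC_SELECTION.sub('', html)
    return html
```

One caveat with a text-level approach like this: the apostrophe rule does not know whether it is inside an attribute value or a script block, so a more careful version would only touch text nodes.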

Try it out!

This module should work with any ASP.NET blog engine that uses the MetaWeblog API. It only responds to requests made by Windows Live Writer, so it shouldn’t interfere with anything else you may use to post to your blog.
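
One plausible way to limit a filter to Windows Live Writer is to look at the User-Agent header. The sketch below assumes that WLW identifies itself with the text "Windows Live Writer" in that header. That detail, and the function name, are assumptions for illustration rather than a description of how the module decides.

```python
def is_windows_live_writer(user_agent):
    """Return True when the User-Agent string looks like Windows Live Writer.

    Assumption: WLW includes "Windows Live Writer" in its User-Agent header.
    """
    return bool(user_agent) and "windows live writer" in user_agent.lower()
```

Requests from any other client would then pass through with no filter attached.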

Using it is as easy as dropping the assembly in the bin directory and adding the following to the httpModules section of your web.config:

<httpModules>
  <add type="HtmlScrubber.WLWCleanupModule, HtmlScrubber" 
    name="HtmlScrubber" />
</httpModules>

I’m also including the source code and unit tests, so feel free to give it a try. Please understand that this is something I hacked together in a day, so it may be a bit rough around the edges and I give no warranty. Having said that, I’m pretty confident it won’t screw up your HTML.

I have plans to add other features and cleanups in the future. For example, it wouldn’t be hard to add a configuration section that allows one to specify other regular expressions and replacement patterns to apply.
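
As a rough idea of what configurable rules might look like, here is a Python sketch in which each rule is a pattern and replacement pair applied in order. The EXTRA_RULES list, the apply_rules function, and the double-hyphen example rule are all hypothetical; no such configuration section exists in the module yet.

```python
import re

# Hypothetical user-supplied rules: (pattern, replacement) pairs.
EXTRA_RULES = [
    # Turn a double hyphen between two word characters into an em dash entity.
    (r'(?<=\w)--(?=\w)', '&mdash;'),
]


def apply_rules(html, rules):
    """Run each (pattern, replacement) pair over the HTML in order."""
    for pattern, replacement in rules:
        html = re.sub(pattern, replacement, html)
    return html
```

Keeping the rules as plain data means new cleanups could be added without recompiling anything.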

If you have any “cleanups” I should include, please let me know. If you’re reading this post, then you know the module worked.

[Download Binaries] [Download Source]

I thought about adding this to CodePlex, but I’m hoping that the next version of Windows Live Writer makes this module irrelevant. I’m not holding my breath on that one though.

Technorati tags: WLW, Windows Live Writer, ASP.NET

Comments

17 responses

  1. Haacked July 29th, 2007

    The fact that this page validates is a good sign the module is working. :)

  2. Joe Cheng [MSFT] July 29th, 2007

    "Replaces with an empty string. I don’t re-insert this attribute back into the content yet, as I’m not sure if it is even necessary."

    Replaces what with an empty string?

    We've definitely heard the complaints about generating non-validating markup, and we're hoping to clean up a lot of that stuff in a future release.

    Always cool to see a programmer taking matters into his own hands though!

  3. Jim Holmes July 29th, 2007

    For every step MS takes to living in a nice standards-based world they do crap like this with WLW or the fugly output from MOSS pages. Grrrrr.

  4. Haacked July 29th, 2007

    @Joe - Ha! My module worked too well as it removed atomicselection="true" from my post. I need to give it some more smarts.

  5. Jacob July 29th, 2007

    Very nice. I always end up manually editing "--" to be &mdash; (which I much prefer). Since it's possible I might be the only one with that preference, your regular expression option would be a nice feature.

  6. Adam Kinney July 30th, 2007

    Funny, I just wrote some filtering code as well. Mine was focused on the img tag not being closed and ensuring there is an alt attribute even if it's empty. Good idea using a module though, seems appropriate for the situation.

  7. Haacked July 30th, 2007

    Adam, want to share your code with me so it gets in there? I didn't do the img closing because Subtext uses SGML to automatically convert HTML to XHTML as best as it can. Unfortunately, it can't handle everything you throw at it, such as unknown attributes.

  8. Josh Twist August 2nd, 2007

    On the subject of modules and XHTML, have you seen this?
    http://www.thejoyofcode.com...

  9. Haacked August 2nd, 2007

    @Josh Yeah, I've used it at work.

  10. Steve Trefethen August 3rd, 2007

    Sweet! Thanks much. I too am so tired of fixing WLW markup.

  11. Nguyen Truong Tho April 25th, 2008

    Haha, that's nice! From your information, I created a plugin that will help you remove that dirty code right in WLW.
    You can download it here http://code.google.com/p/re...

  12. Dan Maharry June 25th, 2008

    Phil, Do you know if this HTML cleanup is being included in the next version of Live Writer? I guess now you're a blue badge you might have more of an insight on that one.

  13. Dave Schinkel February 17th, 2009

    Live Writer is also jacking up my blog. Fonts show up different in IE vs. Firefox and all sorts of s***. Spacing is whacked, you name it, in my Subtext blog because of this pile of a writer.

  14. shailesh January 1st, 2010

    Hi,
    It's a very nice blog, but the problem still persists. I added the DLL to my website, where I am using the MetaWeblog API, and I also modified the web.config file.
    But that did not solve it. In my plugin I am using the smart content class, and I simply want to upload an image using this control.
    I also could not unzip the uploaded code, so please correct it.
    Thanks

  15. Tristan December 4th, 2011

    Hi Mr. Haack,
    The links to the zips are giving 404s. I'd love to clean up the crap.
    Ta.

  16. Lulu June 23rd, 2013

    Hello, I would like it to remove the

    closing tag altogether. it breaks my jquery plugin completely. No find-replace plugin seem to replace the

    for just an empty space, for example. I want

    GONE! :)

    I will keep an eye in here, If you do it I will be willing to donate a bit for your plugin. :D

  17. Lulu June 23rd, 2013

    PS: the tag I want closed is the <

    > ...hoping it shows this time. if not, it is the < / P > closing paragraph tag.