Windows Live Writer and Html Entities

I’ve been banging my head against a couple of problems with the interaction between Subtext and Windows Live Writer that I thought I’d post on this here blog in the hopes that someone can help.

I expect that Mr. Hanselman might know the answer, but will only tell me after properly extolling DasBlog’s superiority over Subtext first. Very well.

Here’s the first issue. I’m kind of a fan of typography and go through the extra effort to use proper apostrophes and quotes.

For example, instead of using ' for an apostrophe, I will use ’. Instead of "quotes", I will use “real quotes”. It’s just how I roll.

For the apostrophe, I use the HTML entity code &#8217;. For quotes I use the opening quotes &#8220; followed by the closing quotes &#8221;.

However, when you enter these things in WLW and post them to your blog, it converts them to the actual characters. Thus when I query my database, I see “quotes” instead of &#8220;quotes&#8221; as I would expect.

I wish WLW would not screw around with these conversions, but until it stops, I was thinking about doing a simple conversion on the server back to the original entity encodings.

However, I can’t just call the HttpUtility.HtmlEncode method, as that would encode the angle brackets et al. I still want the HTML as HTML; I just want the special characters to remain entity encoded.

Anyone have a clever method for doing this, or will I need to brute force this sucker?
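
For illustration, here is a minimal sketch of the brute-force approach. It is written in Python rather than .NET, and the function name `encode_non_ascii` is made up for this example. It assumes that "special characters" means anything outside the ASCII range, and it emits decimal numeric references, which may not match the exact entity form that was originally typed.

```python
def encode_non_ascii(html: str) -> str:
    """Replace each non-ASCII character with a decimal character reference.

    ASCII characters, including <, >, & and straight quotes, pass through
    unchanged, so the existing markup stays as markup.
    """
    # "xmlcharrefreplace" is a standard codec error handler: any character
    # that cannot be encoded as ASCII becomes &#NNNN; instead of raising.
    return html.encode("ascii", "xmlcharrefreplace").decode("ascii")


if __name__ == "__main__":
    print(encode_non_ascii("<p>It\u2019s \u201creal quotes\u201d</p>"))
```

Because only non-ASCII characters are touched, entities that are already in the markup are left alone and not double-encoded. The trade-off is that every non-ASCII character gets encoded, not just the typographic quotes.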

  1. Joe Cheng [MSFT] March 23rd, 2007

    I'm not sure why we're doing this; I'll look into it. In the meantime, does this actually cause the quotes to look broken when rendered on your blog? If your blog uses UTF-8 then it seems like it should render fine.
    If your complaint is that we're mucking with HTML you type into HTML Code view, I agree that we shouldn't be doing that.

  2. orcmid March 23rd, 2007

    I am having the same problem with BlogJet together with Blogger. (BlogJet does smart quotes, but the problem is for other characters, like &Lambda; in "Λ the Ultimate".)
    [Now we can also wonder about comment forms too, aye?]
    I don't know where the character entities are being lost, but it can lead to the character not rendering properly when I bring a recent post back into BlogJet to update something.
    They do render on the browser view of the blog page of the original post, but they break in the returned recent post from Blogger, and I have to fix them each time to avoid corrupting the post.

  3. Haacked March 23rd, 2007

    Thanks Joe. Yes, the mucking with my HTML is the problem.
    The other problem was that when I cut and pasted the XML-RPC being sent back and forth into a file, named that file with a .xml extension, and then opened it with IE, it gave me an error on those characters.
    So I started worrying the category thing was an encoding issue. But as you posted in my other post, it appears to just be a not-yet-implemented "feature" of WLW. :)

  4. orcmid March 23rd, 2007

    While fresh in my mind, you've found a wonderful example of system incoherence and its consequences. There's a whole change of potential transformation points in the path between you and the browser view of the blog post (or the comment form and the blog post view), and back again (from some intermediate point) for re-editing of the material.
    It is easy to introduce transformations in the wrong places, and it is easy to repair them in more wrong places. This has happened in the bumpy evolution of e-mail and it is now happening here.

  5. orcmid March 23rd, 2007

    Uh, I meant "There's a whole chain of potential transformation points ..." and with that I'm quitting while I'm ahead.

  6. Damien Guard March 24th, 2007

    Am I right in thinking you're expecting HTML entities/encoding to be stored in your database instead of the UTF-8 characters?
    If so, that seems fundamentally wrong to me: it mixes up the current presentation markup with the real content. What if you want to serve it up somewhere else in the future that doesn't use HTML or these entities?

  7. Haacked March 24th, 2007

    Well Damien, since I'm already storing HTML in the database, what would be the problem with that?
    For example, I'm not storing the image when I write a blog post, I store the <img /> tag.
    As long as my blog posts are XHTML, it shouldn't be a problem, as I'm already separating content from presentation. I can always perform a transformation on the content should I need to.

  8. opello March 26th, 2007

    A question I've always considered when writing any form that takes user input (at least when I'm doing it for myself and not for something like work) is:
    Should the database store the magical 'entityified' version or a real, medium-independent (well, sort of), UTF-8 encoded version that could be dumped to a terminal or a web browser without problems?
    I've yet to come to a firm conclusion either way, but whenever I think about it, I lean toward the database holding the no-entities version.

