Fixing Broken Jekyll URLs

Well this is a bit embarrassing.

I recently migrated my blog to Jekyll and subsequently wrote about my painstaking work to preserve my URLs.

But after the migration, despite all my efforts, I faced an onslaught of reports of broken URLs. So what happened?

Broken glass by Tiago Pádua CC-BY-2.0

Well, it’s silly. The program I wrote to migrate my posts to Jekyll had a subtle flaw. To verify that each URL would be correct, it made a web request to my old blog (which was still up at the time) using the generated file name.

The problem is that Subtext had this stupid feature where the date part of the URL didn’t really matter. It only looked at the slug at the end of the URL.

Thus requests for the following two URLs would receive the same content:

  • https://haacked.com/archive/0001/01/01/some-post.aspx
  • https://haacked.com/archive/2013/11/21/some-post.aspx

This “feature” masked a timezone bug in my exporter that gave many posts the wrong date. Because the old blog happily served those URLs anyway, my export script had no idea they were bad.
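Here is a minimal Python sketch of how these two problems can compound. It is not the exporter's actual code: the function names, the sample timestamp, and the UTC-8 offset are all assumptions made up for illustration. It shows one publish time landing on two different calendar days depending on the timezone used, and a slug-only comparison (like the Subtext behavior described above) treating both URLs as the same post.

```python
from datetime import datetime, timedelta, timezone

def archive_url(published: datetime, slug: str) -> str:
    """Build an archive-style URL path from a publish time and a slug."""
    return f"/archive/{published:%Y/%m/%d}/{slug}.aspx"

def slug_of(url: str) -> str:
    """Return only the last path segment, ignoring the date part."""
    return url.rsplit("/", 1)[-1]

# Hypothetical post published late in the evening at UTC-8.
local_zone = timezone(timedelta(hours=-8))
published_local = datetime(2013, 11, 21, 22, 30, tzinfo=local_zone)

# The same instant expressed in UTC falls on the next calendar day.
published_utc = published_local.astimezone(timezone.utc)

correct_url = archive_url(published_local, "some-post")
wrong_url = archive_url(published_utc, "some-post")

print(correct_url)  # /archive/2013/11/21/some-post.aspx
print(wrong_url)    # /archive/2013/11/22/some-post.aspx

# A server that matches on the slug alone cannot tell the two apart,
# so a verification request for the wrong URL still succeeds.
print(slug_of(correct_url) == slug_of(wrong_url))  # True
```

A stricter check would compare the full path, date included, against a known list of old URLs rather than trusting a successful response.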

Fixing it!

So how’d I fix it? First, I updated my 404 page with information about the problem and where to report the missing file. You can set a 404 page by adding a 404.html file at the root of your Jekyll repository. GitHub Pages will serve this file whenever a request results in a 404 error.

I then panicked and started fixing errors by hand until my helpful colleagues Ben Balter and Joel Glovier reminded me to try Google Analytics and Google Webmaster Tools.

If you haven’t set up Google Webmaster Tools for your website, you really should. There are some great tools in there including the ability to export a CSV file containing 404 errors.

So I did that and wrote a new program, Jekyll URL Fixer, to examine the 404s and look for the corresponding Jekyll post files. I then renamed the affected files and updated the YAML front matter with the correct date.
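For a sense of what such a tool involves, here is a minimal Python sketch. It is not the actual Jekyll URL Fixer. Every name in it (`plan_rename`, `fix_front_matter_date`, `apply_fix`) is hypothetical, and it assumes that broken URLs look like the example URLs above, that post files follow Jekyll's `YYYY-MM-DD-slug.ext` naming, and that the front matter has a `date:` line starting with `YYYY-MM-DD`. Reading the exported CSV is left out because its column layout isn't described here.

```python
import re
from pathlib import Path

# Assumed URL shape, based on the example URLs earlier in this post.
URL_PATTERN = re.compile(r"/archive/(\d{4})/(\d{2})/(\d{2})/([^/]+)\.aspx$")
# Jekyll post files are named YYYY-MM-DD-slug.ext.
POST_PATTERN = re.compile(r"^(\d{4}-\d{2}-\d{2})-(.+)\.(\w+)$")

def plan_rename(broken_url, post_names):
    """Return (old_name, new_name, correct_date) for a 404 URL, or None."""
    match = URL_PATTERN.search(broken_url)
    if match is None:
        return None
    year, month, day, slug = match.groups()
    correct_date = f"{year}-{month}-{day}"
    for name in post_names:
        post = POST_PATTERN.match(name)
        if post and post.group(2).lower() == slug.lower():
            if post.group(1) == correct_date:
                return None  # date already matches; the 404 has another cause
            new_name = f"{correct_date}-{post.group(2)}.{post.group(3)}"
            return name, new_name, correct_date
    return None

def fix_front_matter_date(text, correct_date):
    """Replace the date part of the first `date:` line, keeping any time."""
    return re.sub(r"(?m)^(date:\s*)\d{4}-\d{2}-\d{2}",
                  rf"\g<1>{correct_date}", text, count=1)

def apply_fix(posts_dir, broken_url):
    """Rename the matching post file and update its front matter date."""
    posts_dir = Path(posts_dir)
    plan = plan_rename(broken_url, [p.name for p in posts_dir.iterdir()])
    if plan is None:
        return False
    old_name, new_name, correct_date = plan
    old_path = posts_dir / old_name
    text = old_path.read_text(encoding="utf-8")
    (posts_dir / new_name).write_text(
        fix_front_matter_date(text, correct_date), encoding="utf-8")
    old_path.unlink()
    return True

print(plan_rename(
    "https://haacked.com/archive/2013/11/21/some-post.aspx",
    ["2013-11-22-some-post.md"],
))
# ('2013-11-22-some-post.md', '2013-11-21-some-post.md', '2013-11-21')
```

Keeping the planning step separate from the file operations makes it possible to review the proposed renames before touching anything in the repository.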

Hopefully this fixes most of my bad URLs. Of course, if anyone linked to the broken URL in the interim, they’re kind of hosed in that regard.

I apologize for the inconvenience if you couldn’t find the content you were looking for and am happy to refund anyone’s subscription fees to Haacked.com (up to a maximum of $0.00 per person).

Comments

8 responses

  1. James December 12th, 2013

    i'm just here for the refund... i heard there were refunds.

  2. James C December 13th, 2013

    Keep juggling those nuts while waiting... ;)

  3. Matthieu Penant December 13th, 2013
  4. haacked December 13th, 2013

    Yeah, I believe the second URL you posted is the correct URL. You can find it here: http://forums.asp.net/t/157...

    Where did you find the first URL?

  5. Matthieu Penant December 14th, 2013

    on SO : http://stackoverflow.com/qu... . But I fixed it to the new/correct one. Maybe it was a typo... Anyway it's fixed :)

  6. Alistair Lattimore December 16th, 2013

    Phil,

    There is a really easy way to do this with Google Analytics:

    1) Drill into Behaviour->Site Content->All Pages
    2) Switch the table's Primary Dimension to Page Title (above the table)
    3) Filter the table rows by searching for '404' (which is in your 404 error handler title tag)
    4) Drill into the 404 error filtered result
    5) Celebrate as you've just found every URL on your site generating a 404 error

    Al.

  7. Rabo December 17th, 2013

    I want double the maximum refund or no deal!

  8. Eric Z December 20th, 2013

    There’s another kind of URL-related thing you can fix. All your RSS feed items show up with .aspx at the end of the title.