Well this is a bit embarrassing.
But after the migration, despite all my efforts, I faced an onslaught of reports of broken URLs. So what happened?
Well, it’s silly. The program I wrote to migrate my posts to Jekyll had a subtle flaw. To verify that each generated URL would be correct, it made a web request to my old blog (which was still up at the time) using the generated file name.
If the old blog served up the post, I took that as proof the Jekyll URL was correct. The problem is that Subtext had this stupid feature where the date part of the URL didn’t really matter. It only cared about the slug at the end of the URL.
Thus requests for the following two URLs would receive the same content:
This “feature” masked a timezone bug in my exporter that was causing many posts to be generated with the wrong date. Because the old blog happily served those URLs anyway, my export script had no idea they were bad.
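To illustrate how this kind of bug can sneak in, here is a minimal Python sketch. It is not the exporter’s actual code: the timestamp, the fixed Pacific offset, and the url_date helper are all made up for the example. It assumes the post time is stored in UTC and shows how the date part of the URL shifts by a day depending on which time zone you format it in.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical post: published at 9:30 PM Pacific on December 2, 2013,
# which a blog engine might store as a UTC timestamp.
PACIFIC = timezone(timedelta(hours=-8))  # fixed offset; ignores DST for brevity
stored_utc = datetime(2013, 12, 3, 5, 30, tzinfo=timezone.utc)


def url_date(moment: datetime, tz: timezone) -> str:
    """Format the date portion of a post URL as seen from the given time zone."""
    return moment.astimezone(tz).strftime("%Y/%m/%d")


wrong = url_date(stored_utc, timezone.utc)  # date taken straight from UTC
right = url_date(stored_utc, PACIFIC)       # date as seen in the blog's local time

print(wrong)  # 2013/12/03
print(right)  # 2013/12/02
```

Any post published late in the evening local time lands on the next day in UTC, so a check that only looks at the slug would never notice the difference.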
So how’d I fix it? First, I updated my 404 page with information about the problem and where to report the missing file. You can set a 404 page by adding a 404.html file at the root of your Jekyll repository. GitHub Pages will serve this file in the case of a 404 error.
If you haven’t set up Google Webmaster Tools for your website, you really should. There are some great tools in there including the ability to export a CSV file containing 404 errors.
So I did that and wrote a new program, Jekyll URL Fixer, to examine the 404s and look for the corresponding Jekyll post files. I then renamed the affected files and updated the YAML front matter with the correct date.
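The core of that idea can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not the actual Jekyll URL Fixer: the /archive/YYYY/MM/DD/slug.aspx URL shape, the plan_rename and fix_front_matter_date names, and the sample file names are all hypothetical. The only thing taken as given is Jekyll’s YYYY-MM-DD-slug naming convention for post files. Reading the CSV and touching the file system are left out.

```python
import re
from pathlib import Path
from typing import Optional, Sequence, Tuple

# Assumed shape of the old blog's URLs: /archive/YYYY/MM/DD/slug.aspx
URL_PATTERN = re.compile(r"/archive/(\d{4})/(\d{2})/(\d{2})/([^/.]+)\.aspx")
# Jekyll post file names follow YYYY-MM-DD-slug.ext
POST_PATTERN = re.compile(r"^\d{4}-\d{2}-\d{2}-(.+)$")


def plan_rename(broken_url: str, post_names: Sequence[str]) -> Optional[Tuple[str, str]]:
    """Return (old_name, new_name) for the post whose slug matches the URL, or None."""
    match = URL_PATTERN.search(broken_url)
    if match is None:
        return None
    year, month, day, slug = match.groups()
    for name in post_names:
        path = Path(name)
        post = POST_PATTERN.match(path.stem)
        if post and post.group(1).lower() == slug.lower():
            # Keep the slug and extension; take the date from the requested URL.
            return name, f"{year}-{month}-{day}-{post.group(1)}{path.suffix}"
    return None


def fix_front_matter_date(text: str, new_date: str) -> str:
    """Swap the date on the first `date:` line, leaving any time of day untouched."""
    return re.sub(r"(?m)^(date:\s*)\d{4}-\d{2}-\d{2}", rf"\g<1>{new_date}", text, count=1)


posts = ["2013-12-03-my-post.md", "2013-11-20-other-post.md"]
print(plan_rename("/archive/2013/12/02/my-post.aspx", posts))
# ('2013-12-03-my-post.md', '2013-12-02-my-post.md')
```

Matching on the slug alone mirrors what the old blog did, which is why the date in the requested URL can be trusted as the one to restore.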
Hopefully this fixes most of my bad URLs. Of course, anyone who linked to one of the broken URLs in the interim is kind of hosed, since those links will stop working now that the files have been renamed.
I apologize for the inconvenience if you couldn’t find the content you were looking for and am happy to refund anyone’s subscription fees to Haacked.com (up to a maximum of $0.00 per person).