Back in Business


Yeah, the past few days have been a pretty low moment for me and this blog. Long story short, on December 11, a hard-drive failure took down the managed dedicated server which hosts my blog among other sites.

(The following image is a dramatization of actual events and is not the actual hard drive)

[Image: Crufty Hard-Drive]

This is a server that Jeff Atwood and I share (we each host a Virtual Server on the machine), so all of the following sites were brought down by the hardware malfunction:

That list doesn’t include my personal Subversion server (yes, I’m planning to switch to GitHub for that).

The good news is that my hosting provider, CrystalTech, was taking regular backups of the machine. The bad news is that all of these sites were hosted in virtual machines. The Virtual Hard Drive files (usually referred to as VHD files) which contain the actual data for our virtual machines were always in use, so every attempt to back them up silently failed.

Properly backing up a live virtual server requires taking advantage of the Volume Shadow Copy Service (VSS), as described in this blog post on how to back up live virtual server VMs. This was not in place, probably due to a lack of coordination between us and the hosting provider.

Recovery

A data recovery company was brought in to try and recover the data. They replaced the drive head assembly and took a forensic image from the drive and started trying to recover our data. So far the actual VHD files we need have not yet been recovered. However, they were able to recover an older VHD I had backed up in 2007. That allowed me to grab all my content files such as images and code samples from back in 2007.

Luckily, I had backed up my database locally a few months ago. Not only that, thanks to the helpful Rich Skrenta, I was able to have a static web archive of my blog up and running quickly. He had a cache of both my and Jeff’s (http://codinghorror.com) blogs with the directory structure intact! That allowed me to retain my permalinks and have my content in a read-only state. I can only assume this cache is related to his search engine startup, http://blekko.com/.

From there, I started using grepWin against a copy of those static HTML files to strip out the relevant information and convert the blog posts I didn’t have in my database into one big T-SQL script which would insert all the blog posts and comments back into my database.
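
A minimal sketch of this kind of HTML-to-SQL conversion, written in Python rather than with grepWin, might look like the following. It is only an illustration under stated assumptions: the `<h1>` title and `<div class="post">` body markup, and the `Posts` table with `Title` and `Body` columns, are hypothetical stand-ins and not the real archive layout or the Subtext schema.

```python
import re
from html import unescape

# Hypothetical markup: the real archived pages may be structured differently.
# The body pattern stops at the first </div>, so nested divs would need a real parser.
TITLE_RE = re.compile(r"<h1[^>]*>(.*?)</h1>", re.S | re.I)
BODY_RE = re.compile(r'<div class="post">(.*?)</div>', re.S | re.I)


def sql_literal(text):
    """Return text as a T-SQL Unicode string literal, doubling single quotes."""
    return "N'" + text.replace("'", "''") + "'"


def post_to_insert(html):
    """Build an INSERT statement for one archived page, or None if it doesn't match."""
    title = TITLE_RE.search(html)
    body = BODY_RE.search(html)
    if not (title and body):
        return None
    return (
        "INSERT INTO Posts (Title, Body) VALUES ("  # hypothetical table and columns
        + sql_literal(unescape(title.group(1).strip()))
        + ", "
        + sql_literal(body.group(1).strip())
        + ");"
    )


page = '<html><h1>Phil&#39;s post</h1><div class="post"><p>It\'s back!</p></div></html>'
print(post_to_insert(page))
```

Running a function like this over every archived file and concatenating the results would give one large script. The quote doubling in `sql_literal` matters because post bodies are full of apostrophes.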

I had to upgrade my blog to an unreleased version of Subtext because I was in the process of testing the latest version against the copy of my database, which is why I had copied it locally in the first place. There might be some wonkiness if I made any mistakes in the upgrade.

At this point, most of the content for my blog is back up. I’m missing some comments left on the most recent post and many of the images on posts after 2007. Unfortunately getting cached images en masse is a pretty big challenge.

I’m also missing some code samples, etc., but I can start posting those back up there when I have time.

Lessons Learned

In general, I’m not a fan of the blame game as blame can’t change the past. It sucks, but what’s done is done. I’ll certainly let my hosting provider know what they can do better, but I also share in some of the blame for letting this happen.

What’s more interesting to me is learning from the past to help realize a better future, since that is something I can affect. What lessons did I learn (and re-learn because the lesson didn’t make it through my thick skull the first time) from this?

First and foremost as many mentioned to me on Twitter (thanks!):

An untested backup strategy is no backup strategy at all! Test your backups!

I think a corollary to that is to try and have a backup strategy that’s easy to set up. I actually had a process for backing up my database and content regularly, but when I moved to the new hosting provider, I forgot to set it up again.

I think the other lesson is that even if you have managed hosting, you should have your own local backups of the important content in your site.

Backup Strategy

I’m setting up a much better backup strategy which will include automatic backup verification by setting up my site on a local machine so I can browse the backup locally. When I get it in place, I’ll write a follow-up post and hope to get good suggestions on how to improve it.
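
Until that is in place, a cheaper first check is possible. The sketch below is a minimal illustration in Python, assuming the backups are ZIP archives (an assumption, not something described above). The file names `blog.bak` and `images/` are hypothetical. It confirms the archive is readable, that no member is corrupt, and that the expected entries are present. It does not replace restoring the site and browsing it.

```python
import io
import zipfile


def verify_backup(archive, required_names):
    """Return a list of problems found in a ZIP backup (empty list means it passed)."""
    problems = []
    try:
        with zipfile.ZipFile(archive) as zf:
            # testzip() reads every member and returns the first one with a bad CRC.
            bad = zf.testzip()
            if bad is not None:
                problems.append("corrupt member: " + bad)
            names = set(zf.namelist())
            for name in required_names:
                if name not in names:
                    problems.append("missing: " + name)
    except zipfile.BadZipFile:
        problems.append("not a readable ZIP archive")
    return problems


# Build a tiny in-memory archive so the example runs without touching disk.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("blog.bak", b"database dump placeholder")  # hypothetical file name

print(verify_backup(buffer, ["blog.bak", "images/"]))
```

A scheduled job could run a check like this after every backup and raise an alert whenever the returned list is not empty, which would have surfaced the silent failures described earlier.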

UPDATE: Looks like I am having an issue with comments not showing up and over-aggressive spam controls. This is the result of dogfooding the latest trunk build of my software. ;) Glad to find these issues now before releasing the latest version. :)

UPDATE 12/14/2009: Jeff Atwood declares today to be International Backup Awareness Day and gives his perspective on the server failure that affected us both and how he sucks. Yes, I must share in that suck too.

UPDATE 12/14/2009 10:19 PM: I was able to recover most of my images through a lucky break. I wrote about how the IIS SEO Toolkit saves the day.

Technorati Tags: hosting,backups,crystaltech,blog


Comments

22 responses

  1. Lb December 13th, 2009

    Welcome back Phil. Hope you haven't aged by too many years over the last few days.

  2. Dario Solera December 13th, 2009

    I'm a bit surprised to find that two great software professionals have been bitten by such a simple task as server backup. :)
    I never trusted backups done by hosting service providers, that's why I always have my own backup strategy in place: every day a batch file backs up all the needed files (including database BAKs, generated daily too) and makes a ZIP file, which is then downloaded via FTP from another machine (running at the opposite side of the Atlantic Ocean). Of course this doesn't work with locked files such as VHD, but I have none so it's fine for me. Also, the ZIP file is relatively small (~450MB), so it is easy to do a remote backup.

  3. Miha Markič December 13th, 2009

    Amazing. No proper backups and no RAID array for important stuff. Sounds like a hardwarehorror story to me :-).
    Seriously, simple RAID 1 SATA controllers are integrated on motherboards and all it costs you is to buy two (instead of one) disks, which are really inexpensive these days.
    Hope you recover the important data...at least source code for those advanced asp.net mvc 2 features we need ;-D

  4. Anthony Bouch December 13th, 2009

    Thanks for sharing a valuable lesson with the rest of us Phil.
    When things go wrong it's not uncommon for people and pros alike to try and pave over the facts about what really happened (especially if there's a liability or cost issue).
    Your post above is a great reminder that it's not a question of 'if things will go wrong' - but when - and whether you'll be ready to deal with it. You've also prompted me to make some changes where my own backup and recovery procedures are concerned - so thanks again, and good luck in getting the rest of your content back on the site.

  5. Mike December 13th, 2009

    Someone (I don't remember who) said "We don't have backup plans. We have recovery plans."

  6. Michal Talaga December 13th, 2009

    Having 1 failing hard drive taking down the server is simply LAME!
    BAAAAD business for hosting providers. With drives so cheap anyone can afford some kind of RAID. And this has nothing to do with backups. Just a total failure on the provider part.
    Ofc backups are good, but I STRONGLY prefer not to fail in the first place and have backups for a different reason which is historical data in case I make a fuckup overwriting the last copy of my uber important files.

  7. TheLudditeDeveloper December 13th, 2009

    According to Jeff's post, here:
    www.codinghorror.com/blog/archives/000984.html
    you were supposed to have 300GB raid 5 array, so did the whole raid array fail? Did you really have a raid 5 array? Did more than one drive fail?
    Were you still using Windows 2003 or had you upgraded to Windows 2008 and Hyper-V virtualisation? Would such an upgrade have made backing up your VPS any easier?

  8. Steve Smith December 13th, 2009

    Another good tip to remember:
    All hard drives eventually fail.
    Many people look at disk failures like winning the lottery - it might happen but it's so rare you shouldn't plan around it. The reality is, it *will* happen, you just don't know if it will happen today.
    Hope you get everything fully recovered!

  9. haacked December 13th, 2009

    Looks like there's a bug in the build of Subtext I happen to be using. This is an unreleased version so I'm testing in production.

  10. Damien Guard December 14th, 2009

    They should be backing up the VM's as if they were real machines - i.e. connecting to their filesystems and backing up the contents - not attempting to backup the VHD's on the host.
    [)amien

  11. haacked December 14th, 2009

    Comments should be fixed now! :)

  12. NC December 14th, 2009

    Dario Solera, what do you mean two great software professionals? Theres only one, Phill, cos Jeff is a tool.

  13. Dhananjay Goyani December 14th, 2009

    Rightly said by Joel today - www.joelonsoftware.com/items/2009/12/14.html

  14. KevDog December 15th, 2009

    Every profession has its share of practices that it insists that their customers use that they themselves neglect. With lawyers, they forget to make out a will, doctors smoke, etc. It seems clear that our Achilles heel is backup and restore plans.

  15. Jeremy December 16th, 2009

    "The cobbler's children have no shoes."
    Off to check that my RAID array is as it should be.

  16. Steve Moss December 20th, 2009

    You can retrieve some of your posts from the Wayback machine at http://web.archive.org. This site has snapshots of your website from 4th August 2008 back to 9th June 2004.

  17. San December 22nd, 2009

    I am desperately struggling to set up the Subtext code I downloaded from http://code.google.com/p/subtext/ without any installation guide, as http://subtextproject.com is still down. I am not able to locate any Subtext installation guide on the internet, could you please help me out?

  18. San December 24th, 2009

    Any possibilities of recovering/ getting a copy of the subtext installation guide and documentation?

  19. Anonymous January 2nd, 2010

    It is amazing that people do not read. This was a backup issue with your web host not your own hardware. Geesh!

  20. rakeback January 30th, 2010

    Sorry to hear about your hard drive. I have had this problem too in the past. My hard drive had bad sectors and would not load into windows and I lost all of my data. I learned my lesson and always back everything up on an external hard drive.

  21. EllenOr January 18th, 2013

    Glad to hear you're recovering. Losing data on a local hard drive alone is enough stress as is.

  22. tom March 8th, 2015

    This is good news. We are currently building a site on ASP.NET MVC 1, and the development for this will probably go on for another two months. I am thinking we should use ASP.NET MVC 2 since it's already in Preview 2. Phil, what would you advise? Should we wait for the final RTM? Does Preview 2 have a go-live license?