Back in Business

Yeah, the past few days have been a pretty low moment for me and this blog. Long story short, on December 11, a hard-drive failure took down the managed dedicated server which hosts my blog among other sites.

(The following image is a dramatization of actual events and is not the actual hard drive)

[Image: Crufty Hard-Drive]

This is a server that Jeff Atwood and I share (we each host a Virtual Server on the machine), thus all of the following sites were brought down by the hardware malfunction:

That list doesn’t include my personal Subversion server (yes, I’m planning to switch to GitHub for that).

The good news is that my hosting provider, CrystalTech, was taking regular backups of the machine. The bad news is that all of these sites were hosted in virtual machines. The Virtual Hard Drive files (usually referred to as VHD files) which contain the actual data for our virtual machines were always in use, so the backup of those files silently failed each time and they were never backed up.

Properly backing up a live virtual server requires taking advantage of Volume Shadow Copy Service (VSS), as described in this blog post on backing up live Virtual Server VMs, but this was not in place, probably due to a lack of coordination between us and the hosting provider.

Recovery

A data recovery company was brought in to try and recover the data. They replaced the drive head assembly and took a forensic image from the drive and started trying to recover our data. So far the actual VHD files we need have not yet been recovered. However, they were able to recover an older VHD I had backed up in 2007. That allowed me to grab all my content files such as images and code samples from back in 2007.

Luckily, I had backed up my database locally a few months ago. Not only that, thanks to the helpful Rich Skrenta, I was able to have a static web archive of my blog up and running quickly. He had a cache of both my and Jeff’s (http://codinghorror.com) blog with the directory structure intact! That allowed me to retain my permalinks and have my content in a read-only state. I can only assume this cache is related to his search engine startup, http://blekko.com/.

From there, I started using grepWin against a copy of those static HTML files to strip out the relevant information and convert the blog posts I didn’t have in my database into one big T-SQL script which would insert all the blog posts and comments back into my database.
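To illustrate the kind of conversion involved, here is a minimal Python sketch. It is not the process actually used (that was done with grepWin search-and-replace), and everything specific in it is an assumption: the `BlogPosts` table, its `Title` and `Body` columns, and the `post-body` markup are hypothetical and do not reflect the real Subtext schema or the real page structure.

```python
import re
from html import unescape

# Hypothetical patterns: the real archived pages may mark up posts differently.
TITLE_RE = re.compile(r"<title>(.*?)</title>", re.IGNORECASE | re.DOTALL)
BODY_RE = re.compile(r'<div class="post-body">(.*?)</div>', re.IGNORECASE | re.DOTALL)


def sql_quote(text):
    """Quote a string as a T-SQL Unicode literal, doubling single quotes."""
    return "N'" + text.replace("'", "''") + "'"


def post_to_insert(html_text):
    """Return one INSERT statement for a page, or None if nothing matched."""
    title = TITLE_RE.search(html_text)
    body = BODY_RE.search(html_text)
    if not title or not body:
        return None
    # BlogPosts, Title and Body are made-up names for illustration only.
    return "INSERT INTO BlogPosts (Title, Body) VALUES ({}, {});".format(
        sql_quote(unescape(title.group(1).strip())),
        sql_quote(body.group(1).strip()),
    )
```

Running `post_to_insert` over each saved page and writing the non-empty results to one file would give a single script to review before executing it. The non-greedy body pattern stops at the first closing `</div>`, so pages with nested divs would need a real HTML parser instead.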

I had to upgrade my blog to an unreleased version of Subtext because I was in the process of testing the latest version against the copy of my database. That’s why I copied it locally in the first place. There might be some wonkiness if I made any mistakes in the upgrade.

At this point, most of the content for my blog is back up. I’m missing some comments left on the most recent post and many of the images on posts after 2007. Unfortunately getting cached images en masse is a pretty big challenge.

I’m also missing some code samples etc, but I can start posting those back up there when I have time.

Lessons Learned

In general, I’m not a fan of the blame game as blame can’t change the past. It sucks, but what’s done is done. I’ll certainly let my hosting provider know what they can do better, but I also share in some of the blame for letting this happen.

What’s more interesting to me is learning from the past to help realize a better future, since that is something I can affect. What lessons did I learn (and re-learn because the lesson didn’t make it through my thick skull the first time) from this?

First and foremost, as many mentioned to me on Twitter (thanks!):

An untested backup strategy is no backup strategy at all! Test your backups!

I think a corollary to that is to try and have a backup strategy that’s easy to set up. I actually had a process for backing up my database and content regularly, but when I moved to the new hosting provider, I forgot to set it up again.

I think the other lesson is that even if you have managed hosting, you should have your own local backups of the important content in your site.

Backup Strategy

I’m setting up a much better backup strategy which will include automatic backup verification by setting up my site on a local machine so I can browse the backup locally. When I get it in place, I’ll write a follow-up post and hope to get good suggestions on how to improve it.
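As a small illustration of what "test your backups" can mean in practice, here is a minimal Python sketch of one automated check. It is not the strategy described above, only an assumption-laden example: it supposes the backup is a ZIP archive, and the function name `verify_backup` and the file names in the usage note are hypothetical.

```python
import zipfile


def verify_backup(zip_path, required_names):
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    try:
        with zipfile.ZipFile(zip_path) as archive:
            # testzip() reads every member and returns the first one
            # with a bad CRC or header, or None if all are readable.
            bad = archive.testzip()
            if bad is not None:
                problems.append("corrupt member: " + bad)
            names = set(archive.namelist())
            for name in required_names:
                if name not in names:
                    problems.append("missing: " + name)
                elif archive.getinfo(name).file_size == 0:
                    problems.append("empty: " + name)
    except (OSError, zipfile.BadZipFile) as exc:
        problems.append("cannot open archive: {}".format(exc))
    return problems
```

A scheduled job could call something like `verify_backup("backup.zip", ["db.bak"])` and send an alert whenever the returned list is not empty. A check like this only shows that the archive is readable and complete; actually restoring the site from it is still the stronger test.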

UPDATE: Looks like I am having an issue with comments not showing up and over-aggressive spam controls. This is the result of dogfooding the latest trunk build of my software. ;) Glad to find these issues now before releasing the latest version. :)

UPDATE 12/14/2009: Jeff Atwood declares today to be International Backup Awareness Day and gives his perspective on the server failure that affected us both and how he sucks. Yes, I must share in that suck too.

UPDATE 12/14/2009 10:19 PM: I was able to recover most of my images through a lucky break. I wrote about how the IIS SEO Toolkit saves the day.


What others have said

Lb Dec 13, 2009 5:44 PM
# re: Back in Business
Welcome back Phil. Hope you haven't aged by too many years over the last few days.
Dario Solera Dec 13, 2009 6:03 PM
# re: Back in Business
I'm a bit surprised to find that two great software professionals have been bit by such a simple task like server backup. :)

I never trusted backups done by hosting service providers, that's why I always have my own backup strategy in place: every day a batch file backs up all the needed files (including database BAKs, generated daily too) and makes a ZIP file, which is then downloaded via FTP from another machine (running at the opposite side of the Atlantic Ocean). Of course this doesn't work with locked files such as VHD, but I have none so it's fine for me. Also, the ZIP file is relatively small (~450MB), so it is easy to do a remote backup.
Miha Markič Dec 13, 2009 6:59 PM
# re: Back in Business
Amazing. No proper backups and no RAID array for important stuff. Sounds like a hardwarehorror story to me :-).
Seriously, simple RAID 1 SATA controllers are integrated on motherboards and all it costs you to buy two (instead of one) disks. Which are really inexpensive these days.
Hope you recover the important data...at least source code for those advanced asp.net mvc 2 features we need ;-D
Anthony Bouch Dec 13, 2009 8:48 PM
# re: Back in Business
Thanks for sharing a valuable lesson with the rest of us Phil.

When things go wrong it's not uncommon for people and pros alike to try and pave over the facts about what really happened (especially if there's a liability or cost issue).

Your post above is a great reminder that it's not a question of 'if things will go wrong' - but when - and whether you'll be ready to deal with it. You've also prompted me to make some changes where my own backup and recovery procedures are concerned - so thanks again, and good luck in getting the rest of your content back on the site.
Mike Dec 13, 2009 10:04 PM
# re: Back in Business
Someone (I don't remember who) said "We don't have backup plans. We have recovery plans."
Michal Talaga Dec 13, 2009 10:29 PM
# re: Back in Business
Having 1 failing hard drive taking down the server is simply LAME!

BAAAAD business for hosting providers. With drives so cheap anyone can afford some kind of RAID. And this has nothing to do with backups. Just a total failure on the provider part.

Ofc backups are good, but I STRONGLY prefer not to fail in the first place and have backups for a different reason which is historical data in case I make a fuckup overwriting the last copy of my uber important files.
TheLudditeDeveloper Dec 13, 2009 11:18 PM
# re: Back in Business
According to Jeff's post, here:
www.codinghorror.com/blog/archives/000984.html

you were supposed to have 300GB raid 5 array, so did the whole raid array fail? Did you really have a raid 5 array? Did more than one drive fail?

Were you still using Windows 2003 or had you upgraded to Windows 2008 and Hyper-V virtualisation? Would such an upgrade have made backing up your VPS any easier?
Steve Smith Dec 13, 2009 11:55 PM
# re: Back in Business
Another good tip to remember:

All hard drives eventually fail.

Many people look at disk failures like winning the lottery - it might happen but it's so rare you should plan around it. The reality is, it *will* happen, you just don't know if it will happen today.

Hope you get everything fully recovered!
haacked Dec 14, 2009 2:18 AM
# re: Back in Business
Looks like there's a bug in the build of Subtext I happen to be using. This is an unreleased version so I'm testing in production.
Damien Guard Dec 14, 2009 4:16 AM
# re: Back in Business
They should be backing up the VM's as if they were real machines - i.e. connecting to their filesystems and backing up the contents - not attempting to backup the VHD's on the host.

[)amien
haacked Dec 14, 2009 7:28 AM
# re: Back in Business
Comments should be fixed now! :)
NC Dec 14, 2009 8:33 AM
# re: Back in Business
Dario Solera, what do you mean two great software professionals? There's only one, Phil, cos Jeff is a tool.
Dhananjay Goyani Dec 14, 2009 2:24 PM
# re: Back in Business
Rightly said by Joel today - www.joelonsoftware.com/items/2009/12/14.html
KevDog Dec 15, 2009 9:01 PM
# re: Back in Business
Every profession has its share of practices that it insists that their customers use that they themselves neglect. With lawyers, they forget to make out a will, doctors smoke, etc. It seems clear that our Achilles heel is backup and restore plans.
Jeremy Dec 16, 2009 1:46 PM
# re: Back in Business
"The cobbler's children have no shoes."

Off to check that my RAID array is as it should be.
Steve Moss Dec 20, 2009 5:25 PM
# re: Back in Business
You can retrieve some of your posts from the Wayback machine at http://web.archive.org. This site has snapshots of your website from 4th August 2008 back to 9th June 2004.
San Dec 22, 2009 3:18 PM
# re: Back in Business
I am desperately struggling to set up the Subtext code I downloaded from http://code.google.com/p/subtext/ without any installation guide, as http://subtextproject.com is still down. I am not able to locate any Subtext installation guide on the internet, could you please help me out?
San Dec 24, 2009 3:50 AM
# re: Back in Business
Any possibilities of recovering/ getting a copy of the subtext installation guide and documentation?
Anonymous Jan 02, 2010 3:39 AM
# re: Back in Business
It is amazing that people do not read. This was a backup issue with your web host not your own hardware. Geesh!
rakeback Jan 30, 2010 12:13 PM
# re: Back in Business
Sorry to hear about your hard drive. I have had this problem too in the past. My hard drive had bad sectors and would not load into windows and I lost all of my data. I learned my lesson and always back everything up on an external hard drive.
