Modular Backup Strategy


Over the past several months I’ve been spending some of my free time rethinking my digital infrastructure. It’s quite amazing to see how complex some of this can become (and, by extension, how complex some of us geeks can make it). When I looked at the number of different computing devices in my life, it became clear that the strategies I was using to manage those devices and their data were no longer adequate.

In addition to the physical hardware, there are a number of virtual machines built in VirtualBox performing various and sundry tasks. Part of the project I undertook was a complete rebuild of the home server infrastructure, but what I’m going to focus on in this post is the backup plan.

The Premise

I wanted to make sure that all of my backups run automatically and at regular intervals. I don’t want to have to think about them, and I don’t want to perform any manual work to get them to run properly. I also want at least three copies of the most important data, on at least two different types of media. Ideally I also want an off-site copy of my data in case something happens to my home and all of my hardware is lost or destroyed.

Step 1 – Local Network Backup

The first and most logical step for me was to set up a very short rsync script to push all of my important data from my primary machine to a designated backup folder on the home server. The script is scheduled with a cron job to run every two hours, mirroring the data on my local machine to the server and ensuring that I have a second copy of my data. While this does not give me a local, ready archive of past versions, that is a sacrifice I’m willing to make, and a later step in the plan helps alleviate some of that burden.

Step 2 – Disaster Recovery

The second step was to establish a second backup of the data outside of my home office. Both the laptop and the home server sit within a couple of feet of each other, so it’s not inconceivable for some kind of disaster to wipe out both devices. For this task I’ve chosen to return to CrashPlan. Notice I call this my “disaster recovery” copy. There are a couple of reasons for that:

  1. Minor Disaster: If I have a minor issue, like file corruption or a deleted file that I want back, I can turn to CrashPlan. The CrashPlan backups run continuously on the home server and push data up to the CrashPlan cloud. CrashPlan in turn keeps revisions of changed and deleted files, so in the event a file is lost or corrupted I can download it from CrashPlan.
  2. Major Disaster: In the event that all of my local copies are lost or destroyed, I can retrieve all of my data from the CrashPlan cloud and restore my digital life. This is really what I mean when I call this my disaster recovery copy: the offsite cloud contains a full copy of all of my data and allows me to get everything back.

Step 3 – Belt and Suspenders

This last step provides a wee bit more protection for my most critical data. As much as I love being able to get at the CrashPlan data, downloading nearly 2TB of files from the Internet puts a fair bit of strain on a network connection and takes a long time. For that reason I’ve chosen to keep a third backup of my most critical data (at the time of this writing, about 400GB) at my parents’ home.

I have set up a Raspberry Pi running Raspbian and Samba that lets me keep a copy of this data synchronized to a secondary location that is still off-site, but only a 20-minute drive away. In an emergency I can simply retrieve the backup drive and bring it home to restore my data. This synchronization runs from the home server directly to the offsite location so that my laptop doesn’t have to be online. That matters because the off-site link is about 100x slower than the local in-house connection.