Backing up the CMS cloud
The CMS cloud is starting to store useful data, so I’ve been looking at ways to set up backups. I’ve been using Amanda on my workstation for a couple of years now, backing up to an external hard drive, and I wanted to set up something similar.
The goal of the backup system is to get essential data files copied out of the cloud systems and onto persistent storage, in this case NCI’s global Lustre filesystem /g/data, which is NFS mounted on a couple of the cloud’s VMs. It’s also essential that these files are stored securely, as they might contain private information, and that we can actually recover from them in the event of the cloud going down.
There’s an Amanda Puppet class available, pdxcat/amanda, which I used as the basis for the setup. It has separate classes for the client and server, which I wrapped in local classes following my standard layout.
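A minimal sketch of those wrappers (the local class names here are my own, and I’ve only shown plain includes; the parameters you pass to the pdxcat/amanda classes will depend on the module version):

```puppet
# Local wrapper classes around the pdxcat/amanda module.
# Class names are illustrative.
class site::backup::server {
  # Sets up the Amanda server on the backup host
  include ::amanda::server
}

class site::backup::client {
  # Installs the Amanda client on each node to be backed up
  include ::amanda::client
}
```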
The Puppet class doesn’t manage Amanda’s main config file; instead it points at a directory on the Puppet fileserver for you to populate. It does, however, create the disklist file containing the paths to be backed up.
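For reference, each disklist entry is just a host, a path, and a dumptype, so the generated file ends up looking something like this (hosts and paths are illustrative; the dumptype is the encrypted one defined further down):

```
# disklist, as generated from the exported resources
cloud-vm1.example.org  /var/lib/cms  encrypted-tar
cloud-vm2.example.org  /etc          encrypted-tar
```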
Amanda’s config files are a bit obscure, since the tool is focused on tape-based setups; backing up to disk is done through ‘virtual tape changers’. I found the instructions on the wiki fairly understandable, and created a Puppet template to build the config file.
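The relevant part of the template comes out looking something like this (the paths, sizes, and tape count are illustrative):

```
# amanda.conf excerpt: back up to a 'virtual tape changer' on local disk
tpchanger "chg-disk:/var/lib/amanda/vtapes"  # directory holding the slotN dirs
tapetype  HARDDISK
tapecycle 14                                 # number of virtual tapes in rotation

define tapetype HARDDISK {
    length 100 gbytes                        # capacity of each virtual tape
}
```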
I decided to create the virtual tape drives manually, rather than through Puppet with its annoying lack of loops; running a short loop as the amandabackup user creates the tapes.
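A sketch of that loop (the config name DailyBackup and the slot count are illustrative, and should match the amanda.conf above):

```bash
#!/bin/bash
# Create and label the virtual tapes for the 'DailyBackup' Amanda config.
# Run as the amandabackup user.
for i in $(seq 1 14); do
    mkdir -p "/var/lib/amanda/vtapes/slot${i}"
    amlabel DailyBackup "DailyBackup-${i}" slot "${i}"
done
```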
Since the backups contain sensitive data, I needed to make sure they were stored encrypted. Amanda makes this pretty simple, though you need to create the encryption keys yourself. I created a Bash script to do this, which Puppet runs if the keys aren’t present.
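A sketch of that script, assuming Amanda’s bundled amcrypt wrapper is used, which reads a passphrase from the backup user’s home directory (the exact key setup depends on which encryption program the dumptype points at):

```bash
#!/bin/bash
# Generate a random passphrase for amcrypt if one doesn't already exist.
# Run by Puppet, guarded by a check that the key file is missing.
set -e
KEYFILE="$(getent passwd amandabackup | cut -d: -f6)/.am_passphrase"
if [ ! -e "${KEYFILE}" ]; then
    umask 077
    head -c 64 /dev/urandom | base64 --wrap=0 > "${KEYFILE}"
    chown amandabackup "${KEYFILE}"
fi
```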
With the keys present, you need to define a new dumptype in the Amanda config file that uses them.
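Something along these lines, assuming amcrypt handles the client-side encryption (the dumptype name is whatever the disklist entries reference):

```
# Dumptype for compressed, client-side encrypted backups
define dumptype encrypted-tar {
    program "GNUTAR"
    compress client fast
    encrypt client
    client_encrypt "/usr/sbin/amcrypt"
    client_decrypt_option "-d"
}
```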
The pdxcat/amanda class collects the backup paths for you via exported resources. I created a helper type to add sensible defaults and to make sure the encrypted dumptype is used.
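A sketch of the helper; the exported resource follows the pdxcat/amanda disklist handling, so its exact name and parameters may differ in the module version used here:

```puppet
# Helper type: export a disklist entry for a path on this node,
# defaulting to the encrypted dumptype.
define client::backup::directory (
  $path     = $name,
  $dumptype = 'encrypted-tar',
) {
  @@amanda::disklist::dle { "${::fqdn}:${path}":
    host     => $::fqdn,
    disk     => $path,
    dumptype => $dumptype,
  }
}
```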
I can then use client::backup::directory{} in my classes to easily back up a path.
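For example (the path is illustrative):

```puppet
class cms::database {
  # Nightly encrypted backup of the database dumps
  client::backup::directory { '/var/lib/cms/dumps': }
}
```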
The Amanda server connects to clients using SSH, and I set up a little custom type to automate the key exchange. Facter allows you to create custom facts by placing files in the directory /etc/facter/facts.d; I used this to store the server’s public SSH key as a custom fact, which clients then read and add to their authorized keys.
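A sketch of how this fits together, assuming the fact is named amanda_ssh_pubkey and the key travels to clients as an exported resource:

```puppet
# On the server: /etc/facter/facts.d/amanda_ssh.txt contains a line like
#   amanda_ssh_pubkey=AAAA...
# which Facter exposes as $::amanda_ssh_pubkey. Export it:
@@ssh_authorized_key { "amanda@${::fqdn}":
  ensure => present,
  user   => 'amandabackup',
  type   => 'ssh-rsa',
  key    => $::amanda_ssh_pubkey,
}

# On each client: collect the server's exported key into
# the amandabackup user's authorized_keys.
Ssh_authorized_key <<| user == 'amandabackup' |>>
```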
I spent a bit of time trying to get Amanda to run as an LDAP user, so that it could write to our NFS storage directly, but in the end I settled on a mirror approach: Amanda performs the backups as a local user, then the LDAP user mirrors the tapes across.
Since Amanda is intended to run as a cron job, I created another Bash script that performs the backup and the mirror in one step, so I didn’t need to worry about the two commands running at the same time. Creating the cron job with my custom cron type covered monitoring as well, since Amanda doesn’t run any background services.
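A sketch of that wrapper script (the config name, tape directory, mirror user, and NFS path are all illustrative placeholders):

```bash
#!/bin/bash
# Run the Amanda dump as the local backup user, then mirror the
# virtual tapes to NFS as the LDAP user. Doing both steps in one
# script guarantees they never run at the same time.
su - amandabackup -c 'amdump DailyBackup'
su - ldap_user -c 'rsync -a --delete /var/lib/amanda/vtapes/ /g/data/PROJECT/backups/vtapes/'
```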
With the backups running, data can be recovered by administrators using the amrecover tool. If the backup server itself goes down, the data can still be recovered from the NFS drives plus a copy of the encryption keys, which I store offsite.
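Recovery itself is interactive; a session looks something like this (the host, path, and config name are illustrative):

```
# On the client, as root:
$ amrecover DailyBackup
amrecover> sethost cloud-vm1.example.org
amrecover> setdisk /var/lib/cms
amrecover> add dumps/site.sql
amrecover> extract
```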