As I’ve mentioned recently, as a background task I’m setting up a new Ubuntu server to take over the main file server, mail server, build server, backup server, web server, and you-name-it-server duties of my home office. It will eventually replace a venerable Debian server, built on old hardware left over from retired machines (except for two new mirrored hard drives), which has basically sat in a corner being rock-solid for years without me touching it, at least until the PSU failed (and was rapidly replaced), reminding me that I’d been meaning to upgrade it for over a year. This machine does more things than I care to remember half the time, so without a huge amount of time to spare it’s taking me a while to bring its younger, upgraded, more power-efficient replacement fully online.
One of the things I wanted to move away from was my manually scripted backups. Under Linux, backups are generally a simpler affair than on Windows: just tar and compress the file system, skipping a few folders like /proc and /tmp, and you’re pretty much done. So I wrote a few simple scripts and left it at that, which worked OK for a while, but as I started to want more complex backup strategies (different subsets sent to S3, etc.), manually scripting everything was becoming tedious. I’d tried setting up Amanda before but found it a little overcomplicated for my needs (and reverted to scripts), so I decided to try Bacula this time.
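For illustration, the hand-rolled approach amounted to something like this; the `backup_fs` helper name and the paths are hypothetical, not lifted from my actual scripts:

```shell
# Sketch of a scripted full backup: tar and gzip a tree, skipping
# pseudo-filesystems. The function name and paths are illustrative only.
backup_fs() {
    src="$1"     # root of the tree to back up, e.g. /
    dest="$2"    # a mounted NAS path, e.g. /mnt/nas/backups
    stamp=$(date +%Y-%m-%d)
    # -C "$src" archives relative paths; proc, sys and tmp are skipped
    tar --exclude=proc --exclude=sys --exclude=tmp \
        -czpf "$dest/full-$stamp.tar.gz" -C "$src" .
}
```

It works, but every new requirement (a different subset, a different destination) means another variant of the script, which is exactly the tedium Bacula removes.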
In all it probably took me three evenings to get it set up the way I wanted, of which two were spent reading the Bacula manual and experimenting, and one was spent solely trying to get around an issue with mounting a remote drive on demand. I don’t use tape drives, because I think they’re overpriced and an unnecessary hassle for a small office setup; instead I send all my backups directly to a separate (RAIDed) NAS box, and replicate them to another machine for added resilience. I also upload a critical subset (encrypted) to Amazon S3 for emergency off-site recovery purposes. A removable external HD would be another option, but a) I’d have to keep taking it off-site all the time, and b) you’d really need two copies, because removable HDs are just not resilient enough unless they’re at least RAID 1.
The problem I had was getting the NAS box mounted on demand for the backups. I do it over CIFS (Samba); I could use NFS (since the NAS runs embedded Linux and has NFS support), but the nice thing about CIFS is that I know I can redirect it to any machine in the office at any time if I need to, since everything supports it (including OS X and Windows), and I won’t start hitting new issues because of a protocol switch. Since we can’t rely on the NAS being mounted all the time (it may be taken offline from time to time), Bacula needs to mount and unmount it on demand, which is what my manual scripts used to do. Bacula seemed to support this via the “Requires Mount = yes” option on the storage device, together with definable mount and unmount commands, but I could never get it to work despite trying for most of an evening: it would just complain about the device not being mounted, and seemingly never try to run the specified mount commands. In the end I used a much better overall approach anyway: autofs, which auto-mounts the NAS box when a specific path is accessed and unmounts it after a period of no use. That solved the problem in Bacula (I could just point it at a path and let autofs handle the rest), and it also makes it much easier to browse the NAS from this machine generally.
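For reference, the autofs side of this only needs a couple of lines of configuration; something along these lines (the share name, paths, and timeout here are examples rather than my exact setup):

```
# /etc/auto.master: anything accessed under /mnt/nas is handled by the
# map below, and unmounted again after 60 seconds of inactivity
/mnt/nas  /etc/auto.nas  --timeout=60

# /etc/auto.nas: mount the 'backups' CIFS share from the host 'nas';
# a credentials file keeps the password out of the map itself
backups  -fstype=cifs,credentials=/etc/nas-credentials  ://nas/backups
```

With that in place, Bacula’s storage device can simply point at /mnt/nas/backups and never needs to know about mounting at all.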
So, now I have it set up, I’m really quite pleased with it. It supports on-the-fly compression, but does it per file (unlike tar, which compresses the whole archive), which is reassuring: limited data corruption should cost you individual files rather than the entire backup. The backup schedules and volume recycling configuration are really very powerful, and there are features that make backing up to disk systems rather than tapes very convenient. The visibility is generally a lot better than with my manual scripts, restoring is easier, it’s easier to keep on top of the space used thanks to the recycling, and it’s far easier to manage a ‘deeper’ full/incremental policy without making your restore process harder. Plus, I have a queryable log of everything (rather than just cron logs and notification emails), which feeds metadata into the restore process as you’d expect. Good stuff - all the kinds of features you’d expect from a commercial backup solution.
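To give a flavour of the configuration, per-file compression and disk-volume recycling are both just directives in bacula-dir.conf. A minimal sketch (the names, retention period, and volume size below are made up for the example, not my actual settings):

```
# A FileSet with per-file GZIP compression, excluding the usual suspects
FileSet {
  Name = "Full Set"
  Include {
    Options {
      signature = MD5
      compression = GZIP
    }
    File = /
  }
  Exclude {
    File = /proc
    File = /tmp
  }
}

# A disk-based pool: volumes are pruned and recycled automatically, and
# capped in size so they behave like manageable files on the NAS
Pool {
  Name = "NAS-Pool"
  Pool Type = Backup
  Recycle = yes
  AutoPrune = yes
  Volume Retention = 30 days
  Maximum Volume Bytes = 5G
  Label Format = "Vol-"
}
```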
Since I don’t install X on my servers, I had to configure everything with a text editor rather than the GUI Qt tool (‘bat’), but really it wasn’t a big deal. Once you understand the concepts, the text configuration is quite straightforward (and well commented), as with most good Linux server apps. Of course, I can also run the GUI on my Windows or Mac desktops and configure the scheduler remotely from there, although I haven’t tried that yet.
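For day-to-day administration the text console, bconsole, covers everything too; these are standard console commands (output elided here):

```
*status director
*list jobs
*run
*restore
```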
So far, Bacula seems like a great solution for small office backups.