s3putsecurefolder


Edit: this script is deprecated in favour of a rewritten version 2.

I use Amazon S3 to host large media files that need cheap, scalable bandwidth, and as expandable offsite storage for important backups. I used to have some simple incremental tar scripts to do my offsite backups, but since I moved to Bacula, I’ve just established an alternative schedule and file set definition for my offsite backups: the critical subset of data I couldn’t possibly stand to lose (like company documents). Since I was refreshing all my procedures, and tarring the Bacula volumes no longer made any sense, I rewrote my script for putting the resulting backup data on S3.

The prerequisite in all cases is s3cmd, which is pretty mature now and available on most distros (“apt-get install s3cmd” and you’re done on Ubuntu). s3cmd actually has a ‘sync’ command, but I decided against it for two reasons. Firstly, it tries to sync in both directions, which I don’t want. I know that in theory it should never overwrite any local version so long as I don’t update the remote copies from somewhere else, but I’m paranoid when it comes to my backups and prefer to be explicit. Secondly, it obviously has to connect to S3 to determine the sync status, whereas I always know whether I need to upload new files just from my local environment (and S3 charges per request: not much, but it’s not zero and it’s the principle of the thing). So I skip the ‘sync’ command and just determine locally which new files I need to ‘put’ on the server.
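To illustrate the kind of local check described above, here is a minimal sketch, not the script itself. It assumes a marker file whose modification time records the last successful upload; the function name ‘list_new_files’ and the marker paths are made up for the example.

```shell
# Sketch only: list_new_files and the marker paths are hypothetical names,
# not taken from the original script.
# Prints every regular file under SRC modified more recently than MARKER.
# If MARKER does not exist yet (first run), prints every regular file.
list_new_files() {
    src=$1
    marker=$2
    if [ -f "$marker" ]; then
        find "$src" -type f -newer "$marker"
    else
        find "$src" -type f
    fi
}

# Example use (paths are placeholders):
#   touch /var/tmp/offsite.marker.next      # stamp taken before the scan
#   list_new_files /my/source/folder /var/tmp/offsite.marker
#   ...upload each listed file...
#   mv /var/tmp/offsite.marker.next /var/tmp/offsite.marker
```

Taking the new stamp before the scan, and only moving it into place after the uploads succeed, means a file changed mid-run is picked up again next time rather than missed. No request is made to S3 to work out what has changed.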

Encryption is also a must, since some of the data is sensitive and I don’t want to trust anyone else with it. I used to manually GPG my tarballs before uploading them, but I noticed that s3cmd supports an encryption option too. It uses GPG anyway, just in symmetric rather than asymmetric form like my version did (translation: you use the same passphrase for encryption and decryption; a little less secure than using generated public/private keys, but still OK so long as you pick a good passphrase and look after it). The default symmetric algorithm in gpg is CAST5, which seems pretty good, although you can change it if you want by editing your s3cmd config file. So I decided to give it a try. Once you configure s3cmd to use encryption, it also decrypts automatically when you pull the data back (symmetric key, remember). Being distrustful, I pulled the data back from S3 in a different environment and examined it, and it was indeed complete gibberish, but decipherable with the passphrase. Good stuff.
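For reference, an encrypted upload of a single file can be sketched roughly like this. It assumes a GPG passphrase has already been stored in the s3cmd config (for example via ‘s3cmd --configure’); the function name ‘put_encrypted’ and the ‘S3CMD’ override are invented for the example.

```shell
# Sketch only: put_encrypted and the S3CMD override are hypothetical names.
# S3CMD defaults to the real s3cmd; setting S3CMD=echo gives a dry run that
# prints the arguments instead of contacting S3.
put_encrypted() {
    file=$1
    bucket=$2
    key=$3
    # --encrypt makes s3cmd run the file through GPG (symmetric mode, using
    # the passphrase stored in its config file) before uploading it.
    "${S3CMD:-s3cmd}" put --encrypt "$file" "s3://$bucket/$key"
}

# Dry run (bucket and file names are placeholders):
#   S3CMD=echo
#   put_encrypted /backups/vol0001 my.s3.bucket vol0001
```

The dry-run switch is just a convenience for checking what would be sent before letting cron loose on it.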

So, here’s my little script, which uploads the contents of a folder to S3: just the files that have been added or updated since the last sync of that folder, encrypted by default. I just run this on a cron schedule now and it seems to work fine. The license is MIT; use at your own risk, and no warranty is given that it won’t destroy every file on your machine or eat your children. Usage is like this:

s3putsecurefolder /my/source/folder my.s3.bucket

Edit: it was brought to my attention that Amazon have made it easier to create pseudo-folder structures in S3 buckets since I last tried to do it (I swear it used to throw out keys with forward slashes in them; I had to mangle my names last time I did this), so I’ve updated the script to allow nested folders too.
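One way to build keys for nested folders is to strip the source folder from the front of each file path and keep the rest, slashes included. This is only a sketch of that idea under my own naming (‘s3_key_for’ is a made-up helper), and the key layout it produces is an assumption rather than a description of what the script does.

```shell
# Sketch only: s3_key_for is a hypothetical helper name.
# Prints the key for FILE relative to the source folder SRC, keeping the
# forward slashes so that nested folders appear as key prefixes.
s3_key_for() {
    src=${1%/}              # drop one trailing slash, if any
    file=$2
    key=${file#"$src"/}     # remove the leading "SRC/" from the path
    printf '%s\n' "$key"
}

# Example (paths are placeholders):
#   s3_key_for /my/source/folder /my/source/folder/docs/2009/report.pdf
# gives a key that could be used as s3://my.s3.bucket/docs/2009/report.pdf
```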

4 Responses to “s3putsecurefolder”

  1. niko Says:
    July 28th, 2009 at 3:26 pm

    hm, amazon s3 buckets are flat? Maybe I’m getting this wrong here, but when I last used it, you could create folders in every bucket and even assign access rights to each folder similar to buckets.

  2. Steve Says:
    July 28th, 2009 at 3:35 pm

    You probably used a tool that emulated folders, or maybe Amazon have wrapped this since I last investigated. Strictly, S3 stores ‘keys’, so you can emulate folders by including folder paths in the key, but it’s not a real folder structure. Last time I tried, it didn’t support forward slashes (maybe that’s changed now, I haven’t tried in ages), which is why I generated keys from the path without slashes. But strictly speaking, S3 buckets are simply sets of key/value pairs, i.e. flat.

  3. Steve Says:
    July 28th, 2009 at 3:46 pm

    OK, just tested: Amazon has wrapped this now in s3cmd. You can include “folder paths” in the destination, including forward slashes, and it allows them unchanged as a key prefix. That definitely did not work last time I set this up (early 2008); I had to mangle my paths to replace forward slashes before. Good to know, I could make the script handle nested directories then.

  4. Andy Says:
    July 29th, 2009 at 5:35 am

    I always enjoy learning how other people employ Amazon S3 online storage. Check out my very own tool, CloudBerry Explorer, which helps to manage S3 on Windows. It is freeware. http://cloudberrylab.com/
