Internet Linux Tech

S3 encrypted upload script, v2 (Python)

Ok, so I discovered a number of shortcomings in my recent attempt to sync a folder in one direction to Amazon S3 using encryption, the most important of which was that it wouldn’t resume a failed transfer efficiently, which in the case of large transfers wasn’t at all ideal (as I learned to my own cost – damn my 256k upload speed).

So, this is attempt number 2. I decided to completely rewrite the script in Python instead to give me some more flexibility, coupled with the availability of Boto, a nice Python library for accessing all the Amazon Web Services. Rather than rely on just local information, or even date/time stamps, I decided to use hashes to track whether files were different. Amazon already stores the MD5 of the file you upload to them and makes that available without downloading the file, but that’s no use when you encrypt your files before uploading them, because the MD5 is of the encrypted contents rather than the original. So unless you keep the encrypted copies around too, or encrypt the local files again every time just to check the match (expensive if you’re dealing with large files), you won’t be able to compare them – I think this is the reason why ‘s3cmd sync’ currently doesn’t support encryption.

So, I decided to use S3’s ability to store custom metadata in keys, and stored the MD5 hash of the original file against the encrypted contents that I uploaded. That way, I can check the hashes against each other pretty quickly without having to re-encrypt the local files. If the hashes are different, I encrypt and upload. This approach trades a bit of preprocessing (hashing every local file) against avoiding uploads, so it’s likely to be more efficient on small groups of very large files rather than lots of small files – that’s how I use S3 for my backups of course. It also means I don’t have to worry about timestamp variations; it’s the content of the file that is the driver of whether it’s uploaded or not.
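To make the hash check concrete, here is a minimal sketch of the comparison step. It is not lifted from the script itself: it assumes modern Python 3 rather than 2.5, the metadata key name `original-md5` is made up for illustration, and the remote metadata is passed in as a plain dict so the logic can be shown without any S3 calls.

```python
import hashlib


def file_md5(path, chunk_size=1024 * 1024):
    """Return the hex MD5 of a file, reading in chunks so large files
    never have to fit in memory."""
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest()


def needs_upload(local_path, remote_metadata, meta_key="original-md5"):
    """Decide whether a file should be encrypted and uploaded.

    remote_metadata is the custom metadata dict of the existing S3 key,
    or None if the key does not exist yet. meta_key is a hypothetical
    name for wherever the hash of the unencrypted file was stored.
    """
    if remote_metadata is None:
        return True  # nothing on S3 yet, so upload
    # Upload only when the stored hash differs from the local file's hash
    return remote_metadata.get(meta_key) != file_md5(local_path)
```

The point is that only the cheap hash runs on every file; the expensive encrypt-and-upload step happens only when `needs_upload` says so.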

So, here’s the new version. It’s a bit more powerful than the last one – I’m calling gpg myself now so you have the choice between encrypting using public keys (more secure, and the default), or using symmetric encryption with a passphrase. You need to install Boto before you can run it, and it depends on Python 2.5 with hashlib installed. I’ve run it on both Linux and the Mac; it should work on Windows too provided you take the trouble to set up Python and GnuPG, but I haven’t tried. My Linux (apt) and OS X (macports) setups make these things quicker, so being short of time I just went with that. Here’s the usage from --help:

Usage: s3putsecurefolder.py [options] source_folder target_bucket gpg_recipient_or_phrase

Options:
  -h, --help            show this help message and exit
  -n, --dry-run         Do not upload any files, just list actions
  -a ACCESS_KEY, --accesskey=ACCESS_KEY
                        AWS access key to use instead of relying on
                        environment variable AWS_ACCESS_KEY
  -s SECRET_KEY, --secretkey=SECRET_KEY
                        AWS secret key to use instead of relying on
                        environment variable AWS_ACCESS_KEY
  -c, --create          Create bucket if it does not already exist
  -v, --verbose         Verbose output
  -S, --symmetric       Instead of encrypting with a public key, encrypts
                        files using a symmetric cypher and the passphrase
                        given on the command-line.
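
For anyone curious how the two encryption modes map onto gpg, here is a rough sketch of building the command line. The function name and argument layout are my own illustration rather than the script's actual code; the gpg options used (`--batch`, `--yes`, `--output`, `--encrypt`, `--recipient`, `--symmetric`, `--passphrase`) are standard ones.

```python
def build_gpg_command(source, dest, recipient_or_phrase, symmetric=False):
    """Build the argument list for encrypting source into dest with gpg.

    Hypothetical helper: with symmetric=False the third argument is a
    public key recipient, otherwise it is the passphrase.
    """
    # --batch avoids interactive prompts, --yes overwrites an existing dest
    cmd = ["gpg", "--batch", "--yes", "--output", dest]
    if symmetric:
        # Note: a passphrase on the command line is visible in the
        # process list, which is one reason public keys are the default.
        cmd += ["--symmetric", "--passphrase", recipient_or_phrase]
    else:
        cmd += ["--encrypt", "--recipient", recipient_or_phrase]
    cmd.append(source)
    return cmd
```

You would hand the result to something like `subprocess.check_call(...)`. Newer GnuPG 2.x releases may additionally need `--pinentry-mode loopback` before they accept a passphrase in batch mode, so treat this as a starting point.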

Once again, no warranty is given, MIT license. If you see that I’ve done anything dumb, let me know :)

Linux Open Source Tech Web

s3putsecurefolder

Edit: this script is deprecated in favour of a rewritten version 2.

I use Amazon S3 to host large media files which I want cheap scalable bandwidth on, and for expandable offsite storage of important backups. I used to have some simple incremental tar scripts to do my offsite backups, but since I moved to Bacula, I’ve just established an alternative schedule and file set definition for my offsite backups, the critical subset of data I couldn’t possibly stand to lose (like company documents). Since I was refreshing all my procedures and tarring the Bacula volumes no longer made any sense, I rewrote my script for putting the resulting backup data on S3.

The prerequisite in all cases is s3cmd, which is pretty mature now and available on most distros (“apt-get install s3cmd” and you’re done on Ubuntu). s3cmd actually has a ‘sync’ command, but firstly that tries to sync in both directions, which I don’t want (I know in theory it should never overwrite any local version so long as I don’t update the remote copies from somewhere else, but I’m paranoid when it comes to my backups and prefer to be explicit), and secondly it obviously has to connect to S3 to determine the sync status, whereas I always know whether I need to upload new files just from my local environment (and S3 charges per request – not much, but it’s not zero and it’s the principle of the thing). So, I decided not to use the ‘sync’ command, and just determine locally what new files I needed to ‘put’ on the server.
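
One way to work out locally what needs uploading is to compare file modification times against a marker file touched after each successful sync. The sketch below assumes that approach and uses Python for illustration; the marker file and both function names are hypothetical, not taken from the script.

```python
import os


def files_changed_since(folder, marker_path):
    """List files under folder modified after the marker file was last
    touched. With no marker yet, every file counts as new."""
    try:
        last_sync = os.path.getmtime(marker_path)
    except OSError:
        last_sync = None  # first run: no previous sync recorded
    changed = []
    for root, _dirs, files in os.walk(folder):
        for name in files:
            path = os.path.join(root, name)
            # Never report the marker itself if it lives inside the folder
            if os.path.abspath(path) == os.path.abspath(marker_path):
                continue
            if last_sync is None or os.path.getmtime(path) > last_sync:
                changed.append(path)
    return sorted(changed)


def touch(marker_path):
    """Create the marker if needed and set its mtime to now."""
    with open(marker_path, "a"):
        pass
    os.utime(marker_path, None)
```

Each file returned would then be ‘put’ with s3cmd, and `touch` called only once the whole run has succeeded, so a failed run gets retried next time. No request to S3 is needed to make the decision.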

Secondly, encryption is a must, since some of the data is sensitive and I don’t want to trust anyone else with it. I used to manually GPG my tarballs before uploading them, but I noticed that s3cmd supports an encryption option too. It just uses GPG anyway, just in symmetric form rather than asymmetric like my version did (translation: you use the same passphrase for encryption and decryption; a little less secure than using generated public/private keys but still ok so long as you pick a good passphrase and look after it). The default symmetric algorithm in gpg is CAST5 which seems pretty good, although you can change it if you want by editing your s3cmd config file. So, I decided to give it a try – after you configure s3cmd to use encryption, it actually automatically decrypts too when you pull the data back (symmetric key, remember) – being distrustful, I pulled the data back from S3 in a different environment and examined it, and it was indeed complete gibberish, but decipherable with the passphrase. Good stuff.

So, here’s my little script which will upload the encrypted contents of a folder to S3 – just the contents that have been added or updated since the last sync of that folder, and will encrypt them by default. I just run this on a cron schedule now and it seems to work fine. License is MIT, use at your own risk, no warranty is given that it won’t destroy every file on your machine or eat your children. Usage is like this:

s3putsecurefolder /my/source/folder my.s3.bucket

Edit: it was brought to my attention that Amazon have made it easier to create pseudo-folder structures in S3 buckets since I last tried to do it (I swear it used to throw out keys with forward slashes in them, I had to mangle my names last time I did this), so I’ve updated the script to allow nested folders too.

hardware OS X Tech

Apple owns the US premium retail PC market

This was pretty interesting; CNet reports that according to NPD stats, Apple has 91% of the retail PC sales in the US above $1,000.

Now, let’s add the caveats here:

  • That’s retail PCs. Of course, loads of people build their (desktop) PCs from OEM parts rather than buying a prebuilt machine, so it’s safe to say that these sales are almost all going to be laptops, where Apple particularly shines.
  • Also, these are primarily going to be consumer purchases, because businesses tend to buy in bulk and not at retail (excluding the smaller businesses) – again Apple is far more popular in the consumer space than in business (barring the iPhone).
  • The above $1,000 range is a minority of all sales; a majority of people buy cheap rubbish ;)
  • This is only the US, where Apple seems more popular

So, the headline isn’t quite as accurate in its crushing assessment as the wholesome reality, but even so it’s pretty impressive. When it comes to laptops, I always buy quality because I’ve been disappointed many times by machines that looked good on paper but turned out to be poorly constructed, poorly designed, and had all kinds of heat / battery life / general robustness issues, which led me to always buy from the ‘premium’ range in the last 6 years or so. At first that was the likes of the top-end Sonys, but after being convinced to try a MacBook Pro, I’ve been so pleased with the overall construction / design and the ability to use OS X as well as Windows that I’m very unlikely to buy anything else next time (my next hardware revision will be 2010, I generally switch every 3 years, which is reasonable if you buy something decent to begin with).

The talk now is about whether Apple will start making a netbook, to compete at the cheaper end of the market. Personally I don’t care – I quit buying cheap laptops ages ago, I don’t think it really ends up being cheaper in the long run. Powerful and cheap machines tend to be poorly built – I’ve burned through (literally in a couple of cases!) far too many laptops that couldn’t handle their power actually being used regularly, or which developed problems because the build quality was naff. Cheap machines with decent construction but lower spec (e.g. netbooks) just need upgrading faster if you have my sort of needs, or are just a supplement to a ‘real’ machine, either of which costs more in aggregate, and the resale value when you do upgrade is usually not even worth considering. In the round, buying a premium laptop relatively infrequently works far better for me, and as such Apple already provide what I want. YMMV :)

Tech Windows

Amazon Win7 upgrade spam – empty your wallets

Amazon has started email-bombing people in the UK with Windows 7 pre-order offers, a little while after a similar pre-order offer was available in the US. Windows 7 is the first version of Windows that I’ve found myself being upbeat about since 2001, so I cheerfully clicked the link. The result was an offer of Windows 7 Professional “E” (the European version with IE removed, congrats EU on fighting an originally well-intentioned battle that ceased to be practically relevant almost a decade ago) for a ‘discounted’ price of £180. Wait, what? £180 for basically Vista how it should have been? And that’s a pre-order discount?

I have several copies of XP and Vista on my bookshelf. The Vista ones currently grace only test machines (although post-SP1 it was somewhat more tolerable to use, if still blissfully flagrant with resources), but nevertheless I own several copies. In the US, there was talk of a $99 limited pre-order offer for an upgrade of Win7 Pro, which is still expensive alongside Snow Leopard’s $29 upgrade (which offers a similar level of refinement), but would still be doable – that pales in comparison to £180 (or $295).

This is actually for a ‘full’ copy, not an upgrade, because the fact is that there will be no upgrade version for Europe. The official excuse is that the “E” version requires a re-install, but anyone with a handful of brain cells to rub together knows that’s a crock; the nuances of the install procedure have little to do with the price that they can offer for the license itself – after all upgrade software has for years been able to be used as a fresh install too, provided you have an original CD from the previous version (I doubt anyone would buy an upgrade copy of Windows if they were unable to reinstall afresh at some point – a regular ritual for many users). No, the lack of an upgrade version, and the pricing policy in general, is blatantly a “screw you” to Europe because of the whole IE debacle.

The original word was that “European customers will be able to buy a full retail version of Windows 7 E at the same price as the usually cheaper upgrade version, at least for the rest of this year” (link). Ok, shame that that ‘upgrade’ price, that limited time ‘special deal’ price, is actually 3 times the price of the limited time upgrade offer that the US had. So in fact what you’re saying is that while the upgrade pre-order already looks extortionate, the price is going to be even higher in 2010.

I’m going to have to buy a copy or two anyway, but jeez, way to make me feel totally ripped off guys. I’m going to be upgrading OS X to Snow Leopard at pretty much the same time, and the price difference is going to be glaring. Microsoft made their name originally by bringing value to the masses. Where did that principle go?

Linux Tech

Bacula is nice

As I’ve talked about recently, as a background task, I’m setting up a new Ubuntu server to take over the main file server, mail server, build server, backup server, web server, and you-name-it-server duties of my home office. It will eventually be taking over from a venerable Debian server, which was built on some old hardware left over from retired machines (except with 2 new mirrored hard drives) and has basically sat in a corner being rock-solid for years without me touching it, at least until the PSU failed (and was rapidly replaced), reminding me that I’ve been meaning to upgrade it for over a year. This machine does more things than I care to remember half the time, so without a huge amount of time to spare it’s taking me a while to bring its younger, upgraded, more power-efficient replacement fully online.

One of the things I wanted to move away from was my manually scripted backups. Under Linux backups are a generally simpler affair than on Windows; just tar & compress the file system, skipping a few folders like /proc and /tmp and you’re pretty much done, so I wrote a few simple scripts and left it at that, which has worked ok for a while, but as I started to want more complex backup strategies (different subsets sent to S3, etc), manually scripting everything was becoming tedious. I’d tried setting up Amanda before but found it a little overcomplicated for my needs (and reverted to scripts), so I decided to try Bacula this time.
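
For reference, the "tar it all up but skip a few folders" approach looks roughly like this in Python. It is a sketch rather than my actual backup scripts: the function name is invented, and the excluded folders are given relative to the root being archived.

```python
import os
import tarfile


def make_backup(source_root, archive_path, excluded=("proc", "tmp")):
    """Write a gzip-compressed tar of source_root, skipping the given
    top-level folders (paths relative to source_root)."""

    def skip_excluded(tarinfo):
        # With arcname="." member names look like "./proc/cpuinfo";
        # normpath turns that into "proc/cpuinfo" for comparison.
        rel = os.path.normpath(tarinfo.name)
        for prefix in excluded:
            if rel == prefix or rel.startswith(prefix + os.sep):
                return None  # returning None drops the entry
        return tarinfo

    with tarfile.open(archive_path, "w:gz") as tar:
        tar.add(source_root, arcname=".", filter=skip_excluded)
```

Keep the archive itself outside the tree being backed up (or add its folder to `excluded`), otherwise the archive ends up trying to include itself.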

In all it probably took me 3 evenings to get it set up the way I wanted, of which 2 were spent reading the Bacula manual and experimenting, and 1 was solely spent trying to get around an issue with mounting a remote drive on demand. I don’t use tape drives, because I think they’re overpriced and an unnecessary hassle for a small office setup; instead I send all my backups directly to a separate (RAIDed) NAS box, and replicate them to another machine for added resilience. I also upload a critical subset (encrypted) to Amazon S3 for emergency off-site recovery purposes; a removable external HD would be another option, but a) I’d have to keep taking it off-site all the time, and b) you’d really need 2 copies, because removable HDs are just not resilient enough unless they’re at least RAID1.

The problem I had was getting the NAS box mounted on-demand for the backups; I do it over CIFS (Samba) – I could use NFS (since the NAS runs embedded Linux and has NFS support) but the nice thing about using CIFS is that I know I can redirect it to any machine in the office at any time if I need to, since everything supports it (including OS X and Windows), and I won’t start hitting new issues because of a protocol switch. Since we can’t rely on the NAS being mounted all the time (it may be taken offline from time to time), Bacula needs to mount it & unmount it on demand – this is what my manual scripts used to do. Bacula seemed to support this via the “Requires Mount=yes” option on the storage device, and being able to define mount and unmount commands, but I could never get this to work, despite trying for most of an evening (it would just complain about the device not being mounted, and seemingly not try to run the specified mount commands). In the end, I used a much better overall approach anyway, by using autofs to auto-mount the NAS box when a specific path is accessed, and unmount it after a period of no use. That solved the problem in Bacula (I could just point it at a path and let autofs handle the rest), and also meant it’s much easier to browse the NAS from this machine generally.

So, now I have it set up, I’m really quite pleased with it. It supports on the fly compression but does it per file (unlike how you would do it with tar, which is on the whole archive) which makes it a little more reassuring that you wouldn’t lose as much in the case of a limited data corruption. The backup schedules and volume recycling configuration are really very powerful, and they have features in there which make doing backups to disk systems rather than tapes very convenient. The visibility is generally a lot better than my manual scripts, restoring is easier, it’s easier to keep on top of the space used with the recycling, and it’s far easier to manage a ‘deeper’ full/incremental policy without making your restore process harder. Plus, I have a queryable log of everything (rather than just cron logs and notification emails) which feeds metadata into the restore process as you’d expect. Good stuff – all the kinds of features you’d expect from a commercial backup solution.
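
The per-file compression point is easy to demonstrate. The toy below uses Python's zlib and has nothing to do with Bacula's real volume format; it just shows that when each file is compressed independently, damage to one compressed blob leaves the others recoverable. Function names are made up for the example.

```python
import zlib


def compress_per_file(files):
    """Compress each file's bytes independently (name -> compressed blob)."""
    return {name: zlib.compress(data) for name, data in files.items()}


def recover(compressed):
    """Decompress what we can; return (recovered files, damaged names)."""
    recovered, damaged = {}, []
    for name, blob in compressed.items():
        try:
            recovered[name] = zlib.decompress(blob)
        except zlib.error:
            damaged.append(name)  # only this file is lost
    return recovered, damaged
```

With a single compressed stream over the whole archive, the same damage can make everything after the corrupted point unreadable, which is why per-file feels more reassuring.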

Since I don’t install X on my servers, I had to configure everything with a text editor rather than the GUI Qt tool (‘bat’), but really it wasn’t a big deal. Once you understood the concepts, the text configuration was really quite straightforward (and well commented), as most good Linux server apps are.  Of course, I can run the GUI on my Windows or Mac desktops too, and configure the scheduler remotely from there. I haven’t tried that yet.

So far, Bacula seems like a great solution for small office backups.

Internet Tech Web

YouTube putting the bullet in IE6’s head?

Oh, please let this come to pass soon. TechCrunch reports that YouTube is due to drop support for IE6 ‘soon’, pointing users at Chrome (obviously), IE8 and Firefox 3.5. Finally, one of the worst pieces of software ever to pollute the Internet with its presence is getting taken out to the barn with a double-barrelled shotgun, and not a moment too soon.

Sure, Digg already said they might do this, but YouTube is far more significant; if YouTube stop supporting IE6, then in practice it means I can too :) Bye bye IE6, please do let the door hit your ass on the way out, preferably hard enough to fracture your pelvis. It’s the least you deserve for making life hell for hundreds of thousands of web site maintainers over the last few years.

Tech Web

Browsers just aren’t sexy anymore

I’ve been running Firefox 3.5 and Internet Explorer 8 on this machine for a little while now. Both are worthy upgrades to their line, addressing their previous shortcomings quite nicely – Firefox is now faster and more importantly leaner on memory use, and Internet Explorer seems to have mostly shaken off the dull, bare bones feel that it’s had in the past, and is definitely faster and more standards compliant.

I actually feel I could use any of Firefox, IE, Safari, Chrome or Opera now and be fairly happy. I’m sticking with Firefox, because the addons I use still keep it ahead of the alternatives as a user experience for me – the reason that I find it better is, I think, that a vibrant community inherently produces enough breadth that I can always find things which make a substantive improvement to the way I personally want to use the browser. No matter how many snazzy features a single team decides to put in a browser, they’re never going to hit the mark with everyone, and I find that only a small percentage of the in-the-box feature points of IE8, Safari or Chrome are of any real interest to me.  That’s why just a speed bump and memory optimisation was all I really needed from Firefox 3.5; I make my own recipe of must-have features from the community instead.

But still, the days of ‘browser X sucks compared to browser Y’ seem to be mostly over for the moment, as competition has levelled the playing field to the extent that it’s mostly personal preference on the small things that remain.  That’s a huge improvement from Microsoft particularly, who deserved their reputation for producing terrible browsers in the past, but who I think have now earned the right to shake off that reputation. As much as it’s a difficult adjustment to make, IE is no longer a bad browser. It’s just another decent browser that is missing my Firefox addons ;)

Linux Tech

UNetbootin is awesome

I just assembled a new server machine, which in the end I chose to house in a shiny aluminium Thermaltake Lanbox, which is relatively compact but still roomy enough for two hard drives, a bog-standard power supply, and plenty of airflow, which is what I wanted. I also knew that the fans on this case were nice and quiet (I have a black steel version as a GPU test box, I wanted a lighter version this time!), which is important for a machine that will be on all the time.

As I said in my previous posts, I was determined not to put an optical drive in this machine. It really doesn’t need one, since all the system software will be downloaded anyway – the only possible use for an optical drive would be to boot the machine in the first instance, and that seemed a total waste. I know DVD drives are cheap, but why clutter up the box with one just for the rare occasion when I need to manually boot? The same goes for floppy drives, which are such dinosaurs I can’t believe some new machines still come with one present – the only possible use for a floppy drive these days is to provide a slot that you don’t mind a toddler feeding jam sandwiches into.

No, instead I wanted to boot from a USB flash drive. I’d never done this before, so I scouted around for the best ways to do it. Syslinux came up pretty quickly as the primary contender, but being lazy I hunted around a bit longer to see if anyone had a simpler way than configuring Syslinux manually. That’s when I came across UNetbootin.

What a fantastic little project! It took literally 5 minutes from downloading, to creating a bootable USB disk with the distribution of my choice on it (UNetbootin will download your chosen distribution automatically – or you can supply an ISO of your choice if you want), to booting up the new machine. I couldn’t believe just how simple it was! I chose to put Ubuntu 8.04 Netinstall on the disk, which clocks in at a tiny 9MB because it’s just enough to boot up the installer to start downloading the real packages direct, but if you want, and you have a big enough USB stick, you can put a complete distro on there too. But this way, I can use a crappy old USB stick I have lying around as my boot device.

A great little tool anyway. I love it when things are easier than you expect.

Internet Tech Web

Opera Unite – another step in the right direction?

I’ve harped on many times about how I think centrally controlled services like Facebook are the antithesis of what the Internet was supposed to be about – a distributed, decentralised place with authority controlled at the leaves by those with most interest in maintaining it, rather than some corporate hub holding all the cards.

Well, it seems like a small bunch of companies are starting to latch on to this idea too, a welcome respite from the huge number of ventures that just want to be the new singular nexus of your internet life. Google Wave certainly ‘gets it’, if the reality reflects the stated vision where the open-source software can be run anywhere, not just on Google’s servers. And Opera Unite is making the right kind of noises for me too, even if right now the service is embryonic.

In essence, it’s a semantically richer, more secure version of BitTorrent – the ability to share files, photos and media within interfaces dedicated to that purpose, serve web pages, and chat, but by making direct connections with your peers rather than going through a centralised hosting service. Opera Unite provides the software to perform the hosting from your own devices, and provides the discovery and network trust systems to allow people to hook up.

There are lots of issues with this approach of course – such as whether you trust the hosting software not to punch holes in your local security, whether you really want to have the bandwidth issues of self-hosting, what happens when your machine is off, etc. Right now, I don’t think it’s that workable as a replacement for centralised systems, but that’s not the point – the point is that the principle of entrusting all your unencrypted data to a single online entity is eventually not going to be good enough anymore, and we need to be developing alternative approaches. If the future is truly in the cloud, we need far more than what the cloud offers right now – which is to say services that while user-friendly, require you to give up far more control over your data than is feasible for anything remotely important. Sure, you’re happy to put photos on Facebook, and Twitter about all those things that you don’t mind the world knowing, but that’s a very specific, non-critical subset of the data we all increasingly need to hold. Would you be happy to scan your bank statements and put them on Facebook, even if you set them to private? Of course not – but if the cloud is to realise its potential, these are the kinds of harder applications we need to try to address.

I’m not saying Opera Unite addresses that – not even close. But the fact that people are exploring alternative approaches to the 100% centralised model is a positive sign to me. We need to start tackling how we use entirely public transport & repository systems (i.e. the cloud) to securely store and exchange important and sensitive data, and I say that’s impossible to address with an entirely centralised model, because a centralised model focusses control in too few hands. Encryption gives us the ability to store and transport secure information in plain sight, but it’s traditionally a very tricky thing to make easy to use for the general public, particularly when multiple parties and ‘controlled’ sharing is required. Thus, one approach is to focus on securing the transport instead (which is easier, and why SSL is ubiquitous) and lock down access to the leaves more tightly. Opera Unite is an experiment in the leaf model and may well inform the process, leading to more innovation in this area down the road.

hardware Tech

Server hardware scouting

Ok, so I’ve been doing a bit of looking around for my new server builds. As I’ve thought about this, I’ve firmed up my requirements to the following:

  • Low-power, low-noise
  • 2 x 3.5″ SATA2 hard drive bays (hot plug not required, I’m just going to use Linux’s built-in RAID1 again)
  • All standard, replaceable components – no custom PSUs especially
  • Small form factor (as much as possible given the other requirements)
  • Cost-effective
  • Performance almost irrelevant

The things I have decided on:

  • CPU / Motherboard: Intel Atom 330 on D945GCLF2 motherboard. The Atom’s power usage is great (8W idle), the 945G chipset is not so great (25W!) but as a combo they’re still pretty damn good, and not expensive. VIA do the only other alternatives but I’ve had some issues with VIA in the past.
  • HDD: 2 x Western Digital Caviar Greens, because they’re low-power and run cool
  • 1GB miscellaneous RAM :)
  • No optical or floppy drives, I don’t need them (boot from USB flash drive, OS and other software will be directly downloaded)

The main problem I have now is finding a case. It has to be relatively small, and preferably stackable so I can put two of them on top of each other. Most of the Mini-ITX cases have 2 problems: they either don’t take 2 HDDs, or they use a custom PSU (or both) – I’ve been burned with having a custom PSU on a Mini-ITX machine that failed before, with no replacement available, and have no intention of going down that route again; everything has to be stock, so I can whip it out and replace it easily even if the case model has been discontinued.

Some cases come with an external ‘power brick’ PSU which in itself is pretty standardised (60/80/120W usually), but I remain concerned about the DC circuitry that the brick connects to in the case; if that fails it could be a pain. At least a standard PSU is replaceable in its entirety very easily. Plus, those cases that externalise the PSU tend to be too small to take 2 HDs anyway – all except the Chenbro ES34069, but it’s stupidly expensive. If anyone knows of any others, and has experience of the resilience of external DC power systems, please let me know.

So in the absence of a better option, I’m leaning towards a standard ‘cube’ case like the ThermalTake Lanbox – I have one of these already for a test machine and it’s good, if a little heavier and larger than I actually need (and even the lowest power standard PSU will be overkill, even if I go for an 80 plus certified one). If I was a case designer, I’d take this case, slice about 4 inches off the side and a little off the top, and I’d have precisely what I want – a stackable compact mini-ITX box which uses all standard components and can fit 2 HDDs comfortably. Is that so much to ask?