S3 encrypted upload script, v2 (Python)

Internet, Linux, Tech 9 Comments

OK, so I discovered a number of shortcomings in my recent attempt to sync a folder in one direction to Amazon S3 using encryption, the most important of which was that it wouldn’t resume a failed transfer efficiently, which in the case of large transfers wasn’t at all ideal (as I learned to my own cost – damn my 256k upload speed).

So, this is attempt number 2. I decided to completely rewrite the script in Python instead to give me some more flexibility, coupled with the availability of Boto, a nice Python library for accessing all the Amazon Web Services. Rather than rely on just local information, or even date/time stamps, I decided to use hashes to track whether files were different. Amazon already stores the MD5 of the file you upload to them and makes that available without downloading the file, but that’s no use when you encrypt your files before uploading them, because the MD5 is of the encrypted contents rather than the original. Unless you keep the encrypted copies around too, or encrypt the local files again every time just to check the match (expensive if you’re dealing with large files), you won’t be able to compare them – I think this is the reason why ‘s3cmd sync’ currently doesn’t support encryption.

So, I decided to use S3’s ability to store custom metadata in keys, and stored the MD5 hash of the original file against the encrypted contents that I uploaded. That way, I can check the hashes against each other pretty quickly without having to re-encrypt the local files. If the hashes are different, I encrypt and upload. This approach trades a bit of preprocessing against avoiding uploads, so it’s likely to be more efficient on small groups of very large files rather than lots of small files – that’s how I use S3 for my backups of course. It also means I don’t have to worry about timestamp variations; it’s the content of the file that is the driver of whether it’s uploaded or not.
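As a rough illustration of that check (not the script's actual code), here is a minimal sketch of the comparison. The names `file_md5`, `needs_upload` and the metadata name `plaintext-md5` are made up for this example, and a plain dict stands in for the custom metadata stored on the S3 key. In Boto this would likely map onto the key's `get_metadata`/`set_metadata` calls, but that wiring is left out here. It is written for a current Python rather than 2.5.

```python
import hashlib


def file_md5(path, chunk_size=1024 * 1024):
    """Return the hex MD5 of a file, reading it in chunks so that large
    files never have to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def needs_upload(local_md5, remote_metadata, meta_name="plaintext-md5"):
    """Decide whether a file has to be encrypted and uploaded.

    remote_metadata is a dict standing in for the custom metadata on the
    S3 key (hypothetical layout); None means the key does not exist yet.
    """
    if remote_metadata is None:
        return True
    # Compare the hash of the ORIGINAL file, not of the encrypted upload.
    return remote_metadata.get(meta_name) != local_md5
```

The point of the design is that only the local file gets hashed; nothing is encrypted or transferred unless the two plaintext hashes differ.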

So, here’s the new version. It’s a bit more powerful than the last one – I’m calling gpg myself now so you have the choice between encrypting using public keys (more secure, and the default), or using symmetric encryption with a passphrase. You need to install Boto before you can run it, and it depends on Python 2.5 with hashlib installed. I’ve run it on both Linux and the Mac; it should work on Windows too provided you take the trouble to set up Python and GnuPG, but I haven’t tried. My Linux (apt) and OS X (macports) setups make these things quicker, so being short of time I just went with that. Here’s the usage from --help:

Usage: s3putsecurefolder.py [options] source_folder target_bucket gpg_recipient_or_phrase

Options:
  -h, --help            show this help message and exit
  -n, --dry-run         Do not upload any files, just list actions
  -a ACCESS_KEY, --accesskey=ACCESS_KEY
                        AWS access key to use instead of relying on
                        environment variable AWS_ACCESS_KEY
  -s SECRET_KEY, --secretkey=SECRET_KEY
                        AWS secret key to use instead of relying on
                        environment variable AWS_ACCESS_KEY
  -c, --create          Create bucket if it does not already exist
  -v, --verbose         Verbose output
  -S, --symmetric       Instead of encrypting with a public key, encrypts
                        files using a symmetric cypher and the passphrase
                        given on the command-line.
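For the curious, the public-key versus symmetric choice might translate into gpg command lines roughly like this. This is a sketch under assumptions, not an excerpt from the script: `build_gpg_command` is a hypothetical helper, and it only builds the argument list so it can be inspected without running gpg. Newer GnuPG releases may additionally need `--pinentry-mode loopback` before they accept a passphrase given this way.

```python
def build_gpg_command(source_path, output_path, recipient_or_phrase,
                      symmetric=False):
    """Build the gpg argument list for encrypting one file.

    Hypothetical helper: the flags are standard gpg options, but the way
    they are combined here is an assumption, not the script's own code.
    """
    # --batch and --yes keep gpg from prompting during unattended runs.
    command = ["gpg", "--batch", "--yes", "--output", output_path]
    if symmetric:
        # Passphrase-based encryption (the -S / --symmetric option above).
        command += ["--symmetric", "--passphrase", recipient_or_phrase]
    else:
        # Public-key encryption to the named recipient (the default).
        command += ["--encrypt", "--recipient", recipient_or_phrase]
    command.append(source_path)
    return command
```

The resulting list could be handed to `subprocess.check_call`. Bear in mind that a passphrase passed as an argument is visible to other users in the process list, which is one more reason public keys are the default.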

Once again, no warranty is given, MIT license. If you see that I’ve done anything dumb, let me know :)

YouTube putting the bullet in IE6’s head?

Internet, Tech, Web 9 Comments

Oh, please let this come to pass soon. TechCrunch reports that YouTube is due to drop support for IE6 ‘soon’, pointing users at Chrome (obviously), IE8 and Firefox 3.5. Finally, one of the worst pieces of software ever to pollute the Internet with its presence is getting taken out to the barn with a double-barrelled shotgun, and not a moment too soon.

Sure, Digg already said they might do this, but YouTube is far more significant; if YouTube stop supporting IE6, then in practice it means I can too :) Bye bye IE6, please do let the door hit your ass on the way out, preferably hard enough to fracture your pelvis. It’s the least you deserve for making life hell for hundreds of thousands of web site maintainers over the last few years.

tengrandisburiedhere.com

Comedy, Internet, Web 19 Comments

Oh, this is so ripe for satire I really can’t believe Microsoft didn’t see it coming. Or, perhaps they did and just ran with it anyway, for funsies. It appears Microsoft’s Australian website is encouraging people to switch to IE8 by offering an online treasure hunt, where a series of clues will lead you to a site identifying the location of the $10k (AU$ presumably), which can only be viewed with IE8. They gleefully point out:

“But you’ll never find it with old Firefox. So get rid of it, or get lost.”

So, let’s stack up the issues here:

  • Microsoft has resorted to offering a monetary incentive to encourage people to use its free browser. Is that an admission that based on just the merits of the product itself, IE8 probably wouldn’t be the user’s first choice? I’d guess that people who actually choose their browser (rather than accepting what they get preinstalled) are not that likely to pick IE8.
  • ‘old’ Firefox? Last I checked, IE predated Firefox by some years, and the latter has a new version coming out in mere weeks. Resorting to empty name-calling now? Dear me.
  • Websites that only work in IE? Wow, welcome back to 1999 guys. ActiveX, Outlook Express bindings – ah, the memories. The horrible, eye-watering memories.

A Mozilla dev has already fired back a response, but really I don’t think it needed one. I think the fact that this promotion exists at all, and the tone which it takes, speaks volumes about how much the browser landscape has changed in recent years.

Opera Unite – another step in the right direction?

Internet, Tech, Web 3 Comments

I’ve harped on many times about how I think centrally controlled services like Facebook are the antithesis of what the Internet was supposed to be about – a distributed, decentralised place with authority controlled at the leaves by those with most interest in maintaining it, rather than some corporate hub holding all the cards.

Well, it seems like a small bunch of companies are starting to latch on to this idea too, a welcome respite from the huge number of ventures that just want to be the new singular nexus of your internet life. Google Wave certainly ‘gets it’, if the reality reflects the stated vision where the open-source software can be run anywhere, not just on Google’s servers. And Opera Unite is making the right kind of noises for me too, even if right now the service is embryonic.

In essence, it’s a semantically richer, more secure version of BitTorrent – the ability to share files, photos and media within interfaces dedicated to that purpose, serve web pages, and chat, but by making direct connections with your peers rather than going through a centralised hosting service. Opera Unite provides the software to perform the hosting from your own devices, and provides the discovery and network trust systems to allow people to hook up.

There are lots of issues with this approach of course – such as whether you trust the hosting software not to punch holes in your local security, whether you really want to have the bandwidth issues of self-hosting, what happens when your machine is off, etc. Right now, I don’t think it’s that workable as a replacement for centralised systems, but that’s not the point – the point is that the principle of entrusting all your unencrypted data to a single online entity is eventually not going to be good enough anymore, and we need to be developing alternative approaches. If the future is truly in the cloud, we need far more than what the cloud offers right now – which is to say services that while user-friendly, require you to give up far more control over your data than is feasible for anything remotely important. Sure, you’re happy to put photos on Facebook, and Twitter about all those things that you don’t mind the world knowing, but that’s a very specific, non-critical subset of the data we all increasingly need to hold. Would you be happy to scan your bank statements and put them on Facebook, even if you set them to private? Of course not – but if the cloud is to realise its potential, these are the kinds of harder applications we need to try to address.

I’m not saying Opera Unite addresses that – not even close. But the fact that people are exploring alternative approaches to the 100% centralised model is a positive sign to me. We need to start tackling how we use entirely public transport & repository systems (i.e. the cloud) to securely store and exchange important and sensitive data, and I say that’s impossible to address with an entirely centralised model, because a centralised model focusses control in too few hands. Encryption gives us the ability to store and transport secure information in plain sight, but it’s traditionally a very tricky thing to make easy to use for the general public, particularly when multiple parties and ‘controlled’ sharing is required. Thus, one approach is to focus on securing the transport instead (which is easier, and why SSL is ubiquitous) and lock down access to the leaves more tightly. Opera Unite is an experiment in the leaf model and may well inform the process, leading to more innovation in this area down the road.

Thank you, System Restore

Internet, Windows 2 Comments

I bitch about Windows on occasion, but I have to give it credit for System Restore, which saved my ass today.

Some Cisco VPN software which I was trying to install to help a client completely f*cked all my network access on my primary workstation, effectively rendering it useless. Not only that, but it refused to uninstall (hang), or disable (hang) in any way, even from safe mode, and appeared to install no useful tools or documentation with which to diagnose said problems, while disabling all other useful diagnostics (ipconfig returned nothing, device manager claimed both Cisco and regular network devices were fine, all other configuration tools just hung). After much swearing & experimentation I remembered System Restore, which I’d never had cause to use before, since I’d never got myself into such a dead end, but which worked like a charm.

I’ve used Windows’ own VPN connections before and they’ve been fine. Cisco though, what a PITA – I’ll definitely think twice about trying to use that again. That combined with my server’s PSU mucking about today has not made this one of the most stress-free days. I guess it’s karma for having a good day yesterday.

Google Wave – email finally RIP?

Internet, Open Source, Tech, Web 12 Comments

Many people have declared email to be dead in the past, and they’ve all been wrong. The typical play has been from instant messenger advocates, and most recently from Facebook. But, while these options have been a valid all-encompassing solution for teenagers and students, I haven’t met a single serious modern IT user whose life isn’t still driven primarily by email. There’s a reason that Outlook and Exchange are such consistent cash cows for Microsoft, and so many business people own BlackBerrys. IM, Facebook & Twitter may represent certain facets of your online existence, but if push really came to shove, and you were only allowed to use one electronic service, I bet you every gadget I own that almost everyone will opt to keep their email over anything else.

I certainly could not operate without my email, but after watching the demo of Google Wave, I saw for the first time something that could genuinely be better, without leaving me with a gaping functionality hole in the name of ‘progress’. In retrospect, the idea seems incredibly obvious, but I’m sure the implementation was tricky.

In essence, Google Wave is basically a fusion of email and IM functionality. You still compose emails, reply to them, and include people in the threads, but the whole functionality set can also operate basically in real-time, just like an IM client. Whether something is instantly transmitted interactively depends on whether the person you’re sending it to is online, and some preferences of your own. There were some nice geeky demo things like instant translation via a bot too, but the most important thing to me was how holistic it was. One of the major problems I have is connected with using a mixture of IM and email communication, particularly with clients but also friends. I might remember that I had a conversation with X about Y, and want to go refresh my memory about it, but I don’t remember whether we talked about it on email, or on IM (and whether it was Skype, GTalk, MSN, and which machine I was on at the time). Looking up this information is a pain, because my email is one island of information, and my IM conversations are many separate islands. Being able to search across the whole thing in one swoop, from any PC/device, and see all the conversations, both deferred email-like and instant, all in their original threaded contexts, would be absolutely fantastic. It would give me value in my working life, instantly, right now.

They could go even further, by supporting VOIP calls through it too, and have the option to use voice recognition to transcribe it (or record it as well), or even just to log that the conversation happened and allow me to add a few manual notes to it. I’d imagine this would be early on the list of extensions, by Google themselves or by external developers, since they’re encouraging people to use the API. And I can imagine that linking all this up with Google Maps, Google Calendar, Google Docs etc will have a multiplicative effect.

So I’m pretty excited about Wave. It’s the first collaboration tool I’ve seen that could genuinely replace my email (and IM), although I’d then have to tackle the very real question of whether I really want to give Google control of such critical parts of my electronic life.

Who cares about #fixreplies?

Internet, Web 2 Comments

So, the intertubes are awash today with people venting their spleens about Twitter’s decision to stop sending replies by people you do follow, but to people you don’t follow, to your main Twitter feed. Previously you had the option either way, and now some people are getting their panties in a bunch about it.

There are two things to say about this issue:

  1. Personally, I don’t want to see all the random replies to other people I don’t follow. I already deliberately only follow a small number of people, because frankly I don’t have time to sift through a huge list of tweets every day. I have absolutely no idea how anyone copes with following more than about 10 people who tweet regularly, and still gets something done in the day, never mind seeing all the secondary replies. Am I just inefficient at processing large numbers of posts, or do I just have a staggeringly lower level of patience than the average Twitter user? The way I have things right now, I read every one of the posts from people I follow, because I consider them interesting, and that takes little time. I couldn’t do that if I was following 100 people and their replies to other contacts too, so I’d either have to lie (i.e. stick to etiquette and follow them, but then filter out most of what they say on the client), or just spend all day reading Twitter. So personally, this seems a sensible choice – you can always use the Twitter web if you really have nothing better to do but surf Twitter, or browse your friends’ ‘following’ list if you’re desperate to mine the system for new contacts.
  2. Twitter is free. If you paid nothing for a service, you are entitled to offer your constructive feedback which the providers may choose to listen to, but you are not entitled to have a major tantrum about it. As Matt Asay suggests, if you care about the service that much, then you should probably be paying for it – and God knows, Twitter needs a business model other than the typical Web 2.0 “Attract viewers …….  profit!” fantasy right now. On the whole, the Internet needs a slap to wake up its users from the bloated sense of entitlement they’ve developed over the years, fueled by a huge number of startups that delude people into thinking they can expect everything for nothing. 100% free models don’t work (yes, I know, I’m an open source advocate, but that doesn’t mean I believe that you can give everything away) – they are a complementary aspect, or a stop-gap until you can develop a real model or persuade some sucker to acquire you before the hype train grinds to a halt. Eventually, these cycles of pretending that you can get premium service for free will end, and everyone will have to face up to the reality that ‘freeloaders’ have a place (building momentum, awareness etc), but ultimately they’re at the bottom of the food chain. Plankton are vital to the oceanic ecosystem, but no-one asks them for their opinion. ;)

GeoCities’ demise should be a warning

Business, Internet 6 Comments

A few days ago, Yahoo! announced that they would be shuttering the venerable GeoCities this year. “So what?” you might well ask – GeoCities is after all an ageing service from a bygone era, and apart from some nostalgia and perhaps some data that some people might have had parked there for a while, most people won’t really notice its passing.

But nevertheless, it’s important, and people who get carried away with putting a dollar value on the current favourite websites of the day (e.g. Facebook) should take careful note. GeoCities was huge, really a sensation at the time, before many of the people raving over Facebook now were online, or perhaps even born (scary). It was easily as big culturally as Facebook at the time, which is why Yahoo bought it for almost $3 billion. I bet they regret that, because what happened was exactly what will continue to happen in this sort of space – things changed. New technology comes along, new techniques, new fashions, and the old sites are abandoned like burning ships incredibly quickly, until as happened this week (perhaps a little overdue in fact), the charred, lonely hulk sinks beneath the waves.

The issue is that these sites are not really ‘sticky’ on an individual basis. There’s really very little investment needed to use them, so getting up and moving somewhere else really isn’t much of an issue. Sure, with social networks the main ‘index of stickiness’ is your friends list, so people tend to stick where their friends are, but really, I don’t see this being a major barrier in practice, because by nature most of the stuff on there is non-critical and for fun, and these things often follow generational ‘clusters’ – the students in the 90s were all on GeoCities, now they’re all on Facebook. Where will the next set be? I wouldn’t for a second assume they’d stay in the same place; they’ll want sites for their own generation, not the last one.

The eventual destiny of GeoCities should be a significant warning to anyone thinking of paying top-dollar for web companies that have no business model beyond casual eyeballs, and rely on fashion to drive that attention. Fashion changes, and you really don’t want to be stuck with $3bn worth of brown corduroy flares (assuming that they’re not fashionable right now – I can’t keep up! ;) )

Spotify – finally a Pandora replacement?

Internet, Music 9 Comments

Like many of my friends in the UK, I’m a Pandora-mourner. The great thing about Pandora was the great range of music, the unobtrusiveness of the client, and the robustness of the stream – all areas where Last.fm significantly under-delivers in comparison. Not only is Last.fm’s interface not as pleasant, any time I’d stress my machine (such as hitting all the cores at once with a major batch build), I’d get streaming hiccups. And if there’s one thing that chronically interrupts a sustained groove, it’s hiccups. :?

Yesterday a friend of mine (thanks Jim) pinged me to recommend a relatively new service, Spotify. It seems to have been expanding significantly in the last few months, and knowing my friend had similarly high standards set by Pandora too, I decided to give it a go.

Wow. It’s great! Unfortunately like Last.fm it requires a download & install rather than in-browser play like Pandora, but it’s worth it. The interface is obviously influenced heavily by iTunes, which is no bad thing, and it’s very slick. The most important thing to understand is that unlike either Last.fm or Pandora, Spotify actually lets you pick the exact music you listen to. It has a ‘Radio’ mode too where it picks tracks for you, but this is only based on genres and time periods, rather than the music characteristics which made Pandora so great at introducing you to new stuff you’d like. But its big advantage is that you can use it just like iTunes and search specifically for tracks / albums / artists – and listen to exactly what you wanted right there, rather than being given ‘similar’ tracks. It’s like a streamed iTunes effectively, with a massive library – I’ve unscientifically tested it out and so far the range seems very good.

The sound quality is excellent, and most importantly it is completely unfazed by heavy CPU usage by other applications. As a test I did a full (fully parallel) build of Ogre in the background, while jumping around doing a few other things too, pretty much maxing out all my cores for a sustained period of time, and not once did the music skip. Yes!

Of course, there has to be a catch, but it’s a small one. In return for being able to access any music you like, you have to listen to occasional adverts. These are basically just like radio ads, except that if you happen to go to the client while they’re playing, you get web links to the products – which I’m sure is a pro for the advertiser over regular radio advertising. Alternatively, you can upgrade to a ‘Premium’ Spotify account (a tenner a month, or 99p for a day pass) to remove the ads entirely. To be honest, I found the ads to be less intrusive than those on regular commercial radio; considering that you get to pick your music and don’t get some DJ talking inanely in between tracks, it’s a net gain overall.

I’m going to keep trialing it to see where it goes, but so far colour me impressed.

Good value commercial DNS providers?

Internet, Tech 4 Comments

DNS hosting is one of those awkward things – it’s absolutely essential to anyone who controls those little textual brands we call ‘domains’, but it’s an invisible service which you don’t appreciate much on a day-to-day basis. The chances are not good that any user of the internet, after a session of heavy web browsing, will say “Wow, DNS was awesome today”.

I’ve used a few approaches to DNS over the years – in the early days when I was naïve, I used the built-in DNS of my web host, which I learned the hard way was a serious mistake, since switching away from a crappy web host is made more difficult when they hold the reins for routing your domain too. So, I initially moved the DNS to my registrar instead, until I suffered an outage (admittedly only one in many years) caused by them hosting all their DNS in one datacentre (i.e. a single point of failure).

For the past few years I’ve used ZoneEdit to provide distributed, and therefore theoretically more resilient DNS for my domains. That still ended up having problems when the free servers they provide experienced a DDoS attack a couple of years back, taking all the free servers I was using out at once. I compensated for that by purchasing extra servers from them, and setting up a server of my own (I’ve since stopped doing the latter since it’s too much hassle) – so my costs on this front are a modest $20 a year.

Generally, this has worked fine. However in recent weeks I’ve had two problems with ZoneEdit – firstly their interface has been slowing to a crawl far too often which is intolerable when you’re trying to make important changes. Secondly, they have this odd setup where their servers will refuse to answer DNS questions for your domains until they detect that the Whois information for your domain has been updated to point at them. I guess they do this to stop people spamming them with requests for alternate domains, but it has two major drawbacks:

  1. You can’t test your DNS setup until your site is already dependent on it, and
  2. There’s a time lag between the Whois information starting to propagate, and ZoneEdit picking up that it’s changed and activating the servers

This means that mistakes are hard to detect until it’s too late, and even if your setup is perfect, there is a window of time where your domain falls between two stools – Whois is updated and directing DNS requests to ZoneEdit, but ZoneEdit hasn’t realised yet and isn’t answering. The only way you can do it reliably is to change one server at a time – i.e. mix and match between ZoneEdit and your other DNS providers as the change propagates. This makes a switchover more hassle than it really should be though.
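One way to take some of the guesswork out of a one-server-at-a-time switchover is to ask each nameserver directly whether it is answering for the domain yet. The following is only a sketch using the Python standard library: the function names are invented for this example, it sends a single hand-built A-record query over UDP, and it assumes the domain has an A record at the name you ask for. A tool like dig would tell you the same thing.

```python
import socket
import struct


def build_query(domain, query_id=0x1234, record_type=1):
    """Build a minimal DNS query packet (record_type 1 = A record)."""
    # Header: id, flags (recursion desired), 1 question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Question name: each label is prefixed by its length, ending with 0.
    name = b"".join(
        struct.pack("!B", len(label)) + label.encode("ascii")
        for label in domain.rstrip(".").split(".")
    ) + b"\x00"
    return header + name + struct.pack("!HH", record_type, 1)  # class IN


def parse_header(response):
    """Return (rcode, answer_count) from a raw DNS response."""
    _id, flags, _qd, answer_count, _ns, _ar = struct.unpack(
        "!HHHHHH", response[:12])
    return flags & 0x000F, answer_count


def server_answers(nameserver_ip, domain, timeout=3.0):
    """True if this nameserver returns at least one answer for the domain."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(timeout)
    try:
        sock.sendto(build_query(domain), (nameserver_ip, 53))
        response, _addr = sock.recvfrom(512)
    except OSError:  # covers timeouts and unreachable servers
        return False
    finally:
        sock.close()
    if len(response) < 12:
        return False
    rcode, answer_count = parse_header(response)
    # rcode 0 means "no error"; a refusing server reports a non-zero code.
    return rcode == 0 and answer_count > 0
```

Calling `server_answers` with the IP of each new nameserver before adding it at the registrar would show whether that server has started responding, which is exactly the window described above.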

So, I’m looking around for another DNS host. I’m willing to pay, in fact for an efficient, SLA-backed service that doesn’t waste my time, I’d be glad to. Free services and peer-organised systems are fine, but really I just want something that works well and doesn’t need hand-holding.

Any suggestions? I’ve heard some good things about DNS Made Easy – they’re more expensive than ZoneEdit, but cheaper than most of the larger commercial offerings and seem to get reasonable reviews. However my enquiry email to their sales department (regarding switchover protocols & timings) has so far gone unanswered, which made me wonder. Anyone got direct experience with them or another similarly priced set-up? I’m looking for something with globally distributed servers, fast updates (preferably full TTL control), and uptime SLA, to manage up to around 10 domains, costing up to $100 a year.