Dependencies to build Git from source on [K]Ubuntu 9.04

Development, Linux 2 Comments

Git is picky when it comes to converting large, moderately complex Subversion repositories and so far the only option I’ve found that works reliably is using the very latest version on Linux. Forget about using 1.6.5 on Windows via msysGit, at least for the git-svn conversion it’s very, very unreliable. Similarly I found Git 1.5 on Linux very flaky for the svn conversion. This doesn’t give me the greatest confidence in Git but in order to properly explore all the angles, I’ve committed to making it work even if it means I have to monkey about a bit.

So, I installed a fresh Kubuntu 9.04 (and of course, 9.10 went stable a few days later) and tried to build Git 1.6.5 from source. The configure script is unfortunately a bit rubbish and doesn’t bother trying to detect the dependencies though, so for those that don’t want to go through the fail/retry build loop I went through, here are the packages you’ll want to install via apt from a clean version:

  • build-essential
  • curl
  • libcurl3
  • tk
  • subversion
  • libsvn-perl
  • cpio
  • zlib
  • zlib1g-dev
  • expat
  • perl
  • iconv

If like me you use the excellent wajig wrapper around apt, you can also do ‘wajig build-depend git-svn’, but that seems to install things that are not strictly necessary in the default build (but maybe are needed with some non-standard options).

sudo apt-get install curl
sudo apt-get install libcurl
sudo apt-get install tk8.4
sudo apt-get install cpio expat
sudo apt-get install zlib
sudo apt-get install build-essential
sudo apt-get install zlib1g-dev
sudo apt-get install asciidoc
sudo apt-get install xmlto

S3 encrypted upload script, v2 (Python)

Internet, Linux, Tech 9 Comments

Ok, so I discovered a number of shortcomings in my recent attempt to sync a folder in one direction to Amazon S3 using encryption, the most important of which was that it wouldn’t resume a failed transfer efficiently, which in the case of large transfers wasn’t at all ideal (as I learned to my own cost – damn my 256k upload speed).

So, this is attempt number 2. I decided to completely rewrite the script in Python instead to give me some more flexibility, coupled with the availability of Boto, a nice Python library for accessing all the Amazon Web Services. Rather than rely on just local information, or even date/time stamps, I decided to use hashes to track whether files were different. Amazon already stores the MD5 of the file you upload to them and makes that available without downloading the file, but that’s no use when you encrypt your files before uploading them, because the MD5 is of the encrypted contents rather than the original; so unless you keep the encrypted copies around too, or encrypt the local files again every time just to check the match (expensive if you’re dealing with large files) you won’t be able to compare them – I think this is the reason why ‘s3cmd sync’ currently doesn’t support encryption.

So, I decided to use S3’s ability to store custom metadata in keys, and stored the MD5 hash of the original file against the encrypted contents that I uploaded. That way, I can check the hashes against each other pretty quickly without having to re-encrypt the local files. If the hashes are different, I encrypt and upload. This approach trades a bit of preprocessing against avoiding uploads, so it’s likely to be more efficient on small groups of very large files rather than lots of small files – that’s how I use S3 for my backups of course. It also means I don’t have to worry about timestamp variations, it’s the content of the file that is the driver of whether it’s uploaded or not.
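To make the comparison concrete, here is a minimal sketch of the hash check described above. It is not the script itself: the metadata name `original-md5` and both function names are made up for illustration, and a plain dict stands in for whatever metadata the S3 library hands back for a key.

```python
import hashlib

# Hypothetical metadata name; the real script's choice is not shown in this post.
META_KEY = "original-md5"


def file_md5(path, chunk_size=1024 * 1024):
    """Return the hex MD5 of a file, read in chunks so large files are not loaded whole."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def needs_upload(local_path, remote_metadata):
    """Decide whether the local file should be encrypted and uploaded.

    remote_metadata is a dict standing in for the custom metadata stored
    on the S3 key, or None when the key does not exist yet.
    """
    if remote_metadata is None:
        return True
    # The stored hash is of the ORIGINAL content, so no re-encryption is needed.
    return remote_metadata.get(META_KEY) != file_md5(local_path)
```

The only per-file cost when nothing has changed is one local read to compute the hash, which is the preprocessing trade-off mentioned above.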

So, here’s the new version. It’s a bit more powerful than the last one – I’m calling gpg myself now so you have the choice between encrypting using public keys (more secure, and the default), or using symmetric encryption with a passphrase. You need to install Boto before you can run it, and it depends on Python 2.5 with hashlib installed. I’ve run it on both Linux and the Mac, it should work on Windows too provided you take the trouble to set up Python and GnuPG, but I haven’t tried; my Linux (apt) and OS X (macports) setups make these things quicker so being short of time I just went with that. Here’s the usage from --help:

Usage: s3putsecurefolder.py [options] source_folder target_bucket gpg_recipient_or_phrase

Options:
  -h, --help            show this help message and exit
  -n, --dry-run         Do not upload any files, just list actions
  -a ACCESS_KEY, --accesskey=ACCESS_KEY
                        AWS access key to use instead of relying on
                        environment variable AWS_ACCESS_KEY
  -s SECRET_KEY, --secretkey=SECRET_KEY
                        AWS secret key to use instead of relying on
                        environment variable AWS_ACCESS_KEY
  -c, --create          Create bucket if it does not already exist
  -v, --verbose         Verbose output
  -S, --symmetric       Instead of encrypting with a public key, encrypts
                        files using a symmetric cypher and the passphrase
                        given on the command-line.
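For anyone curious what “calling gpg myself” might look like, here is a rough sketch of switching between the public-key and symmetric modes. The function names are hypothetical and this is not lifted from the script; it only assumes the standard gpg options `--encrypt`, `--recipient`, `--symmetric`, `--passphrase`, `--batch`, `--yes` and `--output`.

```python
import subprocess


def build_gpg_command(source, dest, recipient_or_phrase, symmetric=False):
    """Build the gpg argument list for encrypting one file (hypothetical helper)."""
    command = ["gpg", "--batch", "--yes", "--output", dest]
    if symmetric:
        # Same passphrase encrypts and decrypts. Note that a passphrase on
        # the command line can be seen by other local users in the process list.
        command += ["--symmetric", "--passphrase", recipient_or_phrase]
    else:
        # Public-key mode: only the holder of the private key can decrypt.
        command += ["--encrypt", "--recipient", recipient_or_phrase]
    command.append(source)
    return command


def encrypt_file(source, dest, recipient_or_phrase, symmetric=False):
    """Run gpg and raise CalledProcessError if it exits with a non-zero status."""
    subprocess.check_call(
        build_gpg_command(source, dest, recipient_or_phrase, symmetric)
    )
```

Newer GnuPG releases may also need `--pinentry-mode loopback` before they will accept `--passphrase` in batch mode, so treat the symmetric branch as a starting point rather than something guaranteed to work everywhere.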

Once again, no warranty is given, MIT license. If you see that I’ve done anything dumb, let me know :)

s3putsecurefolder

Linux, Open Source, Tech, Web 4 Comments

Edit: this script is deprecated in favour of a rewritten version 2.

I use Amazon S3 to host large media files which I want cheap scalable bandwidth on, and for expandable offsite storage of important backups. I used to have some simple incremental tar scripts to do my offsite backups, but since I moved to Bacula, I’ve just established an alternative schedule and file set definition for my offsite backups, the critical subset of data I couldn’t possibly stand to lose (like company documents). Since I was refreshing all my procedures and tarring the Bacula volumes no longer made any sense, I rewrote my script for putting the resulting backup data on S3.

The prerequisite in all cases is s3cmd, which is pretty mature now and available on most distros (“apt-get install s3cmd” and you’re done on Ubuntu). s3cmd actually has a ‘sync’ command, but firstly that tries to sync in both directions, which I don’t want (I know in theory it should never overwrite any local version so long as I don’t update the remote copies from somewhere else, but I’m paranoid when it comes to my backups and prefer to be explicit), and secondly it obviously has to connect to S3 to determine the sync status, whereas I always know whether I need to upload new files just from my local environment (and S3 charges per request – not much, but it’s not zero and it’s the principle of the thing). So, I decided not to use the ‘sync’ command, and just determine locally what new files I needed to ‘put’ on the server.

Secondly, encryption is a must, since some of the data is sensitive and I don’t want to trust anyone else with it. I used to manually GPG my tarballs before uploading them, but I noticed that s3cmd supports an encryption option too. It just uses GPG anyway, just in symmetric form rather than asymmetric like my version did (translation: you use the same passphrase for encryption and decryption; a little less secure than using generated public/private keys but still ok so long as you pick a good passphrase and look after it). The default symmetric algorithm in gpg is CAST5 which seems pretty good, although you can change it if you want by editing your s3cmd config file. So, I decided to give it a try – after you configure s3cmd to use encryption, it actually automatically decrypts too when you pull the data back (symmetric key, remember) – being distrustful, I pulled the data back from S3 in a different environment and examined it, and it was indeed complete gibberish, but decipherable with the passphrase. Good stuff.

So, here’s my little script which will upload the encrypted contents of a folder to S3 – just the contents that have been added or updated since the last sync of that folder, and will encrypt them by default. I just run this on a cron schedule now and it seems to work fine. License is MIT, use at your own risk, no warranty is given that it won’t destroy every file on your machine or eat your children. Usage is like this:

s3putsecurefolder /my/source/folder my.s3.bucket
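The “what changed since the last sync” decision can be made entirely locally. Here is a small Python sketch of that idea, not the script itself. How the script remembers the last sync isn’t shown here, so the sketch simply assumes you have a timestamp (seconds since the epoch) from the previous run; `changed_since` is a made-up name.

```python
import os


def changed_since(source_folder, last_sync_time):
    """Return sorted relative paths of files modified after last_sync_time.

    last_sync_time is an assumed epoch timestamp recorded by the previous run.
    """
    changed = []
    for root, _dirs, files in os.walk(source_folder):
        for name in files:
            full_path = os.path.join(root, name)
            # No request to S3 is needed: modification times are local.
            if os.path.getmtime(full_path) > last_sync_time:
                changed.append(os.path.relpath(full_path, source_folder))
    return sorted(changed)
```

Each path returned would then be handed to the upload step, and the timestamp updated once everything has gone up successfully.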

Edit: it was brought to my attention that Amazon have made it easier to create pseudo-folder structures in S3 buckets since I last tried to do it (I swear it used to throw out keys with forward slashes in them, I had to mangle my names last time I did this), so I’ve updated the script to allow nested folders too.
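Nested folders boil down to keeping the forward slashes in the key name. A tiny sketch of the mapping, with `s3_key_for` being a hypothetical helper rather than anything from the script:

```python
import os


def s3_key_for(source_folder, file_path):
    """Map a local file to a key name, using '/' as the pseudo-folder separator."""
    relative = os.path.relpath(file_path, source_folder)
    # Normalise the platform separator so keys look the same from any OS.
    return relative.replace(os.sep, "/")
```

So a file at `docs/2009/report.txt` under the source folder would be stored under the key `docs/2009/report.txt`.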

MS breaks the sixth seal?

Open Source, Windows 16 Comments

Quick check – ok, the sun is in fact not as black as sackcloth. But today, something earth-shattering happened – Microsoft has contributed code to Linux.

I’m sure I’m not alone in thinking that I’d never live to see the day this happened. It’s 20,000 lines of driver code to make Linux run better under Hyper-V, which is of course in their interest (since you have to buy a copy of Windows Server 2008 as the host), but that’s par for the course for open source contribution (you scratch your own itch!), and it’s a massive watershed regardless. From what I hear there’s still a lot of concern at Microsoft about how to manage contributions across the company boundary (in both directions), so I’m not sure what extra procedures they would have put in place for the developers involved in this process to keep the corporate legal army satisfied – perhaps pre- and post-project selective mind-wipes ;) – but the fact that they managed to make it happen is a big deal.

Microsoft has wielded by far the most acrid rhetoric about open source in the past – we all hear that it’s changing, and I know particularly of specific people at Microsoft (mostly developers) who take a much more open view, but it’s hard to escape the feeling that while the top brass who set the ‘old’ policies remain in situ, substantive change will be difficult. But this move is one of many lately that make me think that just maybe, people higher up the chain are starting to get it. Or at least, they’re starting to defer to people who know better.

I’d argue that very few people in the open source community are inherently anti-Microsoft, they’re just a little more free-thinking when it comes to technology choices, a little more honest with their opinions, and have come to view MS as ‘the enemy’ primarily because of the old rhetoric the company used to use on a regular basis to attack them (and some parts of the company still don’t seem to be getting the ‘openness’ memo – as TomTom found to their detriment). Microsoft, or rather, Mr Gates and Mr Ballmer specifically, effectively made themselves the enemy of the open source community with their often ill-conceived tirades, and that’s something that will take a long time to heal. But, as we all know, actions speak louder than words – and if the company continues to make these kinds of conciliatory moves, they will start to win people in the open source community back, at least those people that judge on facts rather than old prejudices.

Trust takes a long time to be earned, particularly from where MS started from, so it’ll be a long road – but if this is how things are going to develop in future, then bon voyage, MS.

UNetbootin is awesome

Linux, Tech 9 Comments

I just assembled a new server machine, which in the end I chose to house in a shiny aluminium Thermaltake Lanbox, which is relatively compact but still roomy enough for two hard drives, a bog-standard power supply, and plenty of airflow, which is what I wanted. I also knew that the fans on this case were nice and quiet (I have a black steel version as a GPU test box, I wanted a lighter version this time!), which is important for a machine that will be on all the time.

As I said in my previous posts, I was determined not to put an optical drive in this machine. It really doesn’t need one, since all the system software will be downloaded anyway – the only possible use for an optical drive would be to boot the machine in the first instance, and that seemed a total waste. I know DVD drives are cheap, but why clutter up the box with one just for the rare occasion when I need to manually boot? The same goes for floppy drives, which are such dinosaurs I can’t believe some new machines still come with one present – the only possible use for a floppy drive these days is to provide a slot that you don’t mind a toddler feeding jam sandwiches into.

No, instead I wanted to boot from a USB flash drive. I’d never done this before, so I scouted around for the best ways to do it. Syslinux came up pretty quickly as the primary contender, but being lazy I hunted around a bit longer to see if anyone had a simpler way than configuring Syslinux manually. That’s when I came across UNetbootin.

What a fantastic little project! It took literally 5 minutes from downloading, to creating a bootable USB disk with the distribution of my choice on it (UNetbootin will download your chosen distribution automatically – or you can supply an ISO of your choice if you want), to booting up the new machine. I couldn’t believe just how simple it was! I chose to put Ubuntu 8.04 Netinstall on the disk, which clocks in at a tiny 9MB because it’s just enough to boot up the installer to start downloading the real packages direct, but if you want, and you have a big enough USB stick, you can put a complete distro on there too. But this way, I can use a crappy old USB stick I have lying around as my boot device.

A great little tool anyway. I love it when things are easier than you expect.

Moblin looks really interesting

Linux, Open Source, Tech 4 Comments

Not being the kind of person who would buy a netbook, I hadn’t really paid much attention to Moblin, Intel & Novell’s new netbook-targeted, Linux-based operating system. However, Matt Asay posted about it today and that got me looking at it, and I have to say I’m very impressed.

I love that they’ve tried to rethink the operating system interface from the ground up rather than just follow in the footsteps of previous efforts. One of the reasons desktop Linux distros have never made much of an impression on me is that I often just feel like I’m using a slightly more technically-oriented facsimile of Windows, which is ok but tends to be more demanding of my time; in comparison the Mac gives me a usability boost and saves me time over using Windows, as well as having a solid back-end, and that’s why I like it. People need a reason to use alternatives to Windows, and just reproducing the user experience isn’t enough. I had assumed that Moblin would be another ‘typical’ desktop Linux in the usual vein, but they’ve done something much more interesting; a completely new interface designed around common usage patterns.

I’m not sure how it would be to use (I’d need to actually play with one, and I don’t intend to buy a netbook any time soon), but it certainly looks good, and I completely applaud their initiative. It’s about time more people experimented with usability like this rather than unquestioningly sticking to boring old operating system interface ‘standards’.

I *heart* plain text configuration files

Linux, Tech, Windows 5 Comments

A small bit of musing while I wait for another back-up to run…

Reinstalling a server from scratch sucks. Obviously. Not being able to use direct dumps of the old system itself because of concerns of how far a malicious attack got, and how long ago (even though we’re running SELinux) means that everything has to be constructed afresh. How much fun I’m having.

But if there’s one silver lining here, it’s that at least Linux stores every shred of its configuration in a simple, plain text format, and in one dedicated subtree of the file-system. Even though the server itself had to be taken down, the old disk was mounted so that I could look at previous configuration files easily, and carry across relevant ones (checked manually, natch) directly. It still takes a lot of time, and the fact that I jumped OS versions at the same time has complicated it (but, if there’s any time to do that, it’s now), but it would have been much worse if I couldn’t reference the old system.

One thing I found annoying at times about doing admin on Windows servers (in a past life) was that they generally hid their settings away from you – the common assumption was that you use a GUI to edit everything, and accessing settings without that GUI was frequently difficult, if it was even possible at all. Although a GUI is friendly for many uses, it also does a pretty good job of hiding things from you. Sometimes that’s useful (drawing attention to the most important things), sometimes it’s very unhelpful (trying to find which tab / dialog a particular option is in). One thing it definitely fails at is making it easy to extract / summarise all the necessary information to audit it, or to recreate an entire setup. Even when you have the server still running it’s a pain, but if that server has been taken offline it’s extremely hard to extract information from it without booting the thing up again (which if it’s damaged or compromised, might not be desirable or possible). Settings were often scattered among the registry, proprietary repositories, application specific places, and sometimes in custom data formats. If you’re lucky you might be able to extract the settings some way, but usually you have to have thought of it before the machine was out of action, and the process is often specific to a given application – so even assuming you remember to do it for all of the different server apps you’re running, the results are disparate, hard to organise and very often not human-readable – something you really want when you’re auditing a machine or creating a variant. The result was, in my experience, relying heavily on binary machine images for reference setups, test servers etc. That works well enough, but it’s a bit opaque and doesn’t help you much when you want variations (unless you have MxN images, or a tree of derivative images).

In comparison on Linux, I know I just have to look at plain text, readable configuration files in /etc/, which I can do on pretty much any device without actually having to have any of the old software running – I can just mount the disk with minimal permissions. By and large the text files are extremely well commented and contain pretty much every option you might need, just commented out when defaulted. I can search quickly for settings just like I can in any text file, and search across the entire configuration if necessary. Being text, it’s very easy to create a standardised configuration template that you can roll out, in a much more configurable way than a raw machine image. The visibility of all the settings certainly helps, and you can do all sorts of nice things such as generating configurations from variable options, should you want to. Text configuration might look less friendly at a surface level, but over time, I’ve found that in practice it’s actually a considerably more productive way of doing things for many admin tasks – especially the more difficult ones.
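As a small illustration of generating configurations from variable options, here is a sketch using Python’s standard `string.Template`. The setting names (`listen_port`, `server_name`) and the template itself are invented for the example and don’t belong to any real daemon.

```python
from string import Template

# Hypothetical template; the setting names are purely illustrative.
TEMPLATE = Template(
    "# Generated file, edit the template instead\n"
    "listen_port = $port\n"
    "server_name = $hostname\n"
)


def render_config(options):
    """Fill in the template; substitute() raises KeyError if an option is missing."""
    return TEMPLATE.substitute(options)


if __name__ == "__main__":
    # One template, many variants: just change the options per machine.
    print(render_config({"port": 8080, "hostname": "web1"}))
```

Because a missing option raises an error rather than silently leaving a placeholder behind, a half-filled config never gets rolled out, which is the kind of thing that’s much harder to arrange with a raw machine image.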

Tweaking an existing installation is still probably easier with a GUI though, and less intimidating to beginners or occasional admins (and I actually count myself in the latter). The best solution is to have both – a GUI for casual admins and core text config storage underneath – and of course there are plenty of options about to do that on Linux too.

I’m not sure if Windows Server 7 does anything different here – I haven’t administered a Windows server since 2003 so I may well be out of date. I just thought I’d break out some love for the often unappreciated plain text config file. Sometimes simplicity is the best choice.