hgsubversion – dropping old history during conversion (mod)
November 24th, 2009 | Development, OGRE, Open Source | 4 Comments
I’ve already posted about my experiences with Git and Mercurial, the end result of which was a vastly increased respect for Git but a basically confirmed preference for Mercurial, based on ease of use, platform consistency and resilience.
Mercurial’s conversion tools are really quite good – the core tools worked fine, but I was particularly impressed by hgsubversion’s speed and by the fact that it seemed to just work, both in the initial conversion and when pulling subsequent updates. It was missing a couple of features that I wanted though – firstly the ability to reflect merge points between branches during the conversion, and secondly the ability to ‘squash’ ancient history down to a simple snapshot to save space.
At OGRE, we’d carried forward all our history from CVS to Subversion and as such have almost 8 years of history, including a couple of file reorganisations. Mercurial’s storage efficiency falls down compared to Git when files are moved around, because a file stored in more than one place in the tree over the history of the project is physically stored multiple times too, whilst Git stores the content only once with pointers from the various locations / history points. Most of this overhead could be removed just by eliminating old history we didn’t need anymore – history that does no harm in Subversion since only the server holds it, but does cause unwanted overheads in a DVCS since every user gets the entire repository. Removal of history is something that Mercurial shuns – rightly so in the case of public repositories but in these rare cases it would be nice if there was a tool for removing old history; again Git allows this but it has to be used with care. In the absence of that, doing it at conversion seemed the best way.
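To make the storage argument concrete, here is a toy sketch of the two approaches described above. It is only an illustration under heavy simplifying assumptions: it ignores deltas and compression entirely, it is not how either tool actually lays out data on disk, and the file paths are made up for the example.

```python
import hashlib


def path_keyed_size(history):
    """Toy model: content is stored once per distinct (path, content) pair."""
    stored = set()
    for snapshot in history:
        for path, content in snapshot.items():
            stored.add((path, content))
    return sum(len(content) for _, content in stored)


def content_addressed_size(history):
    """Toy model: identical content is stored once, keyed by its hash."""
    blobs = {}
    for snapshot in history:
        for content in snapshot.values():
            blobs[hashlib.sha1(content).hexdigest()] = content
    return sum(len(content) for content in blobs.values())


# Hypothetical history: the same 1000-byte file before and after a reorganisation.
data = b"x" * 1000
history = [
    {"src/Mesh.cpp": data},           # original location
    {"OgreMain/src/Mesh.cpp": data},  # same content, new location
]

print(path_keyed_size(history))         # 2000: one copy per location
print(content_addressed_size(history))  # 1000: one copy in total
```

In this simplified model, each reorganisation adds another full copy of every moved file to the path-keyed store, which is why dropping the revisions that predate the reorganisations recovers so much space.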
I asked about these things in the hgsubversion community, but the tradition of open source is that if you really want something urgently, you know where the code is.
Mercurial is really nice when it comes to hacking because it’s all Python; so there’s a nice unified API in one place that you can refer to – that’s one of the reasons I like it over Git which is far more fragmented in technology terms. I’m not a Python guru by any means, but I managed to implement both these features – I did the “mergemap” support a little while ago and added the “skipto” option today – it’s called that because “skipto” was already referred to in the hgsubversion code but it had no implementation.
The result is that the OGRE Mercurial repository with only the last ~3 years of history (back to when the v1.4 branch was created) is now only 74MB, rather than the 206MB of the original, complete conversion – roughly 64% smaller (in comparison, Git was 116MB for the whole thing). By dropping the history I’ve removed most of the instances of reorganisation, which is where most of the space had gone. I hope that Mercurial eventually adds a utility to deal with stripping ancient history (right now, you can only strip branches), but this solves my primary conversion issue. Since this new repo can be kept in sync in a very lightweight fashion with the existing Subversion repo, I’ll be periodically updating it and doing more tests to reassure myself that the content really is ok.
If you’d like to get my custom version of hgsubversion with these features, it’s here: http://bitbucket.org/sinbad/hgsubversion/. I make no promises that it’s error-free; use it at your own risk. It currently assumes that you’re using the standard Subversion layout, are converting from the root of that layout, and have the ‘svn’ command on your path.

November 25th, 2009 at 5:37 pm
Thanks – this is something I’m looking into too. Wonder if Mercurial will ever let you clone only part of a repository – say a specific branch.
November 26th, 2009 at 11:04 am
A word of warning: we discovered a couple of weird errors in a couple of files since the conversion, and I have to figure out why. It seems like one of the deltas got applied incorrectly somehow. Since this didn’t happen when I converted the whole repository, I assume it’s related to my skipto implementation; I’m just struggling to see why right now (the messed-up delta was about 7 revisions after the skipto starting point – if anything I’d expect a problem on the first revision after the forced starting point).
August 19th, 2010 at 3:28 pm
Thanks for this! I have been trying to learn how to reduce our company repo down from 10GB by discarding ancient history (we have a lot of big binary files). Turns out the current hgsubversion (a8d5eec1326b) is capable of cloning a *single directory* SVN repo (or just your trunk directory, for example) and discarding old revisions.
Just run “hg help clone” after installing the hgsubversion extension. There’s an option: “--startrev VALUE convert Subversion revisions starting at the one specified, either an integer revision or HEAD; HEAD causes only the latest revision to be pulled”
Seems to be working fine for me thus far!
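For anyone scripting this, a minimal sketch of building that clone command in Python might look like the following. The only hgsubversion option used is the --startrev one quoted above; the helper name, the repository URL and the destination directory are all hypothetical, and nothing here is taken from hgsubversion’s own code.

```python
def build_startrev_clone(svn_url, dest, startrev="HEAD"):
    """Return the argv list for an 'hg clone' that skips history before startrev.

    startrev is either the string "HEAD" or an integer Subversion revision,
    matching the values the --startrev help text describes.
    """
    if startrev != "HEAD":
        # Raises ValueError for anything that is not an integer revision.
        startrev = str(int(startrev))
    return ["hg", "clone", "--startrev", startrev, svn_url, dest]


# Hypothetical repository URL and destination directory.
cmd = build_startrev_clone("http://svn.example.org/repo/trunk", "repo-hg", 12000)
print(" ".join(cmd))

# To actually run it (requires hg with the hgsubversion extension enabled):
#   import subprocess
#   subprocess.run(cmd, check=True)
```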
August 19th, 2010 at 3:32 pm
You’re welcome!
I don’t know if they adapted my patch or re-implemented it from scratch. I know my version wasn’t 100% reliable for all situations, but I only had time to make it work well enough for us. Glad to hear they have a more general version available now!