hgsubversion - dropping old history during conversion (mod)

· by Steve

I’ve already posted about my experiences with Git and Mercurial, the end result of which was a vastly increased respect for Git but a basically confirmed preference for Mercurial, based on ease of use, platform consistency and resilience.

Mercurial’s conversion tools are really quite good - the core tools worked fine, but I was impressed by hgsubversion’s speed and that it seemed to just work, both in the initial conversion and in pulling subsequent updates. It was missing a couple of features that I wanted though - firstly the ability to reflect merge points between branches during the conversion, and secondly the ability to ‘squash’ ancient history down to a simple snapshot to save space.

At OGRE, we’d carried forward all our history from CVS to Subversion and as such have almost 8 years of history, including a couple of file reorganisations. Mercurial’s storage efficiency falls down compared to Git when files are moved around, because a file stored in more than one place in the tree over the history of the project is physically stored multiple times too, whilst Git stores the content only once with pointers from the various locations / history points. Most of this overhead could be removed just by eliminating old history we didn’t need anymore - history that does no harm in Subversion since only the server holds it, but does cause unwanted overheads in a DVCS since every user gets the entire repository. Removal of history is something that Mercurial shuns - rightly so in the case of public repositories but in these rare cases it would be nice if there was a tool for removing old history; again Git allows this but it has to be used with care. In the absence of that, doing it at conversion seemed the best way.
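To make the storage point above concrete, here is a deliberately simplified toy model in Python. It is not how either tool is actually implemented (both use delta compression and other optimisations that are ignored here); it only contrasts a store keyed by file path with a store keyed by content hash. The function names and the two file paths are hypothetical, made up for the illustration.

```python
import hashlib


def path_keyed_size(history):
    """Toy model: one store per path, so identical content
    recorded under two paths is counted twice."""
    stores = {}
    for path, content in history:
        stores.setdefault(path, set()).add(content)
    return sum(len(c) for contents in stores.values() for c in contents)


def content_addressed_size(history):
    """Toy model: one blob per distinct content, no matter
    how many paths or revisions point at it."""
    blobs = {}
    for _path, content in history:
        blobs[hashlib.sha1(content).hexdigest()] = content
    return sum(len(c) for c in blobs.values())


# A file moved during a reorganisation: same bytes, two paths.
# Both paths are invented for this example.
history = [
    ("OgreMain/src/OgreRoot.cpp", b"x" * 1000),
    ("Ogre/Main/src/OgreRoot.cpp", b"x" * 1000),
]

print(path_keyed_size(history))         # 2000
print(content_addressed_size(history))  # 1000
```

In this sketch the moved file costs twice as much in the path-keyed store, which is the kind of overhead that dropping the pre-reorganisation history avoids.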

I asked about these things in the hgsubversion community, but the tradition of open source is that if you really want something urgently, you know where the code is 😀 Mercurial is really nice when it comes to hacking because it’s all Python, so there’s a nice unified API in one place that you can refer to. That’s one of the reasons I like it over Git, which is far more fragmented in technology terms. I’m not a Python guru by any means, but I managed to implement both these features: I did the “mergemap” support a little while ago and added the “skipto” option today. It’s called that because “skipto” was already referred to in the hgsubversion code but had no implementation.

The result is that the OGRE Mercurial repository with only the last ~3 years of history (back to when the v1.4 branch was created) is now only 74MB, rather than the 206MB of the original, complete conversion (in comparison, Git was 116MB for the whole thing). By dropping the history I’ve removed most of the instances of reorganisation, which is where most of the space had gone. I hope that Mercurial eventually adds a utility to deal with stripping ancient history (right now, you can only strip branches), but this solves my primary conversion issue. Since this new repo can be kept in sync in a very lightweight fashion with the existing Subversion repo, I’ll be periodically updating it and doing more tests to reassure myself that the content really is ok.
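For a sense of scale, the figures quoted above work out as follows. This is just arithmetic on the three sizes given in the post; the variable and function names are my own. Bear in mind that the comparison with Git is not like-for-like, since the Git figure covers the complete history.

```python
# Repository sizes in MB, as quoted in the post.
full_hg_mb = 206     # complete Mercurial conversion
trimmed_hg_mb = 74   # Mercurial with only the last ~3 years of history
full_git_mb = 116    # complete Git conversion


def percent_saved(before, after):
    """Percentage reduction going from `before` to `after`."""
    return 100.0 * (before - after) / before


# Trimmed Mercurial repo versus the full Mercurial conversion.
print(round(percent_saved(full_hg_mb, trimmed_hg_mb), 1))   # 64.1

# Trimmed Mercurial repo versus the full-history Git repo.
print(round(percent_saved(full_git_mb, trimmed_hg_mb), 1))  # 36.2
```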

If you’d like to get my custom version of hgsubversion with these features, it’s here: http://bitbucket.org/sinbad/hgsubversion/. I make no promises that it’s error-free, use at your own risk. It currently assumes that you’re using the standard Subversion layout, are converting from the root of that and have the ‘svn’ command on your path.