Adventures in conversionland

As you know, I’ve been reviewing DVCSs lately. I’m taking my time working through real use cases on them, and deliberately not leaping feet-first into whatever looks best or most popular on the surface, because I don’t particularly want to discover unexpected problems down the track. It’s consuming a lot more time than I expected. I’m writing up my findings and may publish the entire results later on if I can find the time to clean them up and format them better, but for the moment I thought I’d share some experiences with converting a relatively large, long-lived, multi-branch repository (OGRE) from Subversion to Git and Mercurial, because that’s what I’ve been wrestling with in the last few days. I discovered a bunch of additional issues during this process that did not occur when starting from scratch or converting more trivial repositories, so I thought it might help others to talk about it.

Source Subversion repository specifications

Revisions: 9215 (as of today)
Branches:  9 permanent, 22 temporary / experimental
Size: 375 MB

Also of note is that the source repository is still at Subversion 1.3 – this is because Sourceforge was stuck on this version for a long time and we haven’t upgraded the repository since they started supporting newer versions. We never bothered because it requires locking out the repository while you download the whole thing to a local machine, upgrade it and re-upload it, which is a hassle, especially when you have things to do. In practice the server-side version hasn’t been a major issue since you can still use newer clients with it and svnmerge operates regardless.

General Approach

I rsync the OGRE repository down to a local Linux server several times a week, so that was the source of all my conversions, eliminating most of the network time. I tried to convert the repositories using Windows clients in the first instance, because it was easier to get the latest versions of the tools there (my Linux server is on Ubuntu 8.04 LTS and even with hardy-backports available it’s not as up to date, and because this is an important server I stick to the official versions for simplicity). There is a 1Gb network connection between the machines, so it could be pretty speedy.

The principle is that I want to preserve all history, all branches, and all tags. In practice I may actually prune off some branches later on, so that the clone process is quicker, but the base principle is that it should be a lossless conversion in the first instance. Definitely no top-skimming of the trunk like some conversion articles advocate – we have stable branches that must be maintained and regularly have work that we want to keep in experimental branches. In particular, post conversion it must be possible to continue committing to and merging from stable branches.

Git Conversion Experience

I’d previously converted some other, small and fairly simple Subversion repositories using git-svn (less than 500 revisions, and 2-3 branches) and it worked fine. However, when trying it against the considerably more complex OGRE repository I hit problems very quickly. On Windows, using msysGit 1.6.4 the process failed after 1900 revisions, just after doing the automatic repository tidy (git gc). The error message was simply ‘fatal error running git-svn’, even though it had been running exactly that command for the last 1900 revisions. Thinking there might be an msysGit issue here, I switched to the Linux server (git 1.5.4) and tried the same thing. This time it fell over at revision 176 with absolutely no error message. In both cases the repository left behind was corrupt so I could not resume the process.

The other thing I noticed was how long the process took on Windows. 1900 revisions took 5 hours (!) and thus I wasn’t in a hurry to retry the process there. On Linux the process was much faster, as far as it got. It’s worth noting that this is not caused by running across 2 machines – not only do I have a very adequate 1Gb link, Mercurial managed significantly faster conversions using the same topology. msysGit’s git-svn conversion is simply incredibly slow.

At this point I decided to try upgrading the Subversion repository, just in case git-svn hadn’t been tested with older repository versions. My Linux server had svn 1.5 on it, so I upgraded the OGRE repository to that locally and re-ran the git-svn process on the Linux machine (as I say, I wasn’t keen on repeating the glacially slow msysGit conversion). Sure enough, this time all 9200-odd revisions converted fine, in only about 1 hour 40 minutes, or about 15 times faster than doing it on Windows.
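As a rough check on that “15 times faster” figure, here is a small sketch using only the numbers quoted above. It assumes the Windows run would have carried on at the same average rate had it not failed at revision 1900, which is an extrapolation on my part rather than something I measured.

```python
# Figures quoted in the post.
WINDOWS_REVISIONS = 1900       # revisions converted before msysGit failed
WINDOWS_HOURS = 5.0
LINUX_REVISIONS = 9215         # the whole repository
LINUX_HOURS = 1 + 40 / 60      # 1 hour 40 minutes


def revisions_per_hour(revisions, hours):
    """Average conversion throughput."""
    return revisions / hours


windows_rate = revisions_per_hour(WINDOWS_REVISIONS, WINDOWS_HOURS)
linux_rate = revisions_per_hour(LINUX_REVISIONS, LINUX_HOURS)

# Ratio of the two average rates; assumes a constant rate on Windows.
speedup = linux_rate / windows_rate

print(f"Windows: {windows_rate:.0f} revisions/hour")
print(f"Linux:   {linux_rate:.0f} revisions/hour")
print(f"Speed-up: roughly {speedup:.1f}x")
```

The ratio comes out between 14 and 15, which is where the “about 15 times” comes from.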

So, I may have had a few problems, and being forced to upgrade the repository before converting was a bit of a pain, but at least it worked and was fast (on Linux anyway). After that, I started cloning the repository both on Linux and Windows and tried performing some standard operations.

The first thing that surprised me was that when cloning the converted repository, I could only see the ‘master’ branch on the remote machine. It’s common practice for Git not to create any local branches other than master on clone, but usually you can do ‘git branch -a’ to see all the remote branches that are available, which show up as something like ‘origin/v1-6’; you can then check them out to local branches. However, no branches other than ‘origin/master’ showed up, even though I knew they’d been converted. It turns out that git-svn converts all branches except master into remote branches in the converted repository, referencing the original Subversion URL, so it is very much like having cloned from another Git repository. That sort of makes sense, but in the context of a full conversion to a repository that is destined to become the upstream master, it isn’t that useful. In practice, once the git-svn conversion is complete you need to git checkout each of the branches that you care about in your converted repository, thus creating local branches in that repository which subsequent cloners will be able to check out themselves.
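Creating those local branches by hand gets tedious with 30-odd branches, so here is a minimal sketch of how it could be scripted. It swaps the checkout for ‘git branch’, which creates the local branch without touching the working copy. The function names are hypothetical, and the filtering rules are assumptions about a typical git-svn layout (refs stored directly under refs/remotes/, tags under a ‘tags/’ prefix, and extra ‘name@revision’ refs), so check what your own conversion produced before relying on it.

```python
import subprocess

# Assumption: trunk is already available as 'master', so it is skipped.
SKIP_EXACT = {"trunk"}


def local_branch_commands(remote_names):
    """Build one 'git branch' command per remote ref that should become local.

    remote_names are ref names relative to refs/remotes/, e.g. 'v1-6'.
    """
    commands = []
    for name in remote_names:
        # Assumed git-svn conventions: skip tags and 'branch@rev' refs.
        if name in SKIP_EXACT or name.startswith("tags/") or "@" in name:
            continue
        commands.append(
            ["git", "branch", "--no-track", name, "refs/remotes/" + name]
        )
    return commands


def list_remote_names(repo_dir):
    """Ask git for every ref under refs/remotes/ in the converted repository."""
    result = subprocess.run(
        ["git", "for-each-ref", "--format=%(refname)", "refs/remotes"],
        cwd=repo_dir, check=True, capture_output=True, text=True,
    )
    prefix = "refs/remotes/"
    return [line[len(prefix):] for line in result.stdout.splitlines()
            if line.startswith(prefix)]


def promote_branches(repo_dir):
    """Create the local branches in the converted repository."""
    for command in local_branch_commands(list_remote_names(repo_dir)):
        subprocess.run(command, cwd=repo_dir, check=True)
```

Calling promote_branches on the path of the converted repository would then do the work; building the commands in a separate function makes it easy to print them first and eyeball the list before running anything.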

So, once I’d figured this out I started to check out different branches to test if it had worked. At first it seemed to, when checking out the first branch (switching from master to v1-6 in a local clone from the conversion). When I came to try to switch back to master however, Git complained that I had modified files in my working directory. WTF? I’d only just checked out the clean copy of the v1-6 branch. But sure enough, git status told me I had 5 modified files. Diffing them showed no changes, and “git reset --hard” returned with no error, but git status still showed these files as modified. Bizarre. A git checkout -f still let me switch, but again after completion a set of other files showed up as modified. Switching back and forth (with -f) a few times revealed that the list of modified files after checkout was different each time. Again worried that this was a Windows thing, I tried checking out on my Linux machine instead (so at that stage the entire process, conversion to checkout, was done on Linux). But no, the same problem occurred: a random selection of 5-7 modified files on clean checkout.

This has raised some serious concerns about using Git for me. Firstly the flaky conversion which requires a bunch of extra steps just to get it to work at all, then the post-conversion bizarre behaviour of thinking files are modified when they’re not. I had none of these problems with smaller repositories, created from scratch or converted, which up until now I’d been using for testing (and Git had been winning me over in fact since it had been working well). But the bottom line is that this process needs to work reliably for the OGRE repository. If it doesn’t, it’s pretty much untenable.

Mercurial Conversion Experience

I started off with the in-built ‘hg convert’ process. It all went smoothly and took about 8 hours, and the resulting repository was mostly fine. However, the default behaviour is to process the revisions in an order which “produces the fewest jumps between branches in the commit log”. In practice, I found that this meant the revision log when reviewing multiple branches was badly jumbled and difficult to use; the use of the ‘--datesort’ option resolved this but increased the conversion time to just under 10 hours (still faster than msysGit but a lot slower than git on Linux).

I’d talked to the guys from BitBucket to see if they would offer free unlimited hosting for OGRE, since we wouldn’t fit in the default 150MB limit (the result was that they were super-friendly and offered not only that but lots of advice), and they suggested that I try hgsubversion instead. I was initially put off by the hgsubversion website suggesting it wasn’t fit for production use (they’ve removed this statement now), but BitBucket told me that was a little out of date, and in fact the Python project is using it for their conversion, which is obviously of major size. So, I gave it a shot and got some good help from the hgsubversion guys, and the results were great: 1 hour 40 minutes from the Windows end (coincidentally the same time Git managed on the local Linux machine), and the log view was properly ordered right off the bat.

The one remaining issue I had (and this is true of git-svn too) is that all of the branches are open-ended on conversion; that is, no record is made of merges that have been done between branches. That means you would have problems continuing a branch and then merging it, because Mercurial would think it has to merge everything from the point the branch was taken. Neither svnmerge nor svn:mergeinfo properties are taken into account.

One way to resolve this is to manually create a merge point to close off the branches. The easiest way to do this is:

  • Grab the default tip
  • Open a command line and define a temporary environment variable “HGMERGE=internal:local”. This means that you want to keep the local files and throw away the other source when doing a merge, which is important for our dummy merge
  • hg merge <source_branch> -y
  • commit – only the .hgtags file should be modified, the rest of the commit is merely metadata alteration to close off the source_branch

Once you’ve done that, your branch is joined back to the trunk and you can carry on as before, any new commits to that branch will merge across cleanly. The only downside of this is that the merge is strictly at the wrong point – if you view the history in the trunk it won’t be technically accurate and you’ll need to use your commit messages as the real guide to the actual merges before the conversion.
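The steps above can be sketched as a small script. This is only an illustration of the sequence, assuming the trunk lives on Mercurial’s ‘default’ branch; the function names and the commit message are made up for the example.

```python
import os
import subprocess


def dummy_merge_steps(source_branch, target_branch="default"):
    """Return the environment override and the hg commands for a dummy merge."""
    # internal:local keeps our side of every file and discards the other side.
    env_overrides = {"HGMERGE": "internal:local"}
    commands = [
        ["hg", "update", target_branch],        # grab the default tip
        ["hg", "merge", source_branch, "-y"],   # merge without prompting
        ["hg", "commit", "-m",
         "Dummy merge to close off %s after conversion" % source_branch],
    ]
    return env_overrides, commands


def run_dummy_merge(repo_dir, source_branch):
    """Run the dummy merge inside repo_dir, stopping at the first failure."""
    env_overrides, commands = dummy_merge_steps(source_branch)
    env = dict(os.environ, **env_overrides)
    for command in commands:
        subprocess.run(command, cwd=repo_dir, env=env, check=True)
```

Setting HGMERGE only in the environment passed to the child processes means the setting disappears once the script finishes, so a later real merge does not silently throw away the other side.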

A better way to do this would be to record the merges during the conversion, that is, for merge commits in Subversion to have 2 parents. So far, none of the conversion tools read svnmerge or svn:mergeinfo metadata to implement this, but the standard ‘hg convert’ has an option called ‘--splicemap’ where you can specify merge points to be applied during the conversion. Unfortunately I’ve tried to use this twice so far, and both times it hasn’t worked (it just silently did nothing). The documentation for --splicemap is not great, so it could be that I got the URLs wrong. But anyhow, following 2 failed attempts (20 hours! because this was the standard hg convert with --datesort) I decided I’d try to get a similar bit of functionality working in hgsubversion instead, since that’s much faster (1hr 40m a pop). Right now I’m hacking away on it to try to make this work; so far it doesn’t, but I’ll let you know if I eventually succeed. One of the benefits of Mercurial is that it’s all in Python so it’s very easy to modify, compared to Git, which runs all kinds of random scripts and executables, including sh and perl, so it’s much more tangled to dig into.
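For reference, my reading of the hg convert help is that each splicemap entry is one line: the revision whose parents you want to set, a space, then one or two comma-separated parent revisions. The sketch below just builds such lines; the helper name is hypothetical, and the exact form of the revision IDs is the part I’m least sure of (the help says the key uses the same format as the keys in .hg/shamap), so treat the placeholder IDs as stand-ins rather than real values.

```python
def splicemap_line(child, first_parent, second_parent=None):
    """Format one splicemap entry: 'child parent1' or 'child parent1,parent2'."""
    parents = [first_parent]
    if second_parent is not None:
        parents.append(second_parent)
    for rev_id in [child] + parents:
        # Entries are split on whitespace and commas, so IDs must contain neither.
        if not rev_id or "," in rev_id or any(ch.isspace() for ch in rev_id):
            raise ValueError("unusable revision ID: %r" % (rev_id,))
    return child + " " + ",".join(parents)


# Placeholder IDs only; substitute the IDs from your own conversion.
example = splicemap_line("MERGE_COMMIT_ID", "TRUNK_PARENT_ID", "BRANCH_PARENT_ID")
print(example)
```

One such line per historical merge, saved to a file and passed via --splicemap, is the intended usage as far as I can tell.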

Conclusions, so far

I started my DVCS evaluation very pro-Mercurial and very anti-Git. While working through my detailed use cases, a process which I’ve not quite completed yet, Git has grown on me a great deal, and I discovered a few things about Mercurial which I found a bit limiting at first, but which are mitigated via extensions – Rebase, Queues and Transplant particularly. My recent experience with more complicated, full-scale and imported repositories has once again gone in Mercurial’s favour though, and I saw a nastier side of Git – when it goes wrong, it’s a lot more difficult to figure out why. In contrast when I’ve had my Mercurial conversion crash – and I stress this only happened due to my own screw up, once because it ran overnight when my rsync kicked in and changed the repository under its feet, and a few times when I’ve been experimenting with hacking the Python to get the merges done – the reason has always been clear; a nice Python trace, and the repository was always intact anyhow – in the case of the core hg convert the conversion even restarted from where it left off once I’d fixed it.

If I were to graph my relative opinion of the two over the period I’ve been doing this so far, it would look something like this:

[Figure: graph of my relative opinion of Git and Mercurial over the evaluation period]

Git totally came up from behind and I was really starting to dig it, until it started freaking out on me with the conversion and I started to try to diagnose why and found it mostly unhelpful. Again I stress I’m not done with my tests yet, but I’m perhaps 75% of the way through now and the conversion problems I’ve had with Git in the last few days don’t look good. Bazaar, I’m afraid, is no longer likely to be part of the evaluation – it takes a long time to do these evaluations properly rather than just trivially, and our survey has indicated that it is the least commonly used among our community by a very large margin, so I’m focussing on the ones more users are likely to already be comfortable with.

The evaluation process continues…

17 Responses to “Adventures in conversionland”

  1. Andrew Fenn Says:
    October 26th, 2009 at 1:32 am

    Nice write up. A real bummer that you didn’t try converting with bzr. I might have to do some svn conversion in the future and it would be nice to see how it measures up to hg and git. I understand why though as your codebase is huge!

  2. blankthemuffin Says:
    October 26th, 2009 at 2:07 am

    So I thought I’d have a quick go turning the Ogre3D svn repository into a git repo. Looking at the github guide, I decided to use http://github.com/nirvdrum/svn2git which seems to work with git-svn in order to make a full import much simpler.

    It went off without a hitch. I ignored the authors stuff since I didn’t really want to deal with that, but after a few disconnects on my end (damn satellite), my server had created what is, as far as I can tell, a perfectly working copy, branches intact, of your svn repository.

    I also don’t see any of your issues with switching between branches, I assume this is because svn2git handled all the problems you initially described.

    It took about 2 hours and the resultant repository is 145MB.

  3. blankthemuffin Says:
    October 26th, 2009 at 2:58 am

    Actually I lied, that size is the repository with attached working copy. The repository on server is only 59MB.

    Also provided the daemon doesn’t stop running,
    $ git clone git://sliceofmuffin.com/ogre3d.git
    Should let you pull it down.

  4. Steve Says:
    October 26th, 2009 at 8:56 am

    @blankthemuffin: thanks I’ll try svn2git then. I need to prove I can do it myself but thanks for the clone source. I’m a little dubious about the 59MB size, git-svn created a 122MB bare repository (albeit one with weird checkout problems), but we’ll see.

    Even assuming this works, this process certainly shed light on the relative behaviour of fail states between the two DVCSs. Having had both of them fail (and as I say, the Mercurial failures were my fault) it’s interesting to compare how easy it was to diagnose the issues between the two – and Git came off worse, being considerably more difficult to investigate.

    It also showed how much slower msysGit is than git on Linux; whether you’d notice that in normal everyday use is debatable, but it shows that when doing bulk maintenance operations (or perhaps even large normal operations?) Windows is not the place to be doing it if you use Git. msysGit was slower by up to a factor of 9x (some of this will be attributable to the local git repo instead of across the 1Gb network, but as I said above hgsubversion operated across the same network and completed in the same time git-svn did locally, so I am unwilling to let msysGit use that as an excuse; it’s just slow). This is a factor, because my wish to get off Subversion on Sourceforge is driven by the slowness of bulk operations, so I do not want to recreate this situation just because my normal day-to-day platform is Windows.

  5. Steve Says:
    October 26th, 2009 at 10:00 am

    Well, svn2git was a bit of a pain on Ubuntu 8.04 LTS even with hardy-backports. You can’t retrieve it from gemcutter.org because the version of gem supported in LTS is not compatible with gemcutter (403 errors all the time). Allegedly the only option is to upgrade gem, but ‘gem update --system’ is disabled on Debian / Ubuntu to maintain compatible versions. I’m not going to monkey with that, since this server is deliberately kept on stable core packages.

    So I installed svn2git from source in the end, which required me to learn how to build gems (not that hard). Then I found my Ruby wasn’t set up to use gems anyway, and had to “export RUBYOPT=rubygems” before it would pick up svn2git. So now I can finally try running it.

    This is why I prefer Python. It’s been around for ages and just kinda works even if you’re not on the bleeding edge ;)

  6. blankthemuffin Says:
    October 26th, 2009 at 10:34 am

    I was surprised at the size as well, it might have missed something, but I don’t think so. I did run git-gc, which might have done quite a bit after the initial import, but I just checked and it’s certainly 59MB.

    My server is running Debian Lenny, and I run Ubuntu Karmic here, so I’ve got a relatively new setup. I suppose I am biased from running bleeding edge a lot of the time.

    I think it’s terrible that the git front end isn’t built on top of a pretty, portable C library. If such a thing existed we could port to different platforms more simply and write wrappers for other languages easily, instead of silly total re-implementations like GitSharp.

    Python is pretty cool. Ruby is pretty cool also, but mostly weird. :D

    On a completely unrelated note, I know you’ve been converting to CMake recently, and I was wondering if you know of any particularly useful ( read: succinct guides rather than incredibly verbose reference manuals ) documentation for it?

  7. Steve Says:
    October 26th, 2009 at 10:59 am

    Not much success with svn2git so far. Result from my local conversion is:

    Found possible branch point: https://localhost/ogresvnbackup/branches/avendor => https://localhost/ogresvnbackup/tags/arelease, 4
    Found branch parent: (tags/arelease) d6fe815965a1035e894d884ee2e0d26fe3af6cc2
    Following parent with do_switch
    Successfully followed parent
    warning: You appear to be on a branch yet to be born.
    warning: Forcing checkout of avendor.
    Note: moving to “avendor” which isn’t a local branch
    If you want to create a new branch from this checkout, you may do so
    (now or later) by using -b with the checkout command again. Example:
    git checkout -b
    HEAD is now at d6fe815… no message
    Switched to a new branch “avendor”
    Note: moving to “trunk” which isn’t a local branch
    If you want to create a new branch from this checkout, you may do so
    (now or later) by using -b with the checkout command again. Example:
    git checkout -b

    HEAD is now at bff7010… Removed generated docs
    error: branch ‘master’ not found.
    Switched to a new branch “master”
    Counting objects: 3713, done.
    Compressing objects: 100% (3642/3642), done.
    Writing objects: 100% (3713/3713), done.
    Total 3713 (delta 2403), reused 0 (delta 0)

    And the Git repository seems to contain only code from 2002! :/ This is with the latest code of svn2git direct from their repo. Looks like it got stuck on the ‘avendor’ branch or something, which is a CVS-ism which came across with the Subversion conversion. Very odd, and again not exactly building my confidence here. Maybe it’s a bug in the latest svn2git and I should try grabbing an earlier tagged revision.

    Regarding CMake, cabalistic did all the bootstrapping, and I learned from reading that and referring to the CMake manuals when needed. And occasionally searching the mailing list. So I’d suggest starting from a working system like ours and experimenting.

  8. blankthemuffin Says:
    October 26th, 2009 at 11:22 am

    That’s a pain. Damn old versions.

  9. Steve Says:
    October 26th, 2009 at 11:27 am

    I reverted to svn2git’s 1.3.1 tag rather than the head and ran it again, it did more this time but still only converted about a third of the branches and left me with a master which was blatantly not the trunk – looks about 6 years old. So better, but definitely no cigar.

    I’m afraid if git 1.5.4 and the latest svn2git is considered too old, then something is very wrong with these tools.

    Last try will be doing it on the 1.5 converted svn.

  10. Steve Says:
    October 26th, 2009 at 12:35 pm

    svn2git on the converted 1.5 repository did better again, but only converted half of the branches and stopped at 2006. I’m guessing that git-svn is crashing or something – there are no errors but it definitely looks like it fell over. If I run the process again it seems to try to restart, but just goes through the branches it’s already converted, says they already exist, and doesn’t do any more.

    The bare repo at this stage is 51MB fully compressed, and since that’s missing a ton of data it suggests to me that your conversion didn’t finish properly either if it was only 59MB. My repo is missing any socXX- branches after soc-06 (there are 3 more years), and only goes up to v1-2 stable, missing v1-4 and v1-6, and master is definitely not the current trunk. I suspect yours is missing a fair chunk of data too – I tried to pull it to verify this but it wasn’t responding.

    So, svn2git is definitely out, it doesn’t seem reliable.

  11. Owen S Says:
    October 26th, 2009 at 8:54 pm

    I think one of the most interesting conversions is going to be KDE going SVN -> Git. I know Gnome have already done so; I don’t know if any information on how it went for them has been published online.

  12. blankthemuffin Says:
    October 26th, 2009 at 9:01 pm

    Ah yes, it looks like that indeed. I didn’t really look at it before but the last commit to master is shown as Fri Aug 3 00:59:10 2007.

    I have v1-4 and soc07, but that’s not really a useful improvement at all.

    No errors doesn’t seem to equal no problems. That’s a serious let down.

  13. John Burton Says:
    October 27th, 2009 at 1:11 pm

    Is it important to convert the history? Can’t you keep old versions in the old system for reference and start a new repository moving forward?

  14. Steve Says:
    October 27th, 2009 at 2:50 pm

    Yes, it’s important. Many users use previous stable branches in production code, and when diagnosing issues it’s imperative to be able to bisect the history. There are also experimental branches we may want to revisit later. Starting from scratch would be a major loss in terms of subsequent bugfixing on previous stables and merging of previously experimental code, and is not one I’m willing to accept. I’m investigating moving because I want to streamline operations, not make them more complex by splitting activities across 2 very different repositories. Maybe young projects with no real production users to speak of and a simplistic branch history can afford to do that, but we can’t. Certainly the last 2-3 years of history is essential; we could live without history before that, but no DVCS allows this kind of reverse pruning because of the way the delta information is stored.

  15. Steve Says:
    October 28th, 2009 at 1:04 pm

    Update: Kubuntu 9.04 with Git 1.6.0 converts our repository successfully using solely git-svn (not svn2git). This is so far the only option that works – 1.6.x on Windows crashes and 1.5.x on Linux produces unreliable results.

  16. Owen S Says:
    October 28th, 2009 at 4:46 pm

    Just looked it up: KDE’s supposed to be migrating this month. Perhaps 1.6 incorporates changes to git-svn and co made for their migration?

    The KDE repos are huge – I’d imagine that the problems should all be fixed by the time KDE moves over.

  17. WhiteKnight Says:
    November 4th, 2009 at 1:54 pm

    Thank you for more information about your DVCS tests. It’s interesting to hear how they compare when working with a large repository. I’ve just been messing around with a new Mercurial repository, so I’m eager to hear more about your tests and really appreciate that you take the time to let us know how it’s going.
