DVCS Score Card

by Steve

So, I’ve just about completed my practical experiments & review of Mercurial and Git.

In the end, I had far too many separate notes and sets of experiences to post, so I boiled the argument down into the 10 most important factors to me, and scored Mercurial and Git on a scale of 1-5 based on what I’d found when using them. Here are the (annoying) results:

#   Criterion                      Git  Hg
1   Ease of use - command line     4    5
2   Ease of use - GUI              4    4
3   Platform support - core        3    5
4   Platform support - GUI         4    4
5   Web Host Functionality         5    4
6   Reliability & error handling   3    5
7   Storage efficiency             5    3
8   Run-time performance           5    5
9   Flexibility                    5    4
10  OGRE Community support         5    4
    Totals                         43   43

I’ll explain the scores, and my conclusion, after the jump.

1. Ease of use - command line

This criterion boils down to how easy it is to learn to do all the operations required of a typical developer (many of which I’ve listed in a separate thread), how consistent and intuitive those commands are, how natural the defaults are, and how easy it is to screw something up by accident. This is by nature a subjective measure. While Git improved greatly in my perception during the course of the evaluation, there is absolutely no doubt that the command-line of git takes more time to learn to operate effectively than Mercurial does. I deliberately learned Git first to avoid being biased by a command set that was closer to what I was used to, and grew to be happy with most of the Git commands, but it took me longer and I had to refer to the help and Google for things more often. The included help in both is good, but git has a lot more options (hence the win on flexibility later) and lots more discussion on edge cases and how the physical repository implementation is affected, which may be interesting technically but can cloud the issue when you’re trying to learn how to do something. Unusual terminology also confuses people from other systems - “git reset --hard HEAD” is simply not as intuitive as “hg revert --all” for the migrators. I felt comfortable with Mercurial’s command-line faster than I did with Git’s, despite learning Git first. I did, however, feel that I could use Git fairly happily after going through the learning process, and my attitude towards it moderated after some experience - this is why there is only a single point difference between them. If you’re at the start of the learning process, Git is probably a 3 rather than a 4.

2. Ease of use - GUI

Linux users typically care less about this than those on Windows and Mac, and since I rarely use Linux on the desktop I’ve concentrated on Windows and Mac here. I tried TortoiseGit, GitX, TortoiseHg and Murky (SmartGit is another but it wasn’t considered stable when I was looking). In practice I found both systems to have shortcomings; on Mac the GUIs are still quite limited in functionality, and on Windows there are some things that work better in TortoiseGit (it feels like other Tortoises so initial impressions are superior to TortoiseHg which is a different layout), but TGit also has some bugs (such as not listing all branches in Switch if there’s more than about 20) and relies on msysGit which is not ideal especially when you need to abort an action and it leaves a git.exe lying around locking files. In practice they’re all about equal in their imperfections, no clear winner. I’m sure they’ll both improve over time but both are usable now, with users needing to drop to the command line for more complex tasks (see point 1).

3. Platform support - core

This is about whether the core toolset runs well across at least Windows, Linux and Mac OS X. This covers not only the ability to run on each of the platforms consistently, but also to exchange data between the platforms without issues and deal with line ending differences etc. Git came off worst here, because it is inherently designed to require a Unix-like environment, built as it is on common Unix tools like the shell and perl. On Windows it requires msysGit, which in most cases works (I tested with 1.6.4 and 1.6.5) but is significantly slower than under Linux - something you only notice for large operations, but it’s up to a factor of about 6 in my tests - and still has bugs. git-svn, for example, is completely unusable on Windows in my experience against non-trivial repositories - the Ogre repository killed it consistently. Also the core Git developers openly admit that they don’t care much about Windows support, so commitment to the platform isn’t there. Mercurial on the other hand runs on Python and in my tests operated identically on all platforms, and is officially supported wherever Python runs. Both systems handled automatically converting line endings and did a better job than Subversion (which requires properties per file).

4. Platform support - GUI

So does either system support the platform range better than the other via the GUI? Not really - as mentioned in the ease of use section they’re both pretty good but still a little flawed in places. On Windows you still require msysGit even with a GUI which makes Git a little slower than on other platforms, but TortoiseGit is surprisingly good considering the vibe of general disinterest in Windows around Git. GitX and Murky are competent if limited on Mac OS X. Both systems have a core Tcl/Tk interface if you really need it, but honestly anyone in the slightest bit sensitive to nice GUI design won’t want to touch them with a 20 foot barge pole, they’re ugly - Tcl/Tk makes Windows 3.0 look shiny in comparison. No clear winner here.

5. Web Host Functionality

A few online services have sprung up to support the DVCS workflows better than simple hosting on Sourceforge or Google Code. Everyone seems to talk about GitHub all the time, but while it’s very pretty, functionally it’s soundly eclipsed by Gitorious which has considerably more robust support for dealing with merge / pull requests, handling them much like a patch tracker but with repository-aware, multi-revision URLs and inline reviewing instead of patch files. They also allow integration of contributor license agreements. GitHub in contrast only allows fire-and-forget pull requests or generalised browsing of commits people have made to other forks, which is not as useful.

BitBucket (for Mercurial) is functionally pretty much the same as GitHub (including the fire-and-forget merge/pull requests and general fork lists), except that it’s not quite as flashy. If we were comparing GitHub and BitBucket, the scores would be the same since aesthetics are not very important in the grand scheme of things, but Gitorious ups the ante, which is why Git wins in this category. I’ve actually talked to the guys at BitBucket, who are extremely friendly and eager to help, and Gitorious-style merge requests are on their TODO lists. They even offered to bump it up their priority lists if it was important to us. Very helpful chaps, but Gitorious still has to win based on the current status.

As an aside, it’s also worth noting that Launchpad is also extremely good in the merge request tracking area, on par with Gitorious. I dropped Bazaar from my evaluation due to lack of time and because it’s the least popular of the 3 in our community by a massive margin (and also, I don’t like the branch containment model they use very much), but Holger played with it and it’s clear that Launchpad deserves a mention for nailing this feature set very well. GitHub may be the poster child, but functionally others deserve more attention.

6. Reliability and Error Handling

As a wise man once said, sh*t happens. How easy a system is to break, and how it behaves when things go wrong is just as important as how well it works under normal operating circumstances, especially for something as critical as a source control system. I didn’t specifically go out of my way to cause problems, but during my many use cases I did encounter some sticking points, which was precisely the point.

Q. How often did problems occur? A. On Mercurial, I never had a crash, on any platform, that I didn’t accidentally cause myself. The two crash incidents I had were during conversion from Subversion, and were caused firstly by an rsync kicking in and changing the source Subversion file system under the feet of the conversion process, and secondly by my killing the process manually because I wanted to interrupt it. So in essence, I have yet to see Mercurial fail unless I break it. On Git, normal operations behaved fine, but during conversion from Subversion I had many problems. Git 1.6.4 and 1.6.5 on Windows regularly crashed mid-conversion, as did Git 1.5 on Linux. Git 1.6.5 on Linux behaved better, but only if you upgraded the (admittedly old) Subversion 1.3 repository to 1.5 or 1.6 first. Mercurial on the other hand seemed to cope with any combination I threw at it, on any platform.

Q. How good was the error reporting, and how easy was it to recover? A. On Mercurial, when I did get a crash (self-inflicted) I got a full Python stack trace with an exception message which was consistently useful, allowing me to quickly rectify the issue. The repository was also valid even in those cases. On Git, all the crashes I had on Windows and Linux simply resulted in the process terminating with no message. I only managed to figure out how to resolve the problems through trial and error; Git was absolutely no help. The repository left behind after these crashes was corrupt.

So, my personal experience was that Mercurial was very robust, and in the rare case of a problem it reported it well. Git was ok most of the time, but some operations were fragile and for example only a very specific version & platform worked for converting the OGRE repository. When Git did fail, my experience was that it didn’t report any useful errors and it basically left you high & dry, scrabbling on the net for answers. Mercurial wins this one outright based on my experiences.

7. Storage Efficiency

A simple one to measure - after converting the 375MB OGRE repository to both systems, and before any custom pruning, Mercurial was about 200MB and Git about 180MB. A manual pruning operation by community member guyver6 brought the Git repository down to 116MB; after pruning out branches in Mercurial I only managed to remove about 7MB. It appears that the primary reason for that is that moved binaries end up getting stored twice in Mercurial, while Git only stores them once, after the data has been packed. Mercurial always packs its data as you operate on the repository, while Git lets its storage get sub-optimal in size while you’re working on it in order to give you maximum run-time performance, and ‘git gc’ needs to be run every so often (some commands do this automatically) to re-pack the data for storage efficiency. Which is best depends on your point of view: whether you prefer a uniform behaviour or a split behaviour. But overall, Git wins here. In OGRE we actually have a number of moved binaries in our history, which Mercurial clearly does not store as efficiently as Git does.
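To put those sizes side by side, here is a small sketch of the arithmetic. It uses only the figures quoted above; the post-pruning Mercurial figure of 193MB is an assumption derived from "removed about 7MB", and the names are purely illustrative.

```python
# Repository sizes in MB, as quoted in the text above.
ORIGINAL_SIZE_MB = 375  # the OGRE repository before conversion

sizes_mb = {
    "Mercurial (converted)": 200,
    # Assumption: "about 7MB" removed from the 200MB converted repository.
    "Mercurial (branches pruned)": 200 - 7,
    "Git (converted)": 180,
    "Git (pruned and repacked)": 116,
}


def percent_of_original(size_mb, original_mb=ORIGINAL_SIZE_MB):
    """Return size_mb as a percentage of the original repository size."""
    return 100.0 * size_mb / original_mb


for label, size in sizes_mb.items():
    share = percent_of_original(size)
    print(f"{label:30s} {size:4d} MB  {share:5.1f}% of the original")
```

On these figures the pruned Git repository is a little under a third of the original size, while Mercurial stays at just over half.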

8. Run-time performance

This was a bit of a mixed bag. I found that Git on Windows was a poor performer on local batch operations that were not constrained by the network, compared to Mercurial. On Linux or OS X performing local operations, performance was practically indistinguishable between the two. Bulk operations that did require network access were a little faster using Git, but not by much. When it came down to everyday operations, the slightly slower msysGit for local operations, and the slightly slower network performance of Mercurial, were barely perceptible. A wash, both systems are fine.
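If you want to repeat this kind of comparison on your own machine, a rough timing harness is enough. The sketch below is one way to do it, not the method behind the numbers in this post; `time_command` is a hypothetical helper, and the placeholder command should be swapped for something like `git status` or `hg status` run inside a working copy with those tools installed.

```python
import subprocess
import sys
import time


def time_command(cmd, repeats=3, cwd=None):
    """Run cmd `repeats` times and return the fastest wall-clock time in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        # check=True raises CalledProcessError if the command exits non-zero.
        subprocess.run(
            cmd,
            cwd=cwd,
            check=True,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    # Harmless placeholder so the sketch runs anywhere; replace with e.g.
    # ["git", "status"] or ["hg", "status"] and set cwd to a working copy.
    elapsed = time_command([sys.executable, "-c", "pass"])
    print(f"fastest run: {elapsed:.3f}s")
```

Taking the fastest of a few runs is a deliberate choice: it cuts down the noise from disk caching and background processes, which matters when the differences are as small as the ones described here.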

9. Flexibility

When you need to do unusual activity X, can you do it? In Git, the answer is almost always ‘yes’ - it has an enormous number of commands and options and doesn’t really stop you from doing anything, even if it’s a bad idea. Mercurial on the other hand defaults to being quite strict, but there are a number of extensions, both official and unofficial, that can bring the functionality fairly close to Git, but not all the way. A few examples:

Local branches: this is a very useful feature of Git, where you can create lightweight branches in your local repository that you can use for experiments or patch processing without having them become a permanent part of the upstream history. Mercurial branches are all permanent by default, unless you use the LocalBranch extension which is not official. You can replicate the behaviour to a degree with Queues, which is official, but it’s more complicated. Git is better here.

History Modification: changing history is a very bad idea if you’re upstream of anyone else, but in a local private repository it can sometimes be useful. Git provides features such as rebase --interactive, in which you can squash together and reorder commits to reorganise them before upstream submission, and filter-branch to make wholesale changes to the history, for example post-conversion to simplify the repository. Mercurial has basic support for some history modification (MQ again, and unofficial extensions like histedit), but they are not as flexible. Most of the time this isn’t an issue, but occasionally it can be limiting - for example I have not so far found a way to remove history before a certain date (or collapse revisions together before a certain date). The unofficial histedit and collapse extensions do not work for this, and MQ won’t let me import regions for qfolding that exist before branches are taken (which is what I want to do - I need my more recent branch history, I don’t need the old stuff). I don’t understand this restriction: I’ve already stripped all the early branches so the early history is entirely linear, so why should it care that there’s a branch taken later on?

So, Git wins here. In everyday use you won’t care about this, which is why there’s only a single point between the scores when otherwise Git might deserve a 2-point lead here, but certainly when you’re doing uncommon things Mercurial puts more barriers in your way. In day-to-day operations that’s probably a good thing, since it encourages you not to do stupid things. But when you have a specific need to do something for a very good but rare reason, it’s annoying when you can’t.

10. OGRE Community Support

We ran a survey on this to see what people were using already. Git tends to get a lot of fans talking about it, but I’m also very aware that evangelists aren’t usually the best people to listen to. By asking people what they used practically, rather than which one they thought they might like to use, I hoped to tease out usage numbers. Of course, popularity is no guaranteed measure of quality, but it’s a reasonable indication of how each system might be received by our community. As it turned out, and not unexpectedly, most people had only seriously used one of the DVCSs and liked the one they were using, but had no real view on any of the others.

The sample wasn’t huge - only 64 people voted (but pleasingly a power of 2!) - and the numbers were as follows: Git 52% Mercurial 41% (Bazaar 8%). This nicely translates objectively into a score!
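The text above doesn't say how the percentages became scores. One simple mapping that happens to reproduce the 5 and 4 in the table is to divide each percentage by 10 and round; that rule is an assumption for illustration only.

```python
# Survey shares as quoted above, in percent (they add up to 101,
# presumably because each figure was rounded).
survey_percent = {"Git": 52, "Mercurial": 41, "Bazaar": 8}


def percent_to_score(percent):
    """Map a percentage to a 1-5 score: divide by 10, round, clamp to 1..5.

    Assumed rule, chosen only because it reproduces the table's scores.
    """
    return max(1, min(5, round(percent / 10)))


scores = {name: percent_to_score(p) for name, p in survey_percent.items()}
print(scores)
```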

Conclusion

Well, this is annoying. My 10 criteria actually resulted in equal scores - trust me, I didn’t fake this; I thought very hard about these scores because I found myself being indecisive between the two systems because they both had positive and negative aspects, and I figured the only way to resolve the overall result was to try to score them and let math solve the problem. So much for that idea - it seems when I set them out numerically they are as balanced as they were more abstractly in my head.

So in the end, it comes down to the relative importance of these 10 items. I tried to pick 10 things that were of roughly equal importance to avoid skewing anything, but if really pushed I’d have to say that consistency across platforms and confidence in the reliability and error reporting have to be more important to me personally than most of the other factors. The one exception is the storage size - I want people to clone the source repository so they are encouraged to get involved in development, and I’m aware that the larger it is, the more that’s a disincentive. 200MB is pushing it a bit, and it’ll only get larger - and according to Mercurial’s specs that’s already compressed. In comparison, right now according to my measurements someone grabbing our Subversion repository has to transfer about 48MB of data (compressed) per branch over the network, so Git’s 116MB (again, compressed) is looking very attractive compared to Mercurial’s heft.
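To see how much the outcome depends on weighting, here is a minimal sketch that re-totals the table with weights. The scores come from the table at the top of the post; the weights are hypothetical, doubling the three criteria singled out above (core platform support, reliability, storage efficiency) and leaving the rest at 1.

```python
# Scores from the table at the top of the post, criteria 1-10 in order.
git_scores = [4, 4, 3, 4, 5, 3, 5, 5, 5, 5]
hg_scores = [5, 4, 5, 4, 4, 5, 3, 5, 4, 4]

# Hypothetical weights: criteria 3 (platform support - core),
# 6 (reliability & error handling) and 7 (storage efficiency) count double.
weights = [1, 1, 2, 1, 1, 2, 2, 1, 1, 1]


def weighted_total(scores, weights):
    """Sum of score * weight over all criteria."""
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    return sum(s * w for s, w in zip(scores, weights))


equal = [1] * len(git_scores)
print("Unweighted:", weighted_total(git_scores, equal), weighted_total(hg_scores, equal))
print("Weighted:  ", weighted_total(git_scores, weights), weighted_total(hg_scores, weights))
# Unweighted: 43 43
# Weighted:   54 56
```

With these particular weights Mercurial edges ahead 56 to 54, but doubling only storage efficiency tips it the other way (Git 48, Mercurial 46), so the weighting really does decide the result.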

I think that if I can find a way to reduce the size of the Mercurial repository to around 100MB, perhaps by stripping the old trunk history somehow (stripping old branches doesn’t appear to have made a great deal of difference) while still keeping branches after this point, I’d go with Mercurial just because on balance it behaved more consistently for me. If I can’t, I’m still annoyingly on the fence because 200MB feels too big, but I can’t afford to trash all my history or branches. There is lots of talk in the Mercurial wiki about shallow clones and potential history trimming extensions, but nothing seems solid right now. Anyone have any suggestions?