Thoughts on source control systems

I migrated the OGRE CVS repository over to Subversion this weekend, something I’ve resisted in the past due to some problems I’d had when using cvs2svn with our rather old and branch-littered repository. In all honesty, some of the problems were probably self-inflicted, since I’d experimented with branch aliases a few years ago which cvs2svn previously didn’t like very much, but luckily the latest version has coped acceptably. For some bizarre reason, when imported into SourceForge the conversion decided to resurrect a ton of folders & files that had been deleted long ago in CVS, which hadn’t happened when I tested this whole process locally. All the other files did seem to be correct though, so some swift purging of the cheeky Lazaruses resolved it.

Subversion’s advantages over CVS are well known – the local diffs, the unified changesets & global revision numbers, nice tools like svnmerge. It’s not perfect though – it’s slower than CVS when checking out and committing, occasionally has a habit of corrupting the local working copy (hence the need for svn cleanup), and its approach to tags is plain dumb. The Subversion developers clearly liked the fact that they can abstract almost every action into a ‘copy’, but in practice this isn’t nearly as clever as they think it is, and doesn’t actually work well at all for tags. For a start, you should never be able to commit to a tag, and yet you can in Subversion – configuration managers the world over tend to find this to be a breaking of holy scripture. Secondly, because the source of the tag has no knowledge of it (because it’s just the source of a copy), you can’t look through the log of the branch from which the tag was taken and see an easy-to-read tag point there, as you can in CVS. That actually applies to branches too: the origin point of a branch is only stored in the branch itself, not in the trunk (or whatever other branch it was spawned from), so from the source of the branch you can’t easily see where it splits off. Yes, you can solve it by relying on the fact that the start of the branch/tag will have a repository-wide revision number, and therefore you can look at the SVN revision log for the trunk / originating branch and infer that because the start of the branch comes between commits X and Y on the trunk, that’s where the branch/tag was taken. It sucks though; it’s much nicer to be explicit about these things, rather than using the ‘copy’ blunt instrument for everything – this is precisely why the ‘Revision Graph’ is so damn slow on TortoiseSVN even for a single file, when it’s very quick on TortoiseCVS. My hope is that they’ve addressed this in Subversion 1.5 along with branch merge tracking.
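To make the "between commits X and Y" inference concrete, here is a minimal sketch in Python. It assumes you have already collected the trunk's revision numbers and the revision in which the branch or tag copy was made (for example from the revision log); the function name and the numbers are hypothetical, purely for illustration.

```python
from bisect import bisect_right


def locate_branch_point(trunk_revisions, copy_revision):
    """Return the (before, after) trunk commits that bracket a branch/tag copy.

    trunk_revisions: revision numbers of commits that touched the trunk.
    copy_revision:   repository-wide revision in which the copy was made.
    Either element of the result is None if no such trunk commit exists.
    """
    revs = sorted(trunk_revisions)
    # Index of the first trunk commit that is newer than the copy.
    i = bisect_right(revs, copy_revision)
    before = revs[i - 1] if i > 0 else None
    after = revs[i] if i < len(revs) else None
    return before, after


if __name__ == "__main__":
    # Hypothetical data: trunk commits at r101, r105, r112, r130; copy at r120.
    print(locate_branch_point([101, 105, 112, 130], 120))
```

It works only because revision numbers are global to the repository, which is exactly the indirect reasoning an explicit tag marker on the trunk would make unnecessary.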

On balance though, Subversion is generally the most convenient for everyday development tasks, so it’s still worth using despite the annoying niggles. Perforce is an even better choice if you can afford it, but most won’t choose it (it’s free for open source use, but my guess is that most people feel safer on Subversion, where no future change in that policy can affect them), and alternatives like Visual SourceSafe and AlienBrain are wholly inadequate in modern times, both because of their chronic inability to work well remotely, and their primitive support for concurrent development practices.

Of course, in the eyes of the open source world, any centralised repository system is strictly a bit ‘old skool’ now, even the popular Subversion. Distributed version control systems (DVCS) are all the rage, and I can see why, since they most directly represent the open source development methodology. The idea is that there is no central repository (although in practice, there are bound to be those which are considered more authoritative), and every developer is capable of synchronising / merging with every other developer in a peer-to-peer, ad-hoc fashion. Tools like Darcs, Git and Mercurial implement this methodology, and in theory, I love the idea. Certainly I can think of several cases where it would have been useful to allow third parties to synchronise their working copies with each other more smoothly & traceably – say, a small ad-hoc subgroup trying out a new technique which may take some time before it’s ready for the central version, but which people need to collaborate on, sometimes without the direct involvement of the central team. Also, improved context-aware branching and merging as supported by these tools would definitely be useful to me, as in a number of cases quite large patches can get out of date while lengthy testing goes on, increasing the chances of conflicts when finally applied.

So, in theory I love DVCS – it probably wouldn’t be very natural for traditional corporate teams but for open source, and for some of the distributed commercial teams I’ve worked on, it could be very useful. The problem I see with them right now is that they’re lacking in good toolsets; most are command-line only and some were clearly designed with primarily Linux in mind (e.g. Git, which requires Cygwin to run on Windows). They also tend to be more complex, by nature of the many-to-many synchronisation approaches and high level of flexibility. I think given some more time, they will eventually supersede Subversion in the open source space, and perhaps even in some corporate use, but right now I see them very much as I saw Subversion 5 years or more ago – some great ideas but most people will prefer to stick with the mature & well understood solution for now. Once we have a stable and mature TortoiseGit and Eclipse / VS integration, mainstream adoption will follow I’m sure. I would venture that most developers would be loath to give up their slick Explorer / IDE integration for the kind of flexibility DVCS offers, because in most cases they need the integration more than they need the uber-branching; as useful as that is, it’s probably something that comes up only rarely except in the most popular open source projects.

  • ASpanishGuy

    Ohhh, I remember the old days when I was using RCS… How things have changed!!!

    You are right (and very well informed about VCS, you named the most important ones) about the next logical step: DVCS, and I hope that TortoiseGit takes over the world. You can now install git without Cygwin (there’s a mingw32 version around), but its Tcl/Tk GUI sucks very much. After reading some articles and playing with git, I can agree with 2 facts:
    -It’s the best VCS in the world
    -Some crazy guy will use git as a filesystem in his linux distro :P.

    PS: git-svn works well, it will make the transition smooth.

  • Jakub Narebski

    You can use Git without Cygwin, using MinGW/MSys port called msysGit.

    There are a few GUIs, both history viewers (gitk is distributed with git, in Tcl/Tk) and commit tools (git-gui is distributed with git, in Tcl/Tk). There is an ongoing egit/jgit project which aims to add Git support to Eclipse; current GSoC projects include adding Git support to KDevelop (KDE’s project) and to the Anjuta IDE (GNOME’s project).

    There is GitCheetah, which aims to be the equivalent of TortoiseSVN / TortoiseCVS, but I think it is in its beginning stages.

    It will be some time until Git gets mature tool support on the level of Subversion’s, especially on Windows, which is not its primary platform. On the other hand its design is solid, in contrast to the abuse of ‘copy’ in Subversion.

  • Frenetic

    Yay! Thank you, Steve.

  • SunSailor

    Good point about the tags and branches in Subversion, I always felt that they had overlooked something with their approach… Have you ever posted this particular criticism to the developer list?

  • Paul Evans

    I’d not come across Perforce until I started work at my current position, and I’m getting to appreciate what it does now over other systems. I still use Subversion for my own personal projects though; I keep a little repo on the USB drive on my keychain 🙂

  • Steve

    ASpanishGuy & Jakub: ah, thanks for the mingw update, didn’t know about that. As I say, give it a few years and the landscape for DVCS will be much richer I’m sure; it’s all a bit hardcore for most developers right now, and only those with a specific need (i.e. particularly large open source projects) or interest will venture there I think. I’m certainly interested enough in it to educate myself on how it works, but I don’t feel confident enough to use it in my mainstream projects yet. As for many people, it took Subversion many years to earn my trust and get to the stage where good tool integration was ubiquitous; it would be unrealistic to expect anything different from the DVCS’s, but I’m sure it will happen.

    @SunSailor: I’m pretty sure this point (visibility of tags/branches from the originating branch) has been made already, at least I’d be amazed if it hasn’t. I know for a fact that a ton of people have (rightly so) lambasted the ability to commit to tags; people have come up with hooks to prevent it, but it’s a nasty hack.
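
For the curious, the kind of hook mentioned in the last reply can be sketched roughly as below. This is only an illustration of the decision logic, not a complete hook: it assumes a pre-commit script has already captured the text printed by `svnlook changed` for the pending transaction (a status code, then the path starting at the fifth character), and that tags live directly under a directory called `tags` with no other directory sharing that name. The function name and the sample paths are hypothetical.

```python
def rejected_tag_changes(changed_lines, tags_dir="tags"):
    """Return the changed paths that would modify an existing tag.

    changed_lines: lines in the assumed `svnlook changed` layout, e.g.
                   "U   tags/v1.0/src/main.cpp".
    Creating a new tag (adding tags/<name>/) is allowed; any other change
    below the tags directory, including deleting a tag, is reported.
    """
    rejected = []
    for line in changed_lines:
        status = line[:1]            # 'A', 'D', 'U' or '_'
        path = line[4:].strip()      # path is assumed to start at column 5
        parts = path.rstrip("/").split("/")
        if tags_dir not in parts:
            continue
        # How many path components sit below the tags directory.
        depth_below_tags = len(parts) - parts.index(tags_dir) - 1
        creating_tag = status == "A" and depth_below_tags == 1
        if depth_below_tags >= 1 and not creating_tag:
            rejected.append(path)
    return rejected


if __name__ == "__main__":
    sample = [
        "A   tags/v1.0/",
        "U   tags/v1.0/src/main.cpp",
        "U   trunk/src/main.cpp",
    ]
    print(rejected_tag_changes(sample))
```

A real hook would exit with a non-zero status whenever the returned list is non-empty. That the rule has to be bolted on from outside like this is rather the point of the complaint.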