Thoughts on source control systems

· by Steve · Read in about 5 min · (1020 Words)

I migrated the OGRE CVS repository over to Subversion this weekend, something I’ve resisted in the past due to some problems I’d had when using cvs2svn with our rather old and branch-littered repository. In all honesty, some of the problems were probably self-inflicted since I’d experimented with branch aliases a few years ago which cvs2svn previously didn’t like very much, but luckily the latest version has coped acceptably. For some bizarre reason when imported into Sourceforge the conversion decided to ressurrect a ton of folders & files that had been deleted long ago in CVS, which hadn’t happened when I tested this whole process locally, but all the other files did seem to be correct so some swift purging of the cheeky Lazaruses resolved it.

Subversion’s advantages over CVS are well known - the local diffs, the unified changesets & global revision numbers, nice tools like svnmerge. It’s not perfect though - it’s slower than CVS when checking out and committing, occasionally has a habit of corrupting the local working copy (hence the need for svn clean), and its approach to tags is plain dumb. The Subversion developers clearly liked the fact that they can abstract almost every action into a ‘copy’, but in practice this isn’t nearly as clever as they think it is, and doesn’t actually work well at all for tags. For a start, you should never be able to commit to a tag, and yet you can in Subversion - configuration managers the world over tend to find this to be a breaking of holy scripture. Secondly, because the source of the tag has no knowledge of it (because it’s just the source of a copy), you can’t look through the log of the branch from which the tag was taken and see an easy-to-read tag point there, as you can in CVS.  That actually applies to branches too in fact,  the origin point of a branch is only stored in the branch itself, not in the trunk (or whatever other branch it was spawned from), so from the source of the branch you can’t easily see where it splits off. Yes, you can solve it by relying on the fact that the start of the branch/tag will have a repository-wide revision number, and therefore you can look at the SVN revision log for the trunk / originating branch and infer that because the start of the branch comes between commits X and Y on the trunk, that’s where the branch/tag was taken. Sucks though, it’s much nicer to be explicit about these things, rather than using the ‘copy’ blunt instrument for everything - this is precisely why the ‘Revision Graph’ is so damn slow on TortoiseSVN even for a single file, when it’s very quick on TortoiseCVS. My hope is that they’ve addressed this in Subversion 1.5 along with branch merge tracking.

On balance though, Subversion is generally the most convenient for everyday development tasks so it’s still worth using despite the annoying niggles. Perforce is an even better choice if you can afford it but most won’t choose to (it’s free for open source use, but my guess is most people feel safer from any future changes in that policy on Subversion), and alternatives like Visual SourceSafe and AlienBrain are wholly inadequate in modern times, both because of their chronic inability to work well remotely, and their primitive support for concurrent development practices.

Of course, in the eyes of the open source world, any centralised repository system is strictly a bit ‘old skool’ now, even the popular Subversion. Distributed version control systems (DVCS) are all the rage now, and I can see why, since they most directly represent the open source development methodology. The idea is that there is no central repository (although in practice, there are bound to be those which are considered more authorative), and every developer is capable of synchronising / merging with every other developer in a peer-to-peer, ad-hoc fashion. Tools like Darcs, Git and Mercurial implement this methodology, and in theory, I love the idea. Certainly I can think of several cases where it would have been useful to allow third parties to synchronise their working copies with each other more smoothly & traceably - say, a small ad-hoc subgroup trying out a new technique which may take some time before it’s ready for the central version but people need to collaborate on it, sometimes without the direct involvement of the central team. Also, improved context-aware branching and merging as supported by these tools would definitely be useful to me as in a number of cases, quite large patches can get out of date while lenghty testing goes on, increasing the chances of conflicts when finally applied.

So, in theory I love DVCS - it probably wouldn’t be very natural for traditional corporate teams but for open source, and for some of the distributed commercial teams I’ve worked on, it could be very useful. The problem I see with them right now is that they’re lacking in good toolsets; most are command-line only and some were clearly designed with primarily Linux in mind (e.g. Git, which requires Cygwin to run on Windows). They also tend to be more complex, by nature of the many-to-many synchronisation approaches and high level of flexibility. I think given some more time, they will eventually supercede Subversion in the open source space, and perhaps even some corporate use, but right now I see them very much as I saw Subversion 5 years or more ago - some great ideas but most people will prefer to stick with the mature & well understood solution for now. Once we have a stable and mature TortoiseGit and Eclipse / VS integration, mainstream adoption will follow I’m sure. I would venture that most developers would be loathe to give up their slick Explorer / IDE integration for the kind of flexibility DVCS offers, because in most cases they need the integration more than they need the uber-branching; as useful as that is, it’s probably something that comes up only rarely except in the most popular open source projects.