Over the long weekend, a number of people seem to have picked up and commented on Mikeal Rogers's essay about Apache and its adoption of the source code control tool, Git. For example, Chris Aniszczyk pointed to the essay, and followed it up with some statistics and elaboration. Aniszczyk, in turn, points to a third essay (a year old), by Josh Berkus, describing the PostgreSQL community's migration to git, and a fourth web page describing the Eclipse community's migration to git. (Note: both Eclipse and PostgreSQL migrated from CVS to git.)
I find the essays by Rogers and Aniszczyk quite puzzling, full of much heat and emotion, and I'm not sure what to take from them.
Rogers seems to start out on a solid footing:
For a moment, let's put the git part of GitHub on the back burner and talk about the hub.
On GitHub the language is not code, as it is often characterized, it is contribution. GitHub presents a person to person communication system for contributions. Documentation, issues, and of course code, travel between personal repositories.
The communication medium is the contribution itself. Its value, its merit, its intention, all laid naked for the world to see. There is no hierarchy or politic embedded in the system. The creator of a project has a clear first mover advantage but the possibility is always there for its position to be supplanted by a fork, creating a social imperative to manage contributions in a satisfactory manor [sic] to her community.
This is all well-written and clear, I think. But I don't understand how this is a critique of Apache. In my seven years of experience with the Derby project at Apache, this is exactly how an Apache software project works:
- Issues are raised in the Apache issue-tracking system;
- discussion is held in the issue comments and on mailing lists;
- various contributors suggest ideas;
- someone "with an itch to scratch" dives into the problem and constructs a patch;
- the patch is proposed by attaching it to the issue-tracking system;
- further discussion and testing occur, now shaped by the concrete nature of the proposed patch;
- a committer who becomes persuaded of the desirability of the patch commits it to the repository;
- eventually a release occurs and the change becomes widely distributed.
This is the process as I have seen it and participated in it since 2004, and I believe it is how things were done for years before that.
So what, precisely, is it that Apache is failing at?
Here is where Rogers's essay seems to head into the wilderness, starting with this pronouncement:
Many of the social principles I described above are higher order manifestations of the design principles of git itself.
[ ... ]
The problem here is less about git and more about the chasm between Apache and the new culture of open source. There is a growing community of young new open source developers that Apache continues to distance itself from and as the ASF plants itself firmly in this position the growing community drifts farther away.
I don't understand this at all. What, precisely, is it that Apache is doing to distance itself from these developers, and what does this have to do with git?
Rogers offers as evidence this email thread (use the "next message by thread" links to read the thread), but from what I can tell, it seems like a very friendly, open, and productive discussion about the mechanics of using git to manage projects at Apache, with several commenters welcoming newcomers into the community and encouraging them to get involved.
This seems like the Apache way working successfully, from what I can tell.
Aniszczyk's follow-on essay, unfortunately, doesn't shed much additional light. He states that "what has been happening recently regarding the move to a distributed version control system is either pure politicking [sic] or negligence in my opinion."
So, again, what is it that he is specifically concerned about? Here, again, the essay appears to head into the wilderness. "Let's try to have some fun with statistics," says Aniszczyk, and he presents a series of charts and graphs showing that:
- git is very popular
- lots of job sites, such as LinkedIn, are advertising for developers who know git
- There is no 3.
At this point, Aniszczyk says "I knew it was time to stop digging for statistics."
But again, I am confused about what he finds upsetting. The core message of his essay appears to be:
The first is simple and deals with my day job of facilitating open source efforts at Twitter. If you’re going to open source a new project, the fact that you simply have to use SVN at Apache is a huge detterent [sic] from even going that route.
[ ... ]
All I’m saying is that it took a lot of work to start the transition and the eclipse community hasn’t even fully completed it yet. Just ask the PostgreSQL community how quick it was moving to Git. The key point here is that you have to start the transition soon as it’s going to take awhile for you to implement the move (especially since Apache hosts a lot of projects).
Once again, I'm lost. Why, exactly, is it a huge deterrent to use svn? And why, exactly, does Apache need to convert its existing projects from svn to git? Just because LinkedIn is advertising more jobs that use git as a keyword? That doesn't seem like a valid reason, to me.
Note that, as I mentioned at the start of this article, the PostgreSQL team migrated from CVS to git, not from Subversion to git. I can completely understand this. The last time I used CVS was in 2001, 10 full years ago; even at that time, CVS had some severe technical shortcomings, and there was sufficient benefit to switching that it was worth the effort. So I'm not at all surprised by the PostgreSQL community's decision. The article by Berkus, by the way, is definitely worth reading, full of wisdom about platform coverage, tool and infrastructure support, workflow design, etc.
So, to summarize (as I understand it):
- PostgreSQL and Eclipse are migrating from CVS to git, successfully (although it is taking a significant amount of time and resources)
- Apache is working to integrate git into its policies and infrastructure, but still uses Subversion as its primary SCM system
- Some people seem to feel like Apache is making the wrong decision about this
But what I don't understand, at the end of it all, is in what way this is opposed to "the Apache way"? From everything I can see, the Apache way is alive and well in these discussions.
UPDATE: Thomas Koch, in the comments, provides a number of substantial, concrete examples in which git's powerful functionality can be very helpful. The most important one that Thomas provides, I think, is this:
It is much easier to make a proper integration between review systems, Jenkins and Jira, if the patch remains in the VCS as a branch instead of leaving it.
I completely agree. Working with patch files in isolation is substantially worse than referring to a branched change that is under SCM control. Certainly in my work with Derby I have seen many a contributor make minor technical errors while manipulating a patch file, which on the whole just adds friction to the overall process. Good point, Thomas!