I've recently been reading some interesting academic papers on the different workflows and behaviors that arise in distributed (DVCS) versus centralized (CVCS) version control systems, and thought I'd share some links.
I'm not tremendously impressed with the sophistication of academic research into version control, but it does seem to be slowly improving, and these recent papers make some interesting observations.
The best of the bunch, I think, are the papers from Christian Bird of Microsoft. That is perhaps no surprise: the industrial side of Microsoft has been doing some of the best commercial work in version control recently, and Microsoft certainly has experience dealing with the issues that matter to software developers.
- Work Practices and Challenges in Pull-Based Development: The Integrator’s Perspective
In the pull-based development model, the integrator has the crucial role of managing and integrating contributions. This work focuses on the role of the integrator and investigates working habits and challenges alike. We set up an exploratory qualitative study involving a large-scale survey involving 749 integrators, to which we add quantitative data from the integrator’s project. Our results provide insights into the factors they consider in their decision making process to accept or reject a contribution.
- Will My Patch Make It? And How Fast? : Case Study on the Linux Kernel
The Linux kernel follows an extremely distributed reviewing and integration process supported by 130 developer mailing lists and a hierarchy of dozens of Git repositories for version control. Since not every patch can make it and of those that do, some patches require a lot more reviewing and integration effort than others, developers, reviewers and integrators need support for estimating which patches are worthwhile to spend effort on and which ones do not stand a chance.
- Social Coding in GitHub: Transparency and Collaboration in an Open Software Repository
Based on a series of in-depth interviews with central and peripheral GitHub users, we examined the value of transparency for large-scale distributed collaborations and communities of practice. We find that people make a surprisingly rich set of social inferences from the networked activity information in GitHub, such as inferring someone else’s technical goals and vision when they edit code, or guessing which of several similar projects has the best chance of thriving in the long term.
- How Do Centralized and Distributed Version Control Systems Impact Software Changes?
In this paper we present the first in-depth, large scale empirical study that looks at the influence of DVCS on the practice of splitting, grouping, and committing changes. We recruited 820 participants for a survey that sheds light into the practice of using DVCS.
- Cohesive and Isolated Development with Branches
The adoption of distributed version control (DVC), such as Git and Mercurial, in open-source software (OSS) projects has been explosive. Why is this and how are projects using DVC? This new generation of version control supports two important new features: distributed repositories and histories that preserve branches and merges. Through interviews with lead developers in OSS projects and a quantitative analysis of mined data from the histories of sixty projects, we find that the vast majority of the projects now using DVC continue to use a centralized model of code sharing, while using branching much more extensively than before their transition to DVC.
- Expectations, Outcomes, and Challenges of Modern Code Review
We empirically explore the motivations, challenges, and outcomes of tool-based code reviews. We observed, interviewed, and surveyed developers and managers and manually classified hundreds of review comments across diverse teams at Microsoft. Our study reveals that while finding defects remains the main motivation for review, reviews are less about defects than expected and instead provide additional benefits such as knowledge transfer, increased team awareness, and creation of alternative solutions to problems.
- Collaboration in Software Engineering: A Roadmap
Software engineering projects are inherently cooperative, requiring many software engineers to coordinate their efforts to produce a large software system. Integral to this effort is developing shared understanding surrounding multiple artifacts, each artifact embodying its own model, over the entire development process.
- Is It Dangerous to Use Version Control Histories to Study Source Code Evolution?
This allows us to answer: How much code evolution data is not stored in VCS? How much do developers intersperse refactorings and edits in the same commit? How frequently do developers fix failing tests by changing the test itself? How many changes are committed to VCS without being tested? What is the temporal and spacial locality of changes?
- The Secret Life of Patches: A Firefox Case Study
In this paper, we study the patch lifecycle of the Mozilla Firefox project. The model of a patch lifecycle was extracted from both the qualitative evidence of the individual processes (interviews and discussions with developers), and the quantitative assessment of the Mozilla process and practice. We contrast the lifecycle of a patch in pre- and post-rapid release development.
- Towards a Taxonomy of Software Change
Previous taxonomies of software change have focused on the purpose of the change (i.e. the why) rather than the underlying mechanisms. This paper proposes a taxonomy of software change based on characterizing the mechanisms of change and the factors that influence these mechanisms.