Tuesday, March 26, 2013

Some versioning theory

As the old saying goes:

In theory, theory and practice should be the same. In practice, however ...

As is no surprise, when I get interested in something, I like to understand both the theory and the practice. By temperament, as it turns out, I'm much more of an engineer than a theoretician, so it typically turns out that I spend much more of my time understanding the practice than the theory, but I do make my efforts to comprehend the theory behind what I'm doing, whether that be concurrent programming, database storage systems, compilers, networking protocols, distributed object systems, etc. (those being some of my major interests over three decades of computing).

Anyway, my current interests are in the area of version management, so I find myself once again looking for information on both the practice and the theory of versioning.

So I've been spending a lot of time recently catching up with a fascinating project being conducted by Microsoft Research: Concurrent Revisions.

The Revisions project introduces a novel programming model for concurrent, parallel, and distributed applications.

At first, it might not be obvious why this is relevant to version management software, but the research team explain:

To simplify this problem, we introduce a novel programming construct called “revisions”, which is inspired by revision control systems commonly used by development teams. Revisions are forked and joined much like asynchronous tasks. However, rather than accessing global shared data directly (and thereby risking data races or atomicity violations), all revisions execute on a (conceptual) copy of the shared state, a "global mutable snapshot" so to speak. Any changes performed in a revision apply to that snapshot only, until the revision is joined at which the changes become globally effective.

This is, actually, a fairly radical idea, as they explain in their foundational 2010 paper presented at the OOPSLA conference: Concurrent Programming with Revisions and Isolation Types

Note that our use of revision diagrams to reason about program executions is a marked departure from traditional concurrency models such as sequentially consistent memory or serializable transactions, which reason about concurrent executions by considering a set of corresponding totally ordered sequential histories. These traditional models make the fundamental assumption that programmers must think sequentially, and that all concurrency must thus be ‘linearized’ by some arbitration mechanism. However, such arbitration invariably introduces nondeterminism, which may easily present a much larger problem for programmers than direct reasoning about concurrent executions.

I love looking at the revision diagrams in their paper, unsurprisingly, I guess, since I think that tools like the Perforce Revision Graph Tool are extremely powerful ways to comprehend complex information in a visual form.

And since I, myself, have found that my career has moved from starting with a deep fascination with and understanding of sequentially consistent memory models and serializable transaction models, on to my current fascination with version management, asynchronous replication, and conflict resolution, it is probably no surprise that I find the Microsoft Revisions project so worthwhile.

However, since the project was started, the team have gone on to flesh out their ideas in considerable detail, and have explored the use of the techniques in other problem domains beyond the original game-programming arena.

For example, last summer they presented work at ECOOP: Cloud Types for Eventual Consistency.

Our system uses revision diagrams to guarantee eventual consistency, as proposed in [ S. Burckhardt, M. Fahndrich, D. Leijen, and M. Sagiv. Eventually Consistent Transactions.]. Conceptually, the cloud stores the main revision, while devices maintain local revisions that are periodically synchronized. Revision diagrams are reminiscent of source control systems and provide an excellent intuition for reasoning about multiple versions and eventual consistency.

Providing adequate programming models for eventually-consistent storage systems is a very important topic nowadays, and the Microsoft Revisions team is doing some fascinating research. Their most recent work in this area is brand new this week: Understanding Eventual Consistency

At the moment there is a lot of confusion about the semantics of eventual consistency, as different systems implement it with different sets of features and in subtly different forms, stated either informally or using disparate and low-level formalisms.

Thank you Microsoft, for sponsoring this research, and for enabling the team to freely share their work with the world. It is much appreciated by people like me!

I'm looking forward to digging through their results in detail, and to continuing to follow their research as it progresses.

1 comment:

  1. Is "concurrent revisions" different from distributed VCS?