Tuesday, August 7, 2012

MVCC theory

If you're one of those people who is fascinated by the internals of concurrency control algorithms, here's some fun new references that you might enjoy.

  • I recently came across a great new paper from a team lead by Per-Ake Larson of Microsoft: High-Performance Concurrency Control Mechanisms for Main-Memory Databases

    These guys are really pushing the edge, trying for both the highest-concurrency AND the highest raw performance, because they're looking at fully memory-resident databases. The paper is quite clear and easy to read, and I really enjoyed the way that they dug into some of the intricacies and edge cases. Plus, it has a great set of references to chase.

    Decades of research has shown that multiversion concurrency control (MVCC) methods are more robust and perform well for a broad range of workloads. This led us to investigate how to construct MVCC mechanisms optimized for main memory settings. We designed two MVCC mechanisms: the first is optimistic and relies on validation, while the second one is pessimistic and relies on locking. The two schemes are mutually compatible in the sense that optimistic and pessimistic transactions can be mixed and access the same database concurrently. We systematically explored and evaluated these methods, providing an extensive experimental evaluation of the pros and cons of each approach.
  • Following some of the references from the above paper, I came across an older paper by Alexander Thomasian; it came out about a decade ago, but it was new to me: Concurrency Control: Methods, Performance, and Analysis.. The paper is behind the ACM paywall, but you might be able to find it online elsewhere.

    This paper spends a lot of time trying to derive analytical formulae for evaluating the performance impacts of concurrency control, which I think is rather a waste of time, in general, since the assumptions turn out to be rather ridiculous. I also hate the way the paper uses its own jargon, like saying "txn" instead of "transaction".

    But there are some nice sections, for example Section 2.6 which surveys a collection of "Methods to Improve Locking Performance".

    The concurrent processing of long read-only queries for decision support applications and short update txns introduces conflicts between the two workloads, especially when the former follows a strict 2PL paradigm.
  • A much more interesting paper is Michael Cahill's PhD thesis from 2009: Serializable Isolation for Snapshot Databases

    The point of his thesis is this: Snapshot Isolation, by itself, does not provide Serializable transactions. However, it is possible to augment Snapshot Isolation with additional locks, in a manner which provides serializability.

    Despite the attractive properties of SI, it has been known since SI was formalized in (Berenson et al., 1995) that SI permits non-serializable executions. In particular, it is possible for an SI-based concurrency control to interleave some transactions, where each transaction preserves an integrity constraint when run alone, but where the final state after the interleaved execution does not satisfy the constraint. This occurs when concurrent transactions modify different items that are related by a constraint, and it is called the “Write Skew” anomaly.


    In this thesis, we instead focus [on] changing the DBMS internals to guarantee that every execution is serializable regardless of the semantics of the transactions, while still having the attractive properties of SI, in particular much better performance than is allowed by strict two-phase locking.

    Note that, while working for SleepyCat/Oracle about 8 years ago, Cahill wrote a Snapshot Isolation implementation for BerkeleyDB, which I believe is the implementation that is now shipped as the "transactional" BDB edition, so he certainly knows what he's talking about here. Cahill's thesis ended up winning an award for the Australian dissertation of the year. Cahill also discusses an alternate implementation, for the InnoDB engine (again, an Oracle product) of MySQL.

Pretty geeky stuff, I know; my wife will certainly give me a hard time for this post. But if this is your cup of tea, drink deeply!


  1. Hi Bryan,

    you might be interested to know that there is an Apache lab that tries to implement a MVCC BTree :

    It's quite basic atm (the project is quite young, one month), but still, it's already accepting addition and removals of elements in memory, with some good performances.

    There is a lot to do obviously...

  2. BerkeleyDB transactions are fully serializable, by default. You can configure them for snapshot isolation, but it comes with a large number of caveats.