Megastore blends the scalability of a NoSQL datastore with the convenience of a traditional RDBMS in a novel way, and provides both strong consistency guarantees and high availability. We provide fully serializable ACID semantics within fine-grained partitions of data. This partitioning allows us to synchronously replicate each write across a wide area network with reasonable latency and support seamless failover between datacenters.
The paper is excellent: clearly written, thorough, and relevant. It covers the topic from the high-level requirements, through architecture and design, down to the important aspects of the implementation. You should read the entire paper (I'm just starting on my second pass through it). Of course, if you're not already comfortable with BigTable, Chubby, Paxos, etc., you're going to spend a lot of time chasing references, and will probably need to come back to this paper later. But hopefully you've been Keeping Up With The Times, and so this caution isn't necessary...
Anyway, although I don't have a lot of insight regarding the basic content of the paper (except to say: "Thanks, Google, for once again sharing the details of your fascinating work!"), I wanted to share one part that I found particularly interesting, from a section near the end of the paper:
Development of the system was aided by a strong emphasis on testability. The code is instrumented with numerous (but cheap) assertions and logging, and has thorough unit test coverage. But the most effective bug-finding tool was our network simulator: the pseudo-random test framework. It is capable of exploring the space of all possible orderings and delays of communications between simulated nodes or threads, and deterministically reproducing the same behavior given the same seed. Bugs were exposed by finding a problematic sequence of events triggering an assertion failure (or incorrect result), often with enough log and trace information to diagnose the problem, which was then added to the suite of unit tests. While an exhaustive search of the scheduling state space is impossible, the pseudo-random simulation explores more than is practical by other means. Through running thousands of simulated hours of operation each night, the tests have found many surprising problems.
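The key idea in that passage is that all nondeterminism (message ordering, delays) is funneled through a single seeded pseudo-random number generator, so any failing run can be replayed exactly from its seed. The paper doesn't show code, so here's my own minimal sketch of the idea: a hypothetical `simulate` function where a seeded scheduler picks which in-flight message gets delivered next, with a cheap invariant assertion in the style the authors describe.

```python
import random

def simulate(seed):
    """Deterministically simulate message delivery between two nodes.

    Illustrative sketch only (not Megastore's framework): the scheduler's
    sole source of nondeterminism is the seeded RNG, so the same seed
    always replays the exact same interleaving of events.
    """
    rng = random.Random(seed)
    # In-flight messages: (destination node, payload).
    in_flight = [(dst, f"msg-{i}") for i, dst in enumerate([0, 1, 0, 1])]
    delivered = {0: [], 1: []}
    trace = []
    while in_flight:
        # Pseudo-randomly choose the next delivery from the seeded RNG.
        idx = rng.randrange(len(in_flight))
        dst, payload = in_flight.pop(idx)
        delivered[dst].append(payload)
        trace.append((dst, payload))
        # Cheap assertion, per the paper's style: no message lost or duplicated.
        assert len(delivered[0]) + len(delivered[1]) + len(in_flight) == 4
    return trace

# Same seed -> identical event ordering, so a failure found overnight
# can be reproduced (and later pinned down as a unit test) from its seed.
assert simulate(42) == simulate(42)
```

A real framework would of course simulate clocks, delays, drops, and thread schedules the same way, but the reproducibility trick is the same: one RNG, one seed, one replayable trace.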
What I particularly enjoy about this passage is the way it delivers that hard-won, worth-reflecting-on knowledge that building reliable systems of significant complexity requires not just a single approach, but a collection of techniques. Similar to the way that security experts will often argue for "defense in depth", observe the overall plan of attack used by the Google team:
- unit tests
- pseudo-random test generators
- event-sequence tracking
- for each found problem, adding the case back to the suite of unit tests
It wasn't enough to have one technique, or a simple approach; all of these tools must be taken out of the toolbox and used, routinely, throughout the lifetime of the software.
This is how real systems are built; this is how enduring software gets made. As they say, it's all in the details, and certainly this part of the paper says nothing ground-breaking or startling.
But it was still my favorite part of the paper!