Sunday, September 23, 2012

Jim Gray's mantle

It's now been nearly 6 years since Jim Gray was lost at sea. Gray was the pre-eminent practicing computer scientist of my lifetime. I think what made him special was his ability to blend the theoretical and the practical. Like any great theorist, he was thinking years ahead of the rest of us; but as a great practitioner, he was interested in building production systems that worked, now. In his nearly four decades at IBM, Tandem, DEC, and Microsoft, he built teams and organizations that produced industrial-strength software for round-the-clock production use. He literally "wrote the book" on system software: Transaction Processing: Concepts and Techniques is the best book ever written on how database software actually works.

I was thinking about Jim Gray recently.

Like probably every other serious systems software practitioner on the planet, I've been feverishly digging my way through the latest work from Jeff Dean and Sanjay Ghemawat: Spanner: Google's Globally-Distributed Database.

The Spanner paper is rich and fascinating; every page is full of intriguing information. There are probably half a dozen massive breakthroughs reported here, any one of which would have warranted a full paper of its own:

  • Snapshot isolation via globally-meaningful commit timestamps
  • The TrueTime API for bounded clock uncertainty
  • Blending GPS and atomic clock time sources
  • Globally concurrent atomic schema updates
  • Two-phase commit over Paxos
  • SQL-like query language extensions for the Spanner data model
  • INTERLEAVE IN DDL for locality-aware sharding definitions
The list goes on and on; it's no exaggeration to suggest that I'll be reading and re-reading this paper all fall.
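To make the TrueTime and commit-timestamp items a little more concrete, here is a toy Python sketch. Only the operation names now(), after(), and before() mirror the TrueTime API the paper describes; the classes, the commit_wait helper, and the fixed uncertainty bound epsilon are hypothetical simplifications (in Spanner the bound comes from the GPS and atomic-clock time masters and varies over time). The idea it illustrates: a write takes its commit timestamp from the top of the uncertainty interval, then waits until even the bottom of the interval has moved past that timestamp before the commit becomes visible.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class TTInterval:
    """An interval assumed to contain the current absolute time."""
    earliest: float
    latest: float


class SimulatedTrueTime:
    """Toy stand-in for TrueTime's TT.now(), TT.after(), TT.before().

    ASSUMPTION: the local clock is never more than `epsilon` seconds away
    from absolute time. Real TrueTime derives its bound from GPS and
    atomic-clock time masters; here epsilon is just a fixed parameter.
    """

    def __init__(self, epsilon: float, clock: Callable[[], float] = time.time):
        self.epsilon = epsilon
        self.clock = clock

    def now(self) -> TTInterval:
        t = self.clock()
        return TTInterval(earliest=t - self.epsilon, latest=t + self.epsilon)

    def after(self, t: float) -> bool:
        # True only if t has definitely passed.
        return self.now().earliest > t

    def before(self, t: float) -> bool:
        # True only if t has definitely not yet arrived.
        return self.now().latest < t


def commit_wait(tt: SimulatedTrueTime,
                sleep: Callable[[float], None] = time.sleep) -> float:
    """Hypothetical helper: choose a commit timestamp, then wait it out.

    s comes from the top of the uncertainty interval, so it is no earlier
    than absolute time when chosen. The loop blocks until TT.after(s)
    holds, which takes roughly 2 * epsilon; only then may the commit
    become visible to readers.
    """
    s = tt.now().latest
    while not tt.after(s):
        sleep(tt.epsilon / 10)
    return s


if __name__ == "__main__":
    tt = SimulatedTrueTime(epsilon=0.004)  # 4 ms, an illustrative value
    start = time.time()
    s = commit_wait(tt)
    print(f"commit timestamp {s:.6f}, waited {time.time() - start:.4f} s")
```

The clock and sleep functions are injectable only so the wait can be exercised deterministically. The sketch shows the trade-off in miniature: a few milliseconds of waiting per write in exchange for commit timestamps that mean the same thing on every server.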

But what prompted this post, and the reason why I started by talking about Jim Gray, is less about the specifics of Spanner, and more about Dean and Ghemawat and, more broadly, about the way that research and development in systems software is happening nowadays.

Along those lines, let me point you at two interesting recent essays:

Both authors present their views about the various "types" of research communities, and about how those communities tend to approach their work.

There are differences, to be sure. But my feeling is that, within both the operating system and database communities, there are those who focus on theory, those who focus on practice, and those who are able to blend the two approaches. As Professor Regehr notes:

  1. The best argument is a working system. The more code, and the more results, the better. Something that is clearly a toy isn’t convincing. It is not necessary to build an abstract model, conduct a user study, prove soundness, prove correctness, or show any kind of asymptotic bound. In fact, if you want to do these things it may be better to do them elsewhere.
  2. The style of exposition is simple and direct; this follows from the previous point. I have seen cases where a paper from the OS community and a paper from the programming languages community are describing almost exactly the same thing (probably a static analyzer) but the former paper is super clear whereas the latter is incredibly difficult to figure out. To some extent I’m just revealing my own biases, but I also believe the direct approach to exposition is objectively better; I’ll try to return to this subject in a later post.
  3. The key to a strong research result is finding the right abstraction. A good abstraction is beautiful; it imposes little performance penalty; it leads to reliable systems; it leaks the right information and blocks things you didn’t want to know. It just feels right. The abstraction is probably for something low-level, but this doesn’t need to be the case. Finding good abstractions may sound easy but it’s super hard, often requiring lots of code to be thrown away multiple times.
But I think this is true in all areas of systems software. I acknowledge and concur with the (historical) distinctions Professor Brewer draws in "OS and DBMS: Philosophical Similarities & Differences", but I also agree with Brewer that a modern approach to work in systems software has to:
  work from both of these directions, cull the lessons from each, and ask how to use these lessons today both within and OUTSIDE the context of these historically separate systems.

Which brings us back to Dean and Ghemawat, and to what they've done in the last fifteen years:

  • The Google File System
  • BigTable
  • MapReduce
  • Protocol Buffers
  • Continuous Profiling
  • The Swift Java Compiler
  • and now, Spanner
What are these? Well, they are
  • working systems
  • presented in a style that is simple and direct
  • concentrating on finding the right abstractions

Now, they've still got a ways to go to match what Jim Gray achieved, but I think you can make a reasonable argument that the most exciting, world-changing, intellectually sophisticated yet pragmatically realistic work in the systems software field is happening in Google's research teams. It's breathtaking to read through the "Future Work" section of the Spanner paper and consider what they hope to work on next.

Talk to you later; it's time to go re-read the Spanner paper and chase some more references...
