Sunday, November 14, 2010

Nice Netflix paper on High-Availability Storage

Sid Anand, a Netflix engineer who writes an interesting blog, recently published a short, very readable paper entitled Netflix's Transition to High-Availability Storage Systems. If you've been wondering about cloud computing, and about who uses it, and why, and how they build effective systems, you'll find this paper quite helpful.

The paper packs a lot of real-world wisdom into a very short format. I particularly liked this summary of what Anand learned about building highly-available systems while at eBay:

  • Tables were denormalized to a large extent

  • Data sets were stored redundantly, but in different index structures

  • DB transactions were forbidden with few exceptions

  • Data was sharded among physical instances

  • Joins, Group Bys, and Sorts were done in the application layer

  • Triggers and PL/SQL were essentially forbidden

  • Log-shipping-based data replication was done

It's an excellent list. At my day job, we spend a lot of time thinking about how to build highly-available, highly-reliable systems which service many thousands of users concurrently, and you'd recognize most of the above principles in the internals of our server, even though the details differ.

To avoid your data being your bottleneck, you have to build infrastructure which breaks that bottleneck:

  • replicate the data

  • carefully consider how you structure and access that data

  • shift work away from the most-constrained resources

It sounds very simple, but it's oh-so-hard to do it properly. That's what still makes systems like eBay, Netflix, Amazon, Google, Facebook, Twitter, etc. the exception rather than the rule.

It should be no surprise that so many different engineers arrive at broadly similar solutions; using proven basic principles and tried-and-true techniques is the essence of engineering. Although my company is much smaller than the Internet giants, we are concerned with the same problems and we, too, are sweating the details to ensure that we've built an architecture that scales. It's exciting work, and it's fun to see it bearing fruit as our customers roll out massive internal systems successfully.

I enjoyed Anand's short paper, and I've enjoyed reading his blog; I hope he continues to publish more work of this caliber.

Now, back to solving the 7-by-7 KenKen while the bacon cooks... :)

No comments:

Post a Comment