Recently, I started at Curt Monash's DBMS2 blog, where I found an introductory essay about Cassandra.
That led me to Todd Hoff's essay on Cassandra versus sharded MySQL.
That led me to Evan Weaver's overview article about Cassandra and how it fits in at Twitter, complete with a great hand-drawn image.
That led me to the Cassandra wiki at the Apache Software Foundation.
That led me to the video lecture by Eric Evans of Rackspace, and to the paper by Lakshman and Malik about Cassandra at Facebook, as well as to Lakshman's earlier work as part of the Dynamo project at Amazon.
I'm still wrapping my head around these eventually-consistent, non-relational data stores; after all, I'm a relational DBMS guy from a long time back. The Cassandra papers are quite approachable, and give a lot of fascinating insight into the behavior of these systems:
we will focus on the core distributed systems techniques used in Cassandra: partitioning, replication, membership, failure handling, and scaling. All these modules work in synchrony to handle read/write requests. Typically a read/write request for a key gets routed to any node in the Cassandra cluster. The node then determines the replicas for this particular key. For writes, the system routes the requests to the replicas and waits for a quorum of replicas to acknowledge the completion of the writes. For reads, based on the consistency guarantees required by the client, the system either routes the requests to the closest replica or routes the requests to all replicas and waits for a quorum of responses.
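The request flow in that passage can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions, not Cassandra's actual code or API: the `ToyCluster` class, its method names, the single-token hash ring, and the "first live replica counts as closest" rule are all hypothetical simplifications made up for this example.

```python
import hashlib
from bisect import bisect_right


class ToyCluster:
    """Hypothetical in-memory stand-in for a cluster; not Cassandra's API."""

    def __init__(self, node_names, replication_factor=3):
        if replication_factor > len(node_names):
            raise ValueError("replication factor exceeds node count")
        self.rf = replication_factor
        # Assumption: each node owns exactly one position on a hash ring.
        self.ring = sorted((self._hash(n), n) for n in node_names)
        # node -> {key: (timestamp, value)}
        self.stores = {n: {} for n in node_names}
        self.down = set()  # nodes currently considered failed

    @staticmethod
    def _hash(text):
        return int(hashlib.md5(text.encode()).hexdigest(), 16)

    def replicas_for(self, key):
        # Walk clockwise from the key's ring position; take the next rf nodes.
        positions = [pos for pos, _ in self.ring]
        start = bisect_right(positions, self._hash(key)) % len(self.ring)
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(self.rf)]

    def quorum(self):
        return self.rf // 2 + 1

    def write(self, key, value, timestamp):
        # Send the write to every replica and count acknowledgements.
        acks = 0
        for node in self.replicas_for(key):
            if node not in self.down:
                self.stores[node][key] = (timestamp, value)
                acks += 1
        if acks < self.quorum():
            raise RuntimeError("write failed: quorum not reached")
        return acks

    def read(self, key, quorum_read=True):
        live = [n for n in self.replicas_for(key) if n not in self.down]
        if not quorum_read:
            # Weak read: ask a single replica (here, simply the first live one).
            if not live:
                raise RuntimeError("read failed: no live replica")
            entry = self.stores[live[0]].get(key)
            return entry[1] if entry else None
        if len(live) < self.quorum():
            raise RuntimeError("read failed: quorum not reached")
        # Quorum read: collect responses and keep the newest timestamp.
        responses = [self.stores[n][key] for n in live if key in self.stores[n]]
        if not responses:
            return None
        return max(responses, key=lambda r: r[0])[1]
```

If one replica is down during a write and later comes back, a single-replica read that happens to land on it can return the stale value, while a quorum read still returns the newest one. That gap is the "eventually consistent" behavior in miniature. A real system would also repair the stale replica, which this sketch leaves out.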
On we go; the learning never stops!