I first came across these ideas while browsing Wes Felter's blog sometime last winter, but the idea itself is apparently quite old. Most of what I've read cites Prokop's 1999 Master's Thesis at MIT as the work that really got this area going, but that work in turn builds on a data structure called the van Emde Boas layout, which dates from 1977 (quite old, in Computer Science years that is). I'm particularly interested in cache-oblivious B-Tree algorithms, but I find all of the cache-oblivious research quite interesting; the B-Tree, of course, is an older-still data structure, dating from Rudolf Bayer and Edward McCreight's work at Boeing in 1970.
The basic notion of cache oblivious data structures is fairly simple: modern computers have multi-level storage hierarchies, in which data which is being manipulated by the CPU moves back and forth between various levels of storage:
- in a register on the CPU
- in specialized cache memory (often called Level-1 or Level-2 cache)
- in the computer's main memory
- on disk
An algorithm is referred to as cache-aware if it contains explicit parameters that can be tuned to optimize its performance for a specific cache size. Most algorithms are cache-aware, which means that, in practice, they either must be tuned on a machine-by-machine basis, or will perform better on some machines than on others, because their default parameters work well for some cache sizes but not others.
This is undesirable; we'd prefer to write algorithms that work equally well across different cache sizes, without requiring explicit configuration and tuning. Such algorithms are called cache-oblivious. The name doesn't mean that they are unaware of the cache's existence; it means that they automatically work well on various different cache sizes without needing to be configured or tuned.
Caches, in general, take advantage of two principles:
- The principle of temporal locality.
- The principle of spatial locality.
The principle of temporal locality is the observation that a program which uses a particular piece of data is likely to use that same data again soon, so a good cache keeps recently used data close at hand. The principle of spatial locality is the observation that a program which uses a particular piece of data is likely to use some nearby data soon. A program stepping through an array is a good example of spatial locality, because an access to array element 1 is often followed by an access to array element 2, and then an access to array element 3, and so forth. So a good caching implementation will try to keep nearby data available in the cache.
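To see why spatial locality matters so much, here's a toy model (my own sketch, not a real cache simulator; the constants and names are assumptions for illustration) that counts how many cache lines a sequence of array accesses touches:

```python
# Toy model of spatial locality: count how many distinct cache lines a
# sequence of array accesses pulls into the cache. Illustration only.
LINE_SIZE = 64   # bytes per cache line (typical on modern CPUs)
ELEM_SIZE = 8    # bytes per array element (e.g. a 64-bit integer)
ELEMS_PER_LINE = LINE_SIZE // ELEM_SIZE  # 8 elements fit in one line

def lines_touched(indices):
    """Number of distinct cache lines loaded by these element accesses."""
    return len({i // ELEMS_PER_LINE for i in indices})

n = 1024
# Stepping through the array in order: once a line is loaded, it serves
# the next several accesses for free.
print(lines_touched(range(n)))              # 128 lines for 1024 accesses
# Jumping by a stride wider than a line: every access loads a new line.
print(lines_touched(range(0, n * 16, 16)))  # 1024 lines for 1024 accesses
```

Both loops perform the same number of accesses, but the sequential scan touches eight times fewer lines, which is exactly the behavior a cache rewards.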
The standard Bayer-McCreight B-Tree exhibits both temporal locality and spatial locality. However, the standard implementation's block size B is generally chosen with disk transfer sizes and tree heights in mind, and does not generally work very well for the in-memory cache hierarchy. This is where cache-oblivious B-Tree implementations and the van Emde Boas layout come into play. The van Emde Boas layout arranges the index nodes in memory in such a way that searching them has memory behavior similar to scanning a single large array, which means that the in-memory behavior of the cache-oblivious B-Tree has excellent memory cache performance while still preserving the B-Tree's excellent disk access performance.
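To make the layout concrete, here is a small sketch (in Python, with function and variable names of my own invention) of the recursive rule behind the van Emde Boas layout: split the tree at half its height, store the top half first, then store each bottom subtree contiguously, applying the same rule recursively. Nodes are identified by the usual heap-style BFS indices (root is 1, children of node i are 2i and 2i+1).

```python
def veb_order(height, root=1):
    """List the nodes of a complete binary tree (BFS indices, root=1)
    in van Emde Boas order. 'height' counts levels, so height=1 is a
    single leaf node."""
    if height == 1:
        return [root]
    top = height // 2        # levels in the top subtree
    bottom = height - top    # levels in each bottom subtree
    order = veb_order(top, root)
    # The bottom subtrees hang off the last level of the top subtree:
    # their roots are the 2**top nodes 'top' levels below 'root', and
    # each one is stored contiguously, right after the top subtree.
    for j in range(2 ** top):
        order += veb_order(bottom, root * 2 ** top + j)
    return order

# A 4-level tree (15 nodes): the top two levels {1, 2, 3} come first,
# then each 2-level bottom subtree {4,8,9}, {5,10,11}, ... in turn.
print(veb_order(4))
# [1, 2, 3, 4, 8, 9, 5, 10, 11, 6, 12, 13, 7, 14, 15]
```

Because the recursion keeps every subtree in one contiguous run of memory, a root-to-leaf search touches only a few contiguous regions at every scale at once, for any cache line or block size, which is what makes the layout cache-oblivious.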
Apparently there is a new database startup company named Tokutek, which is trying to use these new types of B-Tree indexes in a complete database engine. The founders of Tokutek include a number of the researchers who developed cache-oblivious search tree algorithms, and they have some interesting postings on their blog discussing their ideas in more depth.
It also seems that other groups are working on similar techniques.
I'm still trying to wrap my head around the way that the van Emde Boas layout improves the in-memory spatial locality of the B-Tree search nodes, but it's been quite interesting to read the work so far, and I'm looking forward to learning more about these algorithms and techniques.