Nearly every DBMS implementation has a transaction log, where transactions write information about the changes that they are making to the database.
Transaction logs are about as foundational a technology as exists in Database Systems, and Database Systems are about as old a technology as exists in Computer Science: there are seminal notes on transaction logs dating back, I believe, to the 1950s, and papers published in the mid-1970s are still studied today. So it's a little bit surprising, I think, to see that significant research is still occurring in the field of transaction logging.
Here are a few very interesting examples to back up my claim: two books (!) and a handful of fairly recent papers.
- I Heart Logs: Event Data, Stream Processing, and Data Integration
The author, Jay Kreps, was one of the co-creators of Kafka, a distributed system where the log is truly the "heart" of the engine. See also his essay, The Log: What every software engineer should know about real-time data's unifying abstraction.
- Instant Recovery with Write-Ahead Logging: Page Repair, System Restart, and Media Restore (Synthesis Lectures on Data Management)
Most existing DBMS implementations refuse to allow access to your data during the (typically brief) period when they are recovering from a system crash. This work shows how a carefully-implemented system can get around that limitation. (One of the authors is a colleague of mine.)
- Scalability of write-ahead logging on multicore and multisocket hardware
This paper, often known as the Aether paper, introduces several critical new ideas, including the use of Elimination Trees for managing log buffer memory contention, and examines the challenges that arise when trying to track transaction dependencies in the presence of parallel logging to multiple log stores.
- Calvin: Fast Distributed Transactions for Partitioned Database Systems
The Calvin paper inspired people to start thinking about whether we can avoid Two Phase Commit in distributed transaction systems. For decades, people avoided this topic, but it's an open topic again!
- Taurus: Lightweight Parallel Logging for In-Memory Database Management Systems
The Taurus paper goes into considerable detail about possible ways to implement parallel logging systems. The ideas are broadly similar to those in the Aether paper, but with some different approaches. Other similar recent work includes Plover: parallel logging for replication systems, Eleda: Scalable Database Logging for Multicores, and SiloR: Fast Databases with Fast Durability and Recovery Through Multicore Parallelism.
It's good to see that people are still working away at trying to figure out how to improve these age-old techniques, squeezing just a little bit more out of their computers, making their databases work just that much faster.
Onwards and upwards!