I was reading the intriguing paper by Krohn, Kohler, and Kaashoek: Events Can Make Sense, where they say that they are working on:
A high-level, type-safe API for event-based programming that frees it from the stack-ripping problem but is still backwards compatible with legacy event code.
The Tame team explain the stack-ripping problem as follows:
But a key advantage of events -- a single stack -- is also a liability. Sharing one stack for multiple tasks requires stack ripping, which plagues the development, maintenance, debugging and profiling of event code. The programmer must manually split (or "rip") each function that might block (due to network communication or disk I/O), as well as all of its ancestors in the call stack.
The terminology was apparently coined by the members of the Microsoft Research Farsite team back around the turn of the century (I love using phrases like that :) ); they describe the problem in an excellent paper with a rather long-winded title: Cooperative Task Management without Manual Stack Management or, Event-driven Programming is Not the Opposite of Threaded Programming.
Stack ripping is introduced in section 3.1 of the Microsoft Research paper, under the sub-title Automatic versus manual:
Programmers can express a task employing either automatic stack management or manual stack management. With automatic stack management, the programmer expresses each complete task as a single procedure in the source language. Such a procedure may call functions that block on I/O operations such as disk or remote requests. While the task is waiting on a blocking operation, its current state is kept in data stored on the procedure's program stack.
In contrast, manual stack management requires a programmer to rip the code for any given task into event handlers that run to completion without blocking. Event handlers are procedures that can be invoked by an event-handling scheduler in response to events, such as the initiation of a task or the response from a previously-requested I/O.
The data structure that is used to track the in-progress task, including the state of the task, and a reference to the callback procedure that will be invoked when the blocked operation completes, is generally called a "continuation". The term "continuation" is nothing new; credit for this term is generally given to the Scheme programming language, for example see Daniel Friedman's work nearly 35 years ago, probably starting with this paper (which I haven't read).
Although I've never programmed in Scheme, I'm very familiar with the basic events-versus-threads tradeoffs, and with the complexities of trying to write resource-efficient task processing code in a multi-threaded server. And when the Microsoft Research team describe the core problem, I absolutely feel their pain:
Software evolution substantially magnifies the problem of function ripping: when a function evolves from being compute-only to potentially yielding, all functions, along every path from the function whose concurrency semantics have changed to the root of the call graph may potentially have to be ripped in two.
The observation that this problem has been around for pretty much as long as computer programming has existed means that we're likely to continue living with it for some time to come. It's great to see the programming language research community continue to discuss and debate it; it's a hard problem and worth working on. One day, perhaps, some elegant programming language of the future will make life easier for systems programming grunts like me. For now, I much enjoyed reading the Tame paper and the Farsite paper; each time I follow through the details of a clearly-written description such as these, it helps me understand and reason about the programming patterns in my own code, and gives me better terminology and concepts to use when I'm trying to describe my work to others.
Post a Comment