Tuesday, April 20, 2010

MapReduce for single-system programming

If you're familiar with MapReduce, it's probably in the context of giant data centers like Google's, where enormous tasks are processed by decomposing them into smaller jobs which are then distributed over thousands of individual servers, with the results re-assembled at the end.

Well, here's a nice body of work by a team at Stanford, showing how they are using the basic concepts of MapReduce (functional programming, problem decomposition, parallelism, resource management) to accomplish parallel programming tasks on single-system configurations.

Google's MapReduce implementation facilitates processing of terabytes on clusters with thousands of nodes. The Phoenix implementation is based on the same principles but targets shared-memory systems such as multi-core chips and symmetric multiprocessors.

Most parallel and concurrent programming APIs are too hard: too complex to understand, and too easy to use incorrectly. MapReduce has been successful over the last several years because it nicely balances the power of parallelism with a clear and simple programming abstraction.
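
To give a feel for how small that abstraction is on a single machine, here is a minimal word-count sketch in Python. It does not use Phoenix's actual API; the names `map_chunk`, `merge_counts`, and `word_count` are made up for illustration, and the thread pool simply stands in for a shared-memory runtime. Python threads share memory but are limited by the interpreter lock for CPU-bound work, so treat this as a picture of the structure (split, map in parallel, reduce) rather than a performance demonstration.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_chunk(lines):
    """Map step (hypothetical helper): count the words in one chunk of input."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def merge_counts(left, right):
    """Reduce step (hypothetical helper): fold one partial count into another."""
    left.update(right)
    return left


def word_count(lines, workers=4):
    """Split the input, map the chunks on a thread pool, then reduce the results."""
    size = max(1, -(-len(lines) // workers))  # ceiling division, at least 1
    chunks = [lines[i:i + size] for i in range(0, len(lines), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    return reduce(merge_counts, partials, Counter())
```

Calling `word_count(["the quick brown fox", "the lazy dog"], workers=2)` returns a `Counter` in which `"the"` maps to 2. The caller only writes the map and reduce functions; the splitting and scheduling live in one place, which is the part a runtime like Phoenix takes over.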
