Journal of a Programmer: The Architecture of Open Source Applications

Over the summer, I read The Architecture of Open Source Applications, a rather unusual book.

The term "architecture", when it comes to software engineering, is a somewhat soft and fuzzy concept; the editors of AOSA pin it down by describing what each chapter covers:

Each chapter describes the architecture of an open source application: how it is structured, how its parts interact, why it's built that way, and what lessons have been learned that can be applied to other big design problems.

Sometimes I get very frustrated when the term "architecture" is used, because it often feels like "title inflation": software engineers who want a bit of an ego boost describe themselves as "architects". Joel Spolsky made the problem vivid in his wonderful essay Don't Let Architecture Astronauts Scare You, one of the greatest eviscerations ever committed to the World Wide Web:

When great thinkers think about problems, they start to see patterns. They look at the problem of people sending each other word-processor files, and then they look at the problem of people sending each other spreadsheets, and they realize that there's a general pattern: sending files. That's one level of abstraction already. Then they go up one more level: people send files, but web browsers also "send" requests for web pages. And when you think about it, calling a method on an object is like sending a message to an object! It's the same thing again! Those are all sending operations, so our clever thinker invents a new, higher, broader abstraction called messaging, but now it's getting really vague and nobody really knows what they're talking about any more.
When you go too far up, abstraction-wise, you run out of oxygen. Sometimes smart thinkers just don't know when to stop, and they create these absurd, all-encompassing, high-level pictures of the universe that are all good and fine, but don't actually mean anything at all.

So it was with a fair amount of trepidation that I wandered over to Lulu late last spring and plunked down some money for my own personal copy of The Architecture of Open Source Applications: was I going to find insight? Or Architecture Astronauts?

I am pleased to say that this book is, for the most part, happily free of vague descriptions and hand-waving, and enjoyably packed with concrete thought and hard-earned wisdom.

AOSA is a compilation of 25 essays, by 25 different authors. Each author writes about a particular Open Source application, one which they know intimately and thoroughly. The authors, for the most part, are the original creators or the primary current maintainers of the applications in question. Better, the applications are chosen wisely and represent some of the best-written, best-proven, most widely used software on the planet. Let's look at the applications they picked:

Asterisk
Audacity
The Bourne-Again Shell
Berkeley DB
CMake
Continuous Integration
Eclipse
Graphite
The Hadoop Distributed File System
Jitsi
LLVM
Mercurial
The NoSQL Ecosystem
Python Packaging
Riak and Erlang/OTP
Selenium WebDriver
Sendmail
SnowFlock
SocialCalc
Telepathy
Thousand Parsec
Violet
VisTrails
VTK
Battle for Wesnoth

You could quibble with some of these picks, maybe, but you'd be beaten down by your friends: this is a serious list of substantial and important applications, and if you can't find something here that both (a) interests you and (b) has something to teach you about how software is structured, designed, and written, then the software field is not for you.

In a book this large and varied, it's hard to pick out individual passages, since different aspects will appeal to different people. But in an attempt to give you a feel for the book, here are a handful of observations that should allow you to understand what sort of book this is:

Talking about the development of sendmail, Eric Allman shares a laundry list of wisdom developed over the years, including principles such as:
- Make Sendmail Adapt to the World, Not the Other Way Around
- Change as Little as Possible
- Think About Reliability Early
He also describes how they evolved an approach that, decades later, came to be known as one of the tenets of Extreme Programming, Do the simplest thing that could possibly work:
There were many things that were not done in the early versions. I did not try to re-architect the mail system or build a completely general solution: functionality could be added as the need arose. Very early versions were not even intended to be completely configurable without access to the source code and a compiler (although this changed fairly early on). In general, the modus operandi for sendmail was to get something working quickly and then enhance working code as needed and as the problem was better understood.

Note how Allman's approach echoes many of the principles of the Agile Manifesto.
Another chapter, describing the space-based strategy game Thousand Parsec, talks about the value of incremental development:
A major key to the development of Thousand Parsec was the decision to define and build a subset of the framework, followed by the implementation. This iterative and incremental design process allowed the framework to grow organically, with new features added seamlessly. This led directly to the decision to version the Thousand Parsec protocol, which is credited with a number of major successes of the framework.

A similar approach is described by Chris Davis in the chapter on Graphite:
By and large Graphite evolved gradually, hurdle by hurdle, as problems arose. Many times the hurdles were foreseeable and various pre-emptive solutions seemed natural. However it can be useful to avoid solving problems you do not actually have yet, even if it seems likely that you soon will. The reason is that you can learn much more from closely studying actual failures than from theorizing about superior strategies.
"Avoid solving problems you do not actually have yet" -- I wonder how many thousands of failed software projects would have succeeded if their teams had just been able to comprehend and follow this simple rule of thumb?
Different authors reveal differing approaches to similar problems:
- Describing the construction of the fantasy strategy game Battle for Wesnoth, the authors talk about the temptation to use object-oriented inheritance techniques to model the various types of units that can appear in the game, and why those approaches don't work:
  It is tempting to make a base unit class in C++, with different types of units derived from it. For instance, a wose_unit class could derive from unit, and unit could have a virtual function, bool is_invisible() const, which returns false, which the wose_unit overrides, returning true if the unit happens to be in a forest.
  ...
  Wesnoth's unit system doesn't use inheritance at all to accomplish this task.
  Why did they make this choice? Well, you'll need to read their essay :)
- Another fun thing about the Thousand Parsec chapter is the contrast it makes with the Battle for Wesnoth chapter:
  In a Thousand Parsec universe, every physical thing is an object. In fact, the universe itself is also an object. This design allows for a virtually unlimited set of elements in a game, while remaining simple for rulesets which require only a few types of objects.
That's the wonderful thing about software: two different groups can look at things and take very different approaches, and both approaches are worth understanding and learning from.
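
For readers who have not met this style of design, here is a minimal sketch of the inheritance approach that the Wesnoth passage describes as tempting. The names unit, wose_unit, and is_invisible come from the quote; everything else (the terrain enum, location, move_to, count_visible, run_examples) is my own assumption, added only so the example compiles. This is not Wesnoth's code: as the authors say, their unit system does not use inheritance for this at all.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical terrain type; the quoted passage only mentions "a forest".
enum class terrain { grassland, forest };

// Base class from the quoted passage: units are visible by default.
class unit {
public:
    explicit unit(terrain location) : location_(location) {}
    virtual ~unit() = default;

    virtual bool is_invisible() const { return false; }

    // Assumed helpers, not taken from the chapter.
    terrain location() const { return location_; }
    void move_to(terrain t) { location_ = t; }

private:
    terrain location_;
};

// Derived class from the quoted passage: a wose is hidden while in a forest.
class wose_unit : public unit {
public:
    using unit::unit;  // reuse the base-class constructor
    bool is_invisible() const override { return location() == terrain::forest; }
};

// Counts the units an opponent could see, using only the base-class interface.
inline int count_visible(const std::vector<std::unique_ptr<unit>>& units) {
    int visible = 0;
    for (const auto& u : units) {
        if (!u->is_invisible()) ++visible;
    }
    return visible;
}

// Small self-check that exercises the sketch.
inline void run_examples() {
    std::vector<std::unique_ptr<unit>> units;
    units.push_back(std::make_unique<unit>(terrain::forest));
    units.push_back(std::make_unique<wose_unit>(terrain::forest));
    assert(count_visible(units) == 1);  // the wose is hidden in the forest

    units[1]->move_to(terrain::grassland);
    assert(count_visible(units) == 2);  // out of the forest, it is visible
}
```

Notice that each new special ability in a design like this means either another subclass or another virtual function on the base class, which hints at why a game with many unit types and abilities might look for a different approach.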
Given the work I do in my day job, I found Chet Ramey's observation about his work on bash particularly worth noting:
I have spent over twenty years working on bash, and I'd like to think I have discovered a few things. The most important -- one that I can't stress enough -- is that it's vital to have detailed change logs. It's good when you can go back to your change logs and remind yourself about why a particular change was made. It's even better when you can tie that change to a particular bug report, complete with a reproducible test case, or a suggestion.

I'll close with an observation about open source in the context of this book: the book is much more about architecture than it is about open source. That is, I didn't find a lot of discussion about topics such as: building your community, establishing roles and relationships in an open source organization, learning to deal with uninvited feedback, or finding value from unexpected contributions, all of which are part of the open source process, but don't have a lot to do with the architecture of software.

So in that respect, the book sticks to its knitting, and concentrates on what it sets out to do. But note: this wouldn't be possible unless these were open source applications! That is, by definition we can't have these sorts of public discussion about the architecture of closed source applications, because there is no open discussion of architecture without open design, and open source. Consider books such as VAX/VMS Internals and Data Structures, or the more modern Understanding the Linux Kernel; the only reason you see books like these is that the systems they describe allow access to their source code, and so we can all profit from studying how these systems are built.

While I find the open source development process intriguing, it is nice to see a book such as The Architecture of Open Source Applications, because studying software itself is incredibly important, and it is vital that we study real systems, not just toy applications built as exercises in programming courses, in order to learn the lessons and techniques that come from the challenges of building real systems that have to solve real world problems.

This is not the best book ever written: the varied style of the authors results in a somewhat choppy experience, and some authors are better than others at sharing what they know and what they've learned. But it is a fascinating set of essays, and if you are (or want to be) a practicing software engineer, you will find much to learn by digging into and reading this material closely.


Monday, October 3, 2011
