I quite enjoyed Richard Cook's presentation at the Velocity 2012 Conference: How Complex Systems Fail. The video is about 30 minutes long, but it moves right along and he is an excellent speaker.
Among the key concepts in the talk is this observation:
As systems developers, we design for reliability:
- stiff boundaries, layers, formalisms
- defence in depth
- interference protection

But what we actually want is resilience, which is different:
- withstand transients
- recover swiftly and smoothly from failures
- prioritize to serve high-level goals
- recognize and respond to abnormal situations
- adapt to change
If you consider yourself a serious systems software engineer, or if you want to become one, you should listen to Cook's talk and go read some of his papers at his web site. He is a clear speaker and writer, and his proposals are sensible and grounded in real experience. Start by reading this concise summary: How Complex Systems Fail, and then move on from there to explore Cook's ideas about how to make systems safer by making them more resilient, for example in Operating at the Sharp End: The Complexity of Human Error.
Thankfully, the systems that I work on are nowhere near as safety-critical as the ones Cook considers, but I'm quite grateful to him for sharing his experiences and observations, because even ordinary systems software can be made more stable, more tolerant of errors, and more adaptable by considering these issues.
Update: Fixed the link to the CTLab site (thanks Anton!)
Update 2: Fixed the video link (is Blogger eating my links? Or am I just getting old... never mind, don't answer that.)