Journal of a Programmer: Learning from the Skype outage

Wednesday, December 29, 2010

I haven't been much of a Skype user recently, but in the past I used it quite a bit; it's a great service!

So I wasn't much impacted by last week's Skype system outages, but I was still interested, because Skype is a big complex system and I love big complex systems :)

If, like me, you're fascinated by how these systems are built and maintained, and what we can learn from the problems of others, you'll want to dig into some of what's been written about the Skype outage:

Start with this report from Lars Rabbe, Skype's CIO.

Also check Dan York's blog; he's been publishing some very interesting information about the outage.

Here's some background material from Skype about their basic architecture.

And here's a nice, though somewhat old, paper from some researchers at Columbia with some great background information about Skype's system architecture.

Building immense, complicated distributed systems is incredibly hard; I've been working in the field for 15 years and I'm painfully aware of how little I really know about this.

It's wonderful that Skype is being so forthcoming about the problem, what caused it, what was done to fix it, and how it could be avoided in the future. I am always grateful when others take the time to write up information like this -- post-mortems are great, so thanks Skype!
