Journal of a Programmer: A collection of AWS post-mortems

Monday, July 16, 2012

A collection of AWS post-mortems

So, of course, there was another AWS outage at the end of June. Happily, I was on vacation, lying on the beach, and didn't even notice it!

But others noticed it, and thought about it, and wrote up some rather interesting observations about it.

I spotted a few and thought I'd call attention to them, in case you find them as interesting as I did:

First, start with Amazon's own writeup
I found it interesting to see how Netflix perceived it
Here's some information about how Heroku perceived it
And another post-mortem, from vendor CloudBees

In case you're wondering about the "leap second" reference in the CloudBees writeup, a decent overview is in Jonathan Corbet's Linux Weekly News article: Leaping seconds and looping servers.

Did I miss any good post-mortems? Let me know!
