So, of course, there was another AWS outage at the end of June. Happily, I was on vacation, lying on the beach, and didn't even notice it!
But others noticed it, and thought about it, and wrote up some rather interesting observations about it.
I spotted a few and thought I'd call attention to them, in case you find them as interesting as I did:
- First, start with Amazon's own writeup
- I found it interesting to see how Netflix perceived it
- Here's some information about how Heroku perceived it
- And another post-mortem, from vendor CloudBees
In case you're wondering about the "leap second" reference in the CloudBees writeup, a decent overview is in Jonathan Corbet's Linux Weekly News article, "Leaping seconds and looping servers."
Did I miss any good post-mortems? Let me know!