I quite enjoyed this article by Jeff Hodges: Notes on Distributed Systems for Young Bloods.
"Below is a list of some lessons I’ve learned as a distributed systems engineer that are worth being told to a new engineer. Some are subtle, and some are surprising, but none are controversial. This list is for the new distributed systems engineer to guide their thinking about the field they are taking on. It’s not comprehensive, but it’s a good beginning."
The entire article is very good, with lots of practical advice, but to get you motivated, here's Hodges's list:
- Distributed systems are different because they fail often.
- Writing robust distributed systems costs more than writing robust single-machine systems.
- Robust, open source distributed systems are much less common than robust, single-machine systems.
- Coordination is very hard.
- If you can fit your problem in memory, it’s probably trivial.
- “It’s slow” is the hardest problem you’ll ever debug.
- Implement backpressure throughout your system.
- Find ways to be partially available.
- Metrics are the only way to get your job done.
- Use percentiles, not averages.
- Learn to estimate your capacity.
- Feature flags are how infrastructure is rolled out.
- Choose id spaces wisely.
- Exploit data-locality.
- Writing cached data back to storage is bad.
- Computers can do more than you think they can.
- Use the CAP theorem to critique systems.
- Extract services.
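
To make one of these concrete, here is a small sketch of my own (not taken from Hodges's article) of why "Use percentiles, not averages" matters. The latency numbers are made up for illustration, and `percentile` is a hypothetical helper that uses the simple nearest-rank definition; real metrics systems often use other estimators.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in the range (0, 100]")
    ordered = sorted(samples)
    # Rank is 1-based; p * n / 100 is computed on integers first to
    # avoid floating-point surprises such as 0.57 * 100 != 57.
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


# Invented data: 98 fast requests and 2 very slow ones, in milliseconds.
latencies_ms = [10] * 98 + [2000] * 2

mean_ms = sum(latencies_ms) / len(latencies_ms)
p50_ms = percentile(latencies_ms, 50)
p99_ms = percentile(latencies_ms, 99)

print(f"mean={mean_ms} ms, p50={p50_ms} ms, p99={p99_ms} ms")
```

With this data the mean comes out near 50 ms, a latency that no request actually experienced, while the median shows the typical request took 10 ms and the 99th percentile exposes the 2-second tail.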
I hadn't heard the term "Russian-doll caching" before, so I looked it up. It turns out to be a Ruby on Rails technique, and it is the subject of much discussion right now: