Journal of a Programmer: Are automated tools for testing distributed systems becoming a reality?

Friday, March 27, 2015

Are automated tools for testing distributed systems becoming a reality?

These recent publications would suggest that the time has finally come to deploy serious test tools for bullet-proofing large scale distributed systems.

Exciting!

Challenges in Designing at Scale: Formal Methods in Building Robust Distributed Systems
PlusCal and TLA+ have proven very effective at establishing and maintaining the correctness through change of the fundamental components on which DynamoDB is based.
Using TLA+ for teaching distributed systems
TLA is a tool for specifying distributed algorithms/protocols and model checking them.
My experience with using TLA+ in distributed systems class
Integrating TLA+ to the class gave students a way to get a hands-on experience in algorithms design and correctness verification.
Combining static model checking with dynamic enforcement using the Statecall Policy Language
The Statecall Policy Language (SPL) was designed to make model checking more accessible to regular programmers.
SAMC: Semantic-aware model checking for fast discovery of deep bugs in cloud systems
It was used to test 10 versions(both old and current) of Zookeeper, Hadoop/YARN, and Cassandra across 7 different protocols (leader election, atomic broadcast, cluster management, speculative execution, read/write, hinted handoff, and gossiper).
Lineage-driven Fault Injection
MOLLY doesn’t blindly explore a state space, instead MOLLY reasons backwards from a successful outcome (hopefully common!), to figure out what might have caused it to fail, and then probes paths that exercise those potential causes.
Paper review: "Simple Testing Can Prevent Most Critical Failures: An Analysis of Production Failures in Distributed Data-Intensive Systems"
Table 5 shows that almost all (98%) of the failures are guaranteed to manifest on no more than 3 nodes, and 84% will manifest on no more than 2 nodes.

Journal of a Programmer

Friday, March 27, 2015

Are automated tools for testing distributed systems becoming a reality?

No comments:

Post a Comment

About Me

Blog Archive

Pages