Hey, isn't it the holiday season? Aren't things supposed to be quieting down around now?
- Infrastructure Update: Pushing the edges of our global performance
It can take up to 180 milliseconds for data traveling by undersea cables at nearly the speed of light to cross the Pacific Ocean. Data traveling across the Atlantic can take up to 90 milliseconds. This travel time is compounded by the way TCP works. To establish a reliable connection for uploads, the client initiates what’s called a slow start. It sends a few packets of data, then waits for an ACK (or acknowledgement), confirming that the data has been received. The client will then send a larger group of packets and await confirmation, repeating this process until ultimately transmitting data at the user’s full available link capacity. Given the limitations we encounter here—the distance across the Pacific Ocean, and the speed of light—there are only so many optimizations we can make before physics stands in the way.
- Slicer: Auto-sharding for datacenter applications
What exactly is Slicer then? It has two key components: a data plane that acts as an affinity-aware load balancer, with affinity managed based on application-specified keys; and a control plane that monitors load and instructs applications processes as to which keys they should be serving at any one point in time. In this way, the decisions regarding how to balance keys across application instances can be outsourced to the Slicer service rather than building this logic over and over again for each individual back-end service. Slicer is focused exclusively on the problem of balancing load across a given set of backend tasks
- QCon NewYork 2016: The Verification of a Distributed System
Distributed Systems are difficult to build and test for two main reasons: partial failure & asynchrony. These two realities of distributed systems must be addressed to create a correct system, and often times the resulting systems have a high degree of complexity. Because of this complexity, testing and verifying these systems is critically important. In this talk we will discuss strategies for proving a system is correct, like formal methods, and less strenuous methods of testing which can help increase our confidence that our systems are doing the right thing.(Don't miss the awesome list of reference material!)
- An Approach to Designing Distributed, Fault-Tolerant, Horizontally Scalable Event Scheduler
For the processing part, a master is elected among the cluster members. Zookeeper could be used for leader/master election, but since BigBen already uses Hazelcast, we used the distributed lock feature to implement a Cluster Singleton. The master then schedules the next bucket and reads the event counts. Knowing the event count and shard size, it can calculate very easily how many shards are in total. The master then creates pairs of (bucket, shard_index) and divides them equally among the cluster members, including itself. In case of unequal division, the master tries to take the minimum load on itself.
- Hazelcast is the leading open source in-memory data grid.
If you have programmed applications in Java, you have probably worked with concurrency primitives like the synchronized statement (the intrinsic lock) or the concurrency library that was introduced in Java 5 under java.util.concurrent, such as Executor, Lock and AtomicReference.
This concurrency functionality is useful if you want to write a Java application that uses multiple threads, but the focus here is to provide synchronization in a single JVM and not distributed synchronization over multiple JVMs. Luckily, Hazelcast provides support for various distributed synchronization primitives such as the ILock, IAtomicLong, etc. Apart from making synchronization between different JVMs possible, these primitives also support high availability: if one machine fails, the primitive remains usable for other JVMs.
- New – AWS Step Functions – Build Distributed Applications Using Visual Workflows
Today we are launching AWS Step Functions to allow you to do exactly what I described above. You can coordinate the components of your application as series of steps in a visual workflow. You create state machines in the Step Functions Console to specify and execute the steps of your application at scale.
Each state machine defines a set of states and the transitions between them. States can be activated sequentially or in parallel; Step Functions will make sure that all parallel states run to completion before moving forward. States perform work, make decisions, and control progress through the state machine.
- Performance improvements in bcachefs-testing
btree nodes are log structured, with multiple sorted sets of keys. In memory, we sort/compact as needed so that we never have more than three different sets of keys: the lookup and iterator code has to search through and maintain pointers into each sorted set of keys, so we don't want to deal with too many. Having multiple sorted sets of keys ends up being a performance win, since the result is that only the newest and smallest is being modified at any given time, and the rest are constant - we can construct lookup tables for the constant sets of keys that are drastically more efficient for lookup, but wouldn't be possible to update without regenerating the entire lookup table.
- Probabilistic Data Structure Showdown: Cuckoo Filters vs. Bloom Filters
Probabilistic data structures store data compactly with low memory and provide approximate answers to queries about stored data. They are designed to answer queries in a space-efficient manner, which can mean sacrificing accuracy.
Like Bloom filters, the Cuckoo filter is a probabilistic data structure for testing set membership. The ‘Cuckoo’ in the name comes from the filter’s use of the Cuckoo hashtable as its underlying storage structure. The Cuckoo hashtable is named after the cuckoo bird becauses it leverages the brood parasitic behavior of the bird in its design. Cuckoo birds are known to lay eggs in the nests of other birds, and once an egg hatches, the young bird typically ejects the host’s eggs from the nest. A Cuckoo hash table employs similar behavior in dealing with items to be inserted into occupied 'buckets’ in a Cuckoo hash table.
- Building robust software with rigorous design documents
So what, exactly, goes into a design document for a problem domain? What makes these docs so detailed and rigorous? I believe that the hallmark of these designs is an extremely thorough assessment of risk.
As the owner of a problem domain, you need to look into the future and anticipate everything that could go wrong. Your goal is to identify all of the possible problems that will need to be addressed by your design and implementation. You investigate each of these problems deeply enough to provide a useful explanation of what they mean in your design document. Then you rank these problems as risks based on a combination of severity (low, medium, high) and likelihood (doubtful, potential, definite).
- How Google Is Challenging AWS
Still, for all the success Microsoft has had with Office 365, the real giant of cloud computing — which is to say the future of enterprise computing — is, as is so often the case, a company no one saw coming: the same year Google decided to take on Microsoft Amazon launched Amazon Web Services. What makes AWS so compelling is the way that it reflects Amazon itself: it is built for scale and with clearly-defined and hardened interfaces. Customers — first Amazon but also companies around the world — access “primitives” that can be mixed-and-matched to build a more efficient, scalable, and secure back-end than nearly any company could build on its own.
Where Kubernetes differs from Borg is that it is fully portable: it runs on AWS, it runs on Azure, it runs on the Google Cloud Platform, it runs on on-premise infrastructure, you can even run it in your house. More relevantly to this article, it is the perfect antidote to AWS’ ten year head-start in infrastructure-as-a-service: while Google has made great strides in its own infrastructure offerings, the potential impact of Kubernetes specifically and container-based development broadly is to make irrelevant which infrastructure provider you use. No wonder it is one of the fastest growing open-source projects of all time: there is no lock-in.
- 52 things I learned in 2016