It's the last day of August! How can this have happened already?
- GitHub: Scaling on Ruby, with a nomadic tech team
We host in our own datacenters. We actually have an amazing provisioning story. We basically can provision hardware like it was the cloud. We have a really small, but amazingly dedicated, physical infrastructure team, and they do phenomenal work in providing us these amazing services that we can use.
If I need a new host, I can basically tell our chatbot, Hubot, that I need X amount of hosts of this class on these chassis, and it will just build them and deploy back in minutes. We have this incredibly flat, flexible, but physical infrastructure. As someone who consumes that infrastructure, it’s phenomenal and to watch it working is brilliant.
- Musing on Nerd Knobs
Don’t be like that. Resist the siren song of nerd knobs. Make your designs and configurations as simple as possible. Your network (and people running it) will thank you.
- Lessons from the Cloud Bunker
A virtual or a physical machine is neither durable nor cloud-native. Neither is a container. But a cluster of Kubernetes pods is a durable and declarative abstraction. To a lesser extent, a Marathon managed cluster of containers is also durable and declarative.
- Huge Scale Deployments
What are the best practices for supporting huge-scale deployments? How do you manage fidelity of environments and processes, monitoring, blue/green deployments and more across thousands of servers?
- Name Collision Resources & Information
A name collision occurs when an attempt to resolve a name used in a private name space (e.g. under a non-delegated Top-Level Domain, or a short, unqualified name) results in a query to the public Domain Name System (DNS). When the administrative boundaries of private and public namespaces overlap, name resolution may yield unintended or harmful results.
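As a rough illustration of the two cases named in that definition, here is a minimal Python sketch. The function name `collision_risk` and the tiny `DELEGATED_TLDS` set are assumptions made for the example; a real check would consult the full, current list of delegated TLDs.

```python
# Hypothetical, deliberately tiny subset of delegated TLDs (assumption for
# illustration only; the real list is much longer and changes over time).
DELEGATED_TLDS = {"com", "org", "net"}


def collision_risk(name, delegated_tlds):
    """Return why a privately used name might leak to public DNS, or None."""
    labels = name.rstrip(".").lower().split(".")
    if len(labels) == 1:
        # A single label such as "intranet" is a short, unqualified name.
        return "short unqualified name"
    if labels[-1] not in delegated_tlds:
        # The rightmost label is not a delegated TLD, e.g. "mail.corp".
        return "non-delegated TLD"
    return None


for candidate in ("intranet", "mail.corp", "example.com."):
    print(candidate, "->", collision_risk(candidate, DELEGATED_TLDS))
```

The sketch only classifies names; whether a query actually reaches the public DNS depends on the resolver's search-list and forwarding configuration.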
- sophia - a modern embeddable key-value database
Sophia database and its architecture were born as a result of research and reconsideration of primary algorithmic constraints that relate to increasingly popular log-file-based data structures, such as LSM-tree, B-tree, etc.
Most log-based databases tend to organize their own file storage as a collection of sorted files that are periodically merged. Thus, without applying some key-filtering scheme (like a Bloom filter), finding a single key requires the database to traverse all files, which can take up to O(files_count * log(file_key_count)) in the worst case. It gets even worse for range scans, because a Bloom filter cannot take key order into account.
Sophia was designed to improve this situation by providing faster reads while still benefiting from an append-only design.
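To make the worst-case lookup cost in the excerpt concrete, here is a minimal Python sketch of a point lookup over a collection of sorted files with no key filter. It is not Sophia's implementation; the `lookup` function and the in-memory `files` layout are assumptions chosen for illustration.

```python
from bisect import bisect_left


def lookup(sorted_files, key):
    """Probe sorted files newest-first; return (value, files_probed)."""
    probed = 0
    for keys, values in sorted_files:
        probed += 1
        # Binary search inside one file: O(log(file_key_count)).
        i = bisect_left(keys, key)
        if i < len(keys) and keys[i] == key:
            return values[i], probed
    # Key is absent: every file was probed, which is the worst case.
    return None, probed


# Three hypothetical sorted files, newest first, as (keys, values) pairs.
files = [
    (["b", "d"], [20, 40]),
    (["a", "d"], [1, 4]),
    (["c", "e"], [3, 5]),
]

print(lookup(files, "d"))  # found in the newest file
print(lookup(files, "z"))  # absent, so all three files are probed
```

A miss or a key that lives only in the oldest file touches every file, which is where the `files_count` factor comes from. A Bloom filter per file would let most of those probes be skipped for point lookups, but not for range scans.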
- The Secret of Airbnb’s Pricing Algorithm
This June, we released our latest improvements. We started doing dynamic pricing—that is, offering new price tips daily based on changing market conditions. We tweaked our general pricing algorithms to consider some unusual, even surprising characteristics of listings. And we’ve added what we think is a unique approach to machine learning that lets our system not only learn from its own experience but also take advantage of a little human intuition when necessary.
- A lesson in BitTorrent
The same goes for BitTorrent. You can only download chunks from peers if they've got all the chunks. That's the current problem with the AshMad dump: everyone combined has only 85% of all possible chunks. The remaining 15% of the chunks haven't been uploaded to the swarm yet. Nobody has a complete copy. The original tracker is seeding at a rate of 37-kilobytes/second, handing off the next chunk to a random person in the swarm, who quickly exchanges it with everyone else in the swarm.
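The "everyone combined has only 85%" figure is a statement about the union of the chunks held across the swarm. A minimal Python sketch of that calculation follows; the `swarm_availability` function, the peer names, and the chunk numbering are all hypothetical, and the numbers are not those of the dump described above.

```python
def swarm_availability(peer_chunks, total_chunks):
    """Return (fraction of chunks held by at least one peer, missing chunk ids)."""
    all_chunks = set(range(total_chunks))
    # A chunk is obtainable if any peer in the swarm holds it.
    available = set().union(*peer_chunks.values()) & all_chunks
    missing = sorted(all_chunks - available)
    return len(available) / total_chunks, missing


# Hypothetical swarm: three peers, eight chunks, nobody has a complete copy.
peers = {
    "peer_a": {0, 1, 2},
    "peer_b": {2, 3},
    "peer_c": {0, 4, 5},
}

fraction, missing = swarm_availability(peers, total_chunks=8)
print(fraction, missing)
```

Until the original seeder uploads the missing chunks, no amount of peer-to-peer exchange can raise that fraction.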
- Congestion Control for Large-Scale RDMA Deployments
We are deploying Remote Direct Memory Access (RDMA) technology in Microsoft’s datacenters to provide ultra-low latency and high throughput to applications, with very low CPU overhead. With RDMA, network interface cards (NICs) transfer data in and out of pre-registered memory buffers at both end hosts. The networking protocol is implemented entirely on the NICs, bypassing the host networking stack. The bypass significantly reduces CPU overhead and overall latency. To simplify design and implementation, the protocol assumes a lossless networking fabric.
- InterTubes: A Study of the US Long-haul Fiber-optic Infrastructure
We start by using fiber maps provided by tier-1 ISPs and major cable providers to construct a map of the long-haul US fiber-optic infrastructure. We also rely on previously under-utilized data sources in the form of public records from federal, state, and municipal agencies to improve the fidelity of our map. We quantify the resulting map’s connectivity characteristics and confirm a clear correspondence between long-haul fiber-optic, roadway, and railway infrastructures. Next, we examine the prevalence of high-risk links by mapping end-to-end paths resulting from large-scale traceroute campaigns onto our fiber-optic infrastructure map.
When we set out to build X-Stream and subsequent systems our aim was to really provide a great system and computation model for implementing such algorithms. The fundamental *systems* takeaway from the paper was that doing sequential scanning is a great way to deal with graphs because the gap between sequential and random access bandwidth means that you still win over sorting the data and then doing random access to fetch edges attached to active vertices.
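The idea of scanning edges sequentially instead of sorting them and indexing by vertex can be sketched with a toy connected-components computation. This is only an illustration of the access pattern in Python, not X-Stream's actual algorithm or API; the function name and the in-memory edge list are assumptions.

```python
def connected_components(num_vertices, edges):
    """Label each vertex with the smallest vertex id in its component."""
    labels = list(range(num_vertices))
    changed = True
    while changed:
        changed = False
        # One sequential pass over the unsorted edge list. There is no
        # per-vertex adjacency index and no sorting of the edges.
        for u, v in edges:
            low = min(labels[u], labels[v])
            if labels[u] != low:
                labels[u] = low
                changed = True
            if labels[v] != low:
                labels[v] = low
                changed = True
    return labels


# Hypothetical unsorted edge list over five vertices.
edges = [(3, 4), (0, 1), (1, 2)]
print(connected_components(5, edges))
```

Each pass reads every edge, including edges whose endpoints are no longer changing, so the approach trades wasted work for purely sequential access to the edge data. Vertex labels are still accessed randomly here, which is cheap only while that state fits in fast memory.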
- Epic Graph Battles of History: Chaos vs Order
Chaos is a new scalable system, due to appear at SOSP 2015. It isn't yet public, but my understanding is that it is basically a beast at sequentially streaming through edge data, across as many machines as you can swing.
Order was the subject of the recent blog post that stirred up this brouhaha: can something as simple as sorting empower the lowly laptop to compete with the scalable systems? Order isn't actually the name of a system, but it should be.
- Practical Microservice Architecture and Implementation Considerations
During the same time, the phenomenon of SOA was well popularized, but there was no clear and distinct best practice for a concrete implementation of it. Many implementations did succeed, but some were very difficult and failed. Some had services that were just too large and monolithic, while others had so many smallish services (almost microservice-like) that it became difficult to achieve good performance. The concept of SOA was there, but designers and implementers failed to understand the full lifecycle of the service and its granularity and scalability impact on other services, and therefore paid a huge price during implementation.
- Why Gogo's Infuriatingly Expensive, Slow Internet Still Owns the Skies
What Gogo does in the sky is, indeed, different from what wireless companies do on terra firma. It uses an air-to-ground system that functions similarly to traditional cell service, but its radio towers point up, not down. Gogo’s towers are anywhere from 50 to 200 feet tall and can be located in rather remote locations, such as atop peaks in the Rocky Mountains or deep in the Alaskan tundra. The tower signal is received by a device on the plane’s belly that looks a bit like those antennas you used to see on stretch limos. The signal is routed to an onboard server about the size of an old-fashioned tower PC and then continues to the cabin.