Thursday, November 7, 2013

TCP ex Machina

I enjoyed this short paper by Keith Winstein and Hari Balakrishnan of MIT: TCP ex Machina: Computer-Generated Congestion Control .

This paper describes a new approach to end-to-end congestion control on a multi-user network. Rather than manually formulate each endpoint’s reaction to congestion signals, as in traditional protocols, we developed a program called Remy that generates congestion-control algorithms to run at the endpoints.

In this approach, the protocol designer specifies their prior knowledge or assumptions about the network and an objective that the algorithm will try to achieve, e.g., high throughput and low queueing delay. Remy then produces a distributed algorithm—the control rules for the independent endpoints—that tries to achieve this objective.

As long-time readers of my blog know, I consider the TCP Congestion Control algorithm to be one of the "Hilbert Questions" of Computer Science: deep, fundamental, of great practical importance yet still unsolved after decades of work.

Many fine approximate algorithms have been developed over the years, yet we still remain frustratingly far from any clearly "best" algorithm.

The congestion control problem is easy to state:

Each endpoint that has pending data must decide for itself at every instant: send a packet, or don’t send a packet.

If all nodes knew in advance the network topology and capacity, and the schedule of each node’s present and future offered load, such decisions could in principle be made perfectly, to achieve a desired allocation of throughput on shared links.

In practice, however, endpoints receive observations that only hint at this information. These include feedback from receivers con- cerning the timing of packets that arrived and detection of packets that didn’t, and sometimes signals, such as ECN marks, from within the network itself. Nodes then make sending decisions based on this partial information about the network.

In this recent work, the authors have developed an algorithm for generating congestion-control algorithms, to see if a powerful computer, searching many alternatives, can uncover a superior algorithm:

Using a few CPU-weeks of computation, Remy produced several computer-generated congestion-control algorithms, which we then evaluated on a variety of simulated network conditions of varying similarity to the prior assumptions supplied at design-time.

On networks whose parameters mostly obeyed the prior knowledge supplied at design range — such as the dumbbell network with the 15 Mbps link — Remy’s end-to-end algorithms outperformed all of the human-generated congestion-control algorithms, even algorithms that receive help from network infrastructure.

But there is a twist: the new algorithm is superior, but we don't understand how it works:

Although the RemyCCs appear to work well on networks whose parameters fall within or near the limits of what they were prepared for — even beating in-network schemes at their own game and even when the design range spans an order of magnitude variation in network parameters — we do not yet understand clearly why they work, other than the observation that they seem to optimize their intended objective well.

We have attempted to make algorithms ourselves that surpass the generated RemyCCs, without success. That suggests to us that Remy may have accomplished something substantive. But digging through the dozens of rules in a RemyCC and figuring out their purpose and function is a challenging job in reverse-engineering.

RemyCCs designed for broader classes of networks will likely be even more complex, compounding the problem.

I recently whiled away an entire afternoon fiddling with the several dozen knobs on a modern Linux implementation of Congestion Control.

After some false starts, I found a promising path, and was able to speed up one of my benchmarks by a factor of 26, from 3,272 seconds down to 124 seconds.

There is tremendous opportunity for a TCP Congestion Control algorithm that can work well, as the gains in performance and efficiency are immense.

But, for now, what we have are our complex approximations, and even the most powerful computers aren't (yet) able to design a superior algorithm.

At least, not one we can understand.

No comments:

Post a Comment