So when somebody comes along and proposes a fundamental change, it's worth looking into it and understanding what's going on.
Recently, Vern Paxson, Mark Allman, Jerry Chu, and Matt Sargent have proposed RFC 6298, which, at its core, advises implementers to change the initial value of a single TCP parameter, the retransmission timeout (RTO), from 3 seconds to 1 second:
Traditionally, TCP has used 3 seconds as the initial RTO [Bra89] [PA00]. This document calls for lowering this value to 1 second...
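To make the change concrete, here is a little sketch in Python. It is my own illustration of how the rules in RFC 6298 read to me, not code from the RFC: the retransmission timer for an unacknowledged SYN now starts at 1 second rather than 3, and doubles each time it expires.

# A sketch (mine, not the RFC's) of the initial-RTO rules in RFC 6298:
# start at 1 second instead of the traditional 3, and double the timer
# each time it expires before any acknowledgment arrives.

INITIAL_RTO = 1.0   # seconds; the previous guidance (RFC 2988) used 3.0

def syn_retransmit_times(max_retries=5, initial_rto=INITIAL_RTO):
    """Times (in seconds after the first SYN) at which the SYN is resent
    if no SYN-ACK ever arrives."""
    rto, elapsed, times = initial_rto, 0.0, []
    for _ in range(max_retries):
        elapsed += rto
        times.append(elapsed)
        rto *= 2   # exponential backoff on each timeout
    return times

print(syn_retransmit_times())                  # [1.0, 3.0, 7.0, 15.0, 31.0]
print(syn_retransmit_times(initial_rto=3.0))   # [3.0, 9.0, 21.0, 45.0, 93.0]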
As Mark Allman observes in a separate document, while this might seem like a slight detail, it is in fact phenomenally important:
We note that research has shown the tension between responsiveness and correctness of TCP's RTO seems to be a fundamental tradeoff [AP99]. That is, making the RTO more aggressive (via the EWMA gains, lowering the minimum RTO, etc.) can reduce the time spent waiting on needed RTOs. However, at the same time such aggressiveness leads to more needless RTOs, as well. Therefore, being as aggressive as the guidelines sketched in the last section allow in any particular situation may not be the best course of action (e.g., because an RTO carries a requirement to slow down).
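The "EWMA gains" and "minimum RTO" Allman mentions are knobs in the estimator RFC 6298 specifies. Here is a minimal sketch of that estimator, again my own paraphrase in Python; it leaves out clock granularity and Karn's algorithm, both of which the RFC also covers.

# Sketch of the RTO computation in RFC 6298 (my paraphrase; clock
# granularity and Karn's algorithm are omitted for brevity).
ALPHA, BETA, K = 1/8, 1/4, 4     # the EWMA gains and the variance multiplier
MIN_RTO = 1.0                    # seconds; the floor the RFC imposes on RTO

class RtoEstimator:
    def __init__(self):
        self.srtt = None     # smoothed round-trip time
        self.rttvar = None   # round-trip time variation
        self.rto = 1.0       # the initial RTO this whole post is about

    def on_rtt_sample(self, r):
        if self.srtt is None:                # first measurement
            self.srtt, self.rttvar = r, r / 2
        else:                                # subsequent measurements
            self.rttvar = (1 - BETA) * self.rttvar + BETA * abs(self.srtt - r)
            self.srtt = (1 - ALPHA) * self.srtt + ALPHA * r
        self.rto = max(MIN_RTO, self.srtt + K * self.rttvar)
        return self.rto

est = RtoEstimator()
print(est.on_rtt_sample(0.200))   # 1.0: the 1-second floor dominates on a fast path

Make the gains larger or the floor smaller and the timer tracks the path more aggressively, which is exactly the responsiveness-versus-correctness tension Allman describes.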
So what makes us think, 11 full years after the IETF last gave guidance on this initial value, that the entire Internet would be better off if it were lowered from 3 seconds to 1?
Well, here it is most illuminating to look at the slides that Jerry Chu presented to the IETF in the summer of 2009, documenting in great detail the findings of an immense survey that Google undertook about the performance behaviors of the modern global Internet. As Chu observes (in these notes):
There are a number of default TCP parameter settings, that, although conservative, have served us well over the years. We believe the time has come to tune some of the parameters to get more speed out of a much faster Internet than 10-20 years ago.
From our own measurement of worldwide RTT distribution to Google servers we believe 3 secs is too conservative, and would like to propose it be reduced to 1 sec.
Why does it matter?
We have seen SYN-ACK retransmission rates up to a few percentage points to some of our servers. We also have indirect data showing the SYN (client side) retransmission to be non-negligible (~1.42% worldwide). At a rate > 1%, a large RTO value can have a significant negative impact on the average end2end latency, hence the user experience. This is especially true for short connections, including much of the web traffic.
What's the downside?
For those users behind a slow (e.g., dialup, wireless) link, the RTT may still go up to > 1 sec. We believe a small amount of spuriously retransmitted SYN/SYN-ACK packets should not be a cause for concern (e.g., inducing more congestion, ...). In some rare cases the TCP performance may be negatively affected by false congestion backoff, resulting from dupacks caused by multiple spuriously retransmitted SYN/SYN-ACK packets. We believe there are techniques to detect and mitigate these cases.
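To get a feel for the numbers Chu cites, here is a back-of-the-envelope estimate. It is my arithmetic, not Google's, and it ignores repeated losses and backoff: every handshake whose SYN is lost stalls for the full initial RTO before the retry goes out, so the penalty averaged over all connections is roughly the retransmission rate times the initial RTO.

# Back-of-the-envelope estimate (mine, not from Chu's slides): average
# handshake-latency penalty = SYN retransmission rate * initial RTO,
# ignoring repeated losses and exponential backoff.
p_syn_retx = 0.0142      # ~1.42% worldwide, the figure from Chu's notes

for initial_rto in (3.0, 1.0):
    penalty_ms = p_syn_retx * initial_rto * 1000
    print(f"initial RTO {initial_rto:.0f}s -> affected handshakes stall "
          f"{initial_rto:.0f}s; average added latency ~{penalty_ms:.0f} ms")

# initial RTO 3s -> affected handshakes stall 3s; average added latency ~43 ms
# initial RTO 1s -> affected handshakes stall 1s; average added latency ~14 ms

Tens of milliseconds averaged over every single connection is a big deal for short web transfers, which is precisely Chu's point.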
As RFC 6298 succinctly observes:
Choosing a reasonable initial RTO requires balancing two competing considerations:
1. The initial RTO should be sufficiently large to cover most of the end-to-end paths to avoid spurious retransmissions and their associated negative performance impact.
2. The initial RTO should be small enough to ensure a timely recovery from packet loss occurring before an RTT sample is taken.
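These two considerations pull in opposite directions, and a toy calculation makes the shape of the tradeoff visible. The RTT samples below are made up purely for illustration; they are not measured data.

# Toy illustration of the two considerations above; the RTT samples are
# hypothetical, not measurements.
rtt_samples = [0.03, 0.08, 0.15, 0.25, 0.40, 0.80, 1.50, 2.50]   # seconds

for candidate_rto in (1.0, 3.0):
    # Consideration 1: paths slower than the initial RTO retransmit spuriously.
    spurious = sum(rtt > candidate_rto for rtt in rtt_samples) / len(rtt_samples)
    # Consideration 2: when the SYN really is lost, recovery waits the full RTO.
    print(f"initial RTO {candidate_rto:.0f}s: {spurious:.0%} of these paths "
          f"retransmit spuriously, but a lost SYN costs {candidate_rto:.0f}s")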
I enjoy reading the IETF's deliberations because the organization does such a wonderful job of documenting them, sharing both the behind-the-scenes discussions and the final work, and making all of its findings open to everyone. I wish all technical societies did this; just imagine how much smarter we all could be if information like this were always shared so freely.