Source-Changes-HG archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]

[src/trunk]: src/sys/netinet Rewrite comments about TCP RTO calculations.



details:   https://anonhg.NetBSD.org/src/rev/8e8ac67cbe18
branches:  trunk
changeset: 764409:8e8ac67cbe18
user:      gdt <gdt%NetBSD.org@localhost>
date:      Wed Apr 20 13:35:51 2011 +0000

description:
Rewrite comments about TCP RTO calculations.

Long ago, the storage representations of srtt and rttvar were changed
from the 4.4BSD scheme, and the comments are out of sync with the
code.  This commit rewrites most of the comments that explain the RTO
calculations, and points out some issues in the code.

Joint work with Bev Schwartz of BBN (original analysis and comments),
but I have rewritten and extended them, so errors are mine.

This material is based upon work supported by the Defense Advanced
Research Projects Agency and Space and Naval Warfare Systems Center,
Pacific, under Contract No. N66001-09-C-2073.  Approved for Public
Release, Distribution Unlimited

diffstat:

 sys/netinet/tcp_input.c |  71 ++++++++++++++++++++++++++++++++++++------------
 sys/netinet/tcp_subr.c  |   6 ++--
 sys/netinet/tcp_timer.c |  12 ++++++-
 sys/netinet/tcp_timer.h |   7 ++++-
 sys/netinet/tcp_var.h   |  49 +++++++++++++++++++++++++++------
 5 files changed, 112 insertions(+), 33 deletions(-)

diffs (282 lines):

diff -r c02d4660c964 -r 8e8ac67cbe18 sys/netinet/tcp_input.c
--- a/sys/netinet/tcp_input.c   Wed Apr 20 10:40:14 2011 +0000
+++ b/sys/netinet/tcp_input.c   Wed Apr 20 13:35:51 2011 +0000
@@ -1,4 +1,4 @@
-/*     $NetBSD: tcp_input.c,v 1.308 2011/04/14 15:48:48 yamt Exp $     */
+/*     $NetBSD: tcp_input.c,v 1.309 2011/04/20 13:35:51 gdt Exp $      */
 
 /*
  * Copyright (C) 1995, 1996, 1997, and 1998 WIDE Project.
@@ -145,7 +145,7 @@
  */
 
 #include <sys/cdefs.h>
-__KERNEL_RCSID(0, "$NetBSD: tcp_input.c,v 1.308 2011/04/14 15:48:48 yamt Exp $");
+__KERNEL_RCSID(0, "$NetBSD: tcp_input.c,v 1.309 2011/04/20 13:35:51 gdt Exp $");
 
 #include "opt_inet.h"
 #include "opt_ipsec.h"
@@ -1687,6 +1687,18 @@
                 * the current time, or is extremely old, fall back to non-1323
                 * RTT calculation.  Since ts_rtt is unsigned, we can test both
                 * at the same time.
+                *
+                * Note that ts_rtt is in units of slow ticks (500
+                * ms).  Since most earthbound RTTs are < 500 ms,
+                * observed values will have large quantization noise.
+                * Our smoothed RTT is then the fraction of observed
+                * samples that are 1 tick instead of 0 (times 500
+                * ms).
+                *
+                * ts_rtt is increased by 1 to denote a valid sample,
+                * with 0 indicating an invalid measurement.  This
+                * extra 1 must be removed when ts_rtt is used, or
+                * else an an erroneous extra 500 ms will result.
                 */
                ts_rtt = TCP_TIMESTAMP(tp) - opti.ts_ecr + 1;
                if (ts_rtt > TCP_PAWS_IDLE)
@@ -3223,6 +3235,9 @@
 /*
  * Collect new round-trip time estimate
  * and update averages and current timeout.
+ *
+ * rtt is in units of slow ticks (typically 500 ms) -- essentially the
+ * difference of two timestamps.
  */
 void
 tcp_xmit_timer(struct tcpcb *tp, uint32_t rtt)
@@ -3232,35 +3247,55 @@
        TCP_STATINC(TCP_STAT_RTTUPDATED);
        if (tp->t_srtt != 0) {
                /*
-                * srtt is stored as fixed point with 3 bits after the
-                * binary point (i.e., scaled by 8).  The following magic
-                * is equivalent to the smoothing algorithm in rfc793 with
-                * an alpha of .875 (srtt = rtt/8 + srtt*7/8 in fixed
-                * point).  Adjust rtt to origin 0.
+                * Compute the amount to add to srtt for smoothing,
+                * *alpha, or 2^(-TCP_RTT_SHIFT).  Because
+                * srtt is stored in 1/32 slow ticks, we conceptually
+                * shift left 5 bits, subtract srtt to get the
+                * diference, and then shift right by TCP_RTT_SHIFT
+                * (3) to obtain 1/8 of the difference.
                 */
                delta = (rtt << 2) - (tp->t_srtt >> TCP_RTT_SHIFT);
+               /* 
+                * This can never happen, because delta's lowest
+                * possible value is 1/8 of t_srtt.  But if it does,
+                * set srtt to some reasonable value, here chosen
+                * as 1/8 tick.
+                */
                if ((tp->t_srtt += delta) <= 0)
                        tp->t_srtt = 1 << 2;
                /*
-                * We accumulate a smoothed rtt variance (actually, a
-                * smoothed mean difference), then set the retransmit
-                * timer to smoothed rtt + 4 times the smoothed variance.
-                * rttvar is stored as fixed point with 2 bits after the
-                * binary point (scaled by 4).  The following is
-                * equivalent to rfc793 smoothing with an alpha of .75
-                * (rttvar = rttvar*3/4 + |delta| / 4).  This replaces
-                * rfc793's wired-in beta.
+                * RFC2988 requires that rttvar be updated first.
+                * This code is compliant because "delta" is the old
+                * srtt minus the new observation (scaled).
+                *
+                * RFC2988 says:
+                *   rttvar = (1-beta) * rttvar + beta * |srtt-observed|
+                *
+                * delta is in units of 1/32 ticks, and has then been
+                * divided by 8.  This is equivalent to being in 1/16s
+                * units and divided by 4.  Subtract from it 1/4 of
+                * the existing rttvar to form the (signed) amount to
+                * adjust.
                 */
                if (delta < 0)
                        delta = -delta;
                delta -= (tp->t_rttvar >> TCP_RTTVAR_SHIFT);
+               /*
+                * As with srtt, this should never happen.  There is
+                * no support in RFC2988 for this operation.  But 1/4s
+                * as rttvar when faced with someting arguably wrong
+                * is ok.
+                */
                if ((tp->t_rttvar += delta) <= 0)
                        tp->t_rttvar = 1 << 2;
        } else {
                /*
-                * No rtt measurement yet - use the unsmoothed rtt.
-                * Set the variance to half the rtt (so our first
-                * retransmit happens at 3*rtt).
+                * This is the first measurement.  Per RFC2988, 2.2,
+                * set rtt=R and srtt=R/2.
+                * For srtt, storage representation is 1/32 ticks,
+                * so shift left by 5.
+                * For rttvar, storage representatnio is 1/16 ticks,
+                * So shift left by 4, but then right by 1 to halve.
                 */
                tp->t_srtt = rtt << (TCP_RTT_SHIFT + 2);
                tp->t_rttvar = rtt << (TCP_RTTVAR_SHIFT + 2 - 1);
diff -r c02d4660c964 -r 8e8ac67cbe18 sys/netinet/tcp_subr.c
--- a/sys/netinet/tcp_subr.c    Wed Apr 20 10:40:14 2011 +0000
+++ b/sys/netinet/tcp_subr.c    Wed Apr 20 13:35:51 2011 +0000
@@ -1,4 +1,4 @@
-/*     $NetBSD: tcp_subr.c,v 1.238 2009/09/16 15:23:05 pooka Exp $     */
+/*     $NetBSD: tcp_subr.c,v 1.239 2011/04/20 13:35:51 gdt Exp $       */
 
 /*
  * Copyright (C) 1995, 1996, 1997, and 1998 WIDE Project.
@@ -91,7 +91,7 @@
  */
 
 #include <sys/cdefs.h>
-__KERNEL_RCSID(0, "$NetBSD: tcp_subr.c,v 1.238 2009/09/16 15:23:05 pooka Exp $");
+__KERNEL_RCSID(0, "$NetBSD: tcp_subr.c,v 1.239 2011/04/20 13:35:51 gdt Exp $");
 
 #include "opt_inet.h"
 #include "opt_ipsec.h"
@@ -164,7 +164,7 @@
 
 
 struct inpcbtable tcbtable;    /* head of queue of active tcpcb's */
-u_int32_t tcp_now;             /* for RFC 1323 timestamps */
+u_int32_t tcp_now;             /* slow ticks, for RFC 1323 timestamps */
 
 percpu_t *tcpstat_percpu;
 
diff -r c02d4660c964 -r 8e8ac67cbe18 sys/netinet/tcp_timer.c
--- a/sys/netinet/tcp_timer.c   Wed Apr 20 10:40:14 2011 +0000
+++ b/sys/netinet/tcp_timer.c   Wed Apr 20 13:35:51 2011 +0000
@@ -1,4 +1,4 @@
-/*     $NetBSD: tcp_timer.c,v 1.84 2008/11/10 01:06:43 uebayasi Exp $  */
+/*     $NetBSD: tcp_timer.c,v 1.85 2011/04/20 13:35:52 gdt Exp $       */
 
 /*
  * Copyright (C) 1995, 1996, 1997, and 1998 WIDE Project.
@@ -93,7 +93,7 @@
  */
 
 #include <sys/cdefs.h>
-__KERNEL_RCSID(0, "$NetBSD: tcp_timer.c,v 1.84 2008/11/10 01:06:43 uebayasi Exp $");
+__KERNEL_RCSID(0, "$NetBSD: tcp_timer.c,v 1.85 2011/04/20 13:35:52 gdt Exp $");
 
 #include "opt_inet.h"
 #include "opt_tcp_debug.h"
@@ -387,6 +387,14 @@
                if (tp->t_in6pcb)
                        in6_losing(tp->t_in6pcb);
 #endif
+               /*
+                * This operation is not described in RFC2988.  The
+                * point is to keep srtt+4*rttvar constant, so we
+                * should shift right 2 bits to divide by 4, and then
+                * shift right one bit because the storage
+                * representation of rttvar is 1/16s vs 1/32s for
+                * srtt.
+                */
                tp->t_rttvar += (tp->t_srtt >> TCP_RTT_SHIFT);
                tp->t_srtt = 0;
        }
diff -r c02d4660c964 -r 8e8ac67cbe18 sys/netinet/tcp_timer.h
--- a/sys/netinet/tcp_timer.h   Wed Apr 20 10:40:14 2011 +0000
+++ b/sys/netinet/tcp_timer.h   Wed Apr 20 13:35:51 2011 +0000
@@ -1,4 +1,4 @@
-/*     $NetBSD: tcp_timer.h,v 1.26 2008/04/28 20:24:09 martin Exp $    */
+/*     $NetBSD: tcp_timer.h,v 1.27 2011/04/20 13:35:52 gdt Exp $       */
 
 /*-
  * Copyright (c) 2001, 2005 The NetBSD Foundation, Inc.
@@ -114,6 +114,7 @@
 
 /*
  * Time constants.
+ * All TCPTV_* constants are in units of slow ticks (typically 500 ms).
  */
 #define        TCPTV_MSL       ( 30*PR_SLOWHZ)         /* max seg lifetime (hah!) */
 #define        TCPTV_SRTTBASE  0                       /* base roundtrip time;
@@ -149,6 +150,10 @@
        callout_setfunc(&(tp)->t_timer[(timer)],                        \
            tcp_timer_funcs[(timer)], (tp))
 
+/*
+ * nticks is given in units of slow timeouts,
+ * typically 500 ms (with PR_SLOWHZ at 2).
+ */
 #define        TCP_TIMER_ARM(tp, timer, nticks)                                \
        callout_schedule(&(tp)->t_timer[(timer)],                       \
            (nticks) * (hz / PR_SLOWHZ))
diff -r c02d4660c964 -r 8e8ac67cbe18 sys/netinet/tcp_var.h
--- a/sys/netinet/tcp_var.h     Wed Apr 20 10:40:14 2011 +0000
+++ b/sys/netinet/tcp_var.h     Wed Apr 20 13:35:51 2011 +0000
@@ -1,4 +1,4 @@
-/*     $NetBSD: tcp_var.h,v 1.163 2011/04/14 15:55:46 yamt Exp $       */
+/*     $NetBSD: tcp_var.h,v 1.164 2011/04/20 13:35:52 gdt Exp $        */
 
 /*
  * Copyright (C) 1995, 1996, 1997, and 1998 WIDE Project.
@@ -546,19 +546,45 @@
 #endif
 
 /*
- * The smoothed round-trip time and estimated variance
- * are stored as fixed point numbers scaled by the values below.
- * For convenience, these scales are also used in smoothing the average
- * (smoothed = (1/scale)sample + ((scale-1)/scale)smoothed).
- * With these scales, srtt has 3 bits to the right of the binary point,
- * and thus an "ALPHA" of 0.875.  rttvar has 2 bits to the right of the
- * binary point, and is smoothed with an ALPHA of 0.75.
+ * See RFC2988 for a discussion of RTO calculation; comments assume
+ * familiarity with that document.
+ *
+ * The smoothed round-trip time and estimated variance are stored as
+ * fixed point numbers.  Historically, srtt was scaled by
+ * TCP_RTT_SHIFT bits, and rttvar by TCP_RTTVAR_SHIFT bits.  Because
+ * the values coincide with the alpha and beta parameters suggested
+ * for RTO calculation (1/8 for srtt, 1/4 for rttvar), the combination
+ * of computing 1/8 of the new value and transforming it to the
+ * fixed-point representation required zero instructions.  However,
+ * the storage representations no longer coincide with the alpha/beta
+ * shifts; instead, more fractional bits are present.
+ *
+ * The storage representation of srtt is 1/32 slow ticks, or 1/64 s.
+ * (The assumption that a slow tick is 500 ms should not be present in
+ * the code.)
+ *
+ * The storage representation of rttvar is 1/16 slow ticks, or 1/32 s.
+ * There may be some confusion about this in the code.
+ *
+ * For historical reasons, these scales are also used in smoothing the
+ * average (smoothed = (1/scale)sample + ((scale-1)/scale)smoothed).
+ * This results in alpha of 0.125 and beta of 0.25, following RFC2988
+ * section 2.3
  */
 #define        TCP_RTT_SHIFT           3       /* shift for srtt; 3 bits frac. */
 #define        TCP_RTTVAR_SHIFT        2       /* multiplier for rttvar; 2 bits */
 
 /*
- * The initial retransmission should happen at rtt + 4 * rttvar.
+ * Compute TCP retransmission timer, following RFC2988.
+ * This macro returns a value in slow timeout ticks.
+ *
+ * Section 2.2 requires that the RTO value be
+ *  srtt + max(G, 4*RTTVAR)
+ * where G is the clock granularity.
+ *
+ * This comment has not necessarily been updated for the new storage
+ * representation:
+ *
  * Because of the way we do the smoothing, srtt and rttvar
  * will each average +1/2 tick of bias.  When we compute
  * the retransmit timer, we want 1/2 tick of rounding and
@@ -569,6 +595,11 @@
  * the minimum feasible timer (which is 2 ticks).
  * This macro assumes that the value of 1<<TCP_RTTVAR_SHIFT
  * is the same as the multiplier for rttvar.
+ *
+ * This macro appears to be wrong; it should be checking rttvar*4 in
+ * ticks and making sure we use 1 instead if rttvar*4 rounds to 0.  It
+ * appears to be treating srtt as being in the old storage
+ * representation, resulting in a factor of 4 extra.
  */
 #define        TCP_REXMTVAL(tp) \
        ((((tp)->t_srtt >> TCP_RTT_SHIFT) + (tp)->t_rttvar) >> 2)



Home | Main Index | Thread Index | Old Index