kern/39769: race condition in TCP timers

To: kern-bug-people%netbsd.org@localhost,gnats-admin%netbsd.org@localhost,netbsd-bugs%netbsd.org@localhost
Subject: kern/39769: race condition in TCP timers
From: Manuel Bouyer <bouyer%antioche.eu.org@localhost>
Date: Sun, 19 Oct 2008 13:35:01 +0000 (UTC)

>Number:         39769
>Category:       kern
>Synopsis:       race condition in TCP timers
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sun Oct 19 13:35:00 +0000 2008
>Originator:     Manuel Bouyer
>Release:        NetBSD 3.1.1_PATCH
>Organization:
>Environment:
System: NetBSD ftp.lip6.fr 3.1.1_PATCH NetBSD 3.1.1_PATCH (FTP) #6: Fri Oct 10 
16:47:47 CEST 2008  
bouyer@rock:/dsk/l1/misc/bouyer/tmp/amd64/obj/dsk/l1/misc/bouyer/netbsd-3-1/src/sys/arch/amd64/compile/FTP
 amd64
Architecture: amd64
Machine: amd64
>Description:
        I've got several instances of this panic:
panic: tcp_output REXMT
panic() at netbsd:panic+0x1c8
tcp_segsize() at netbsd:tcp_segsize
tcp_timer_persist() at netbsd:tcp_timer_persist+0x73
softclock() at netbsd:softclock+0x2c9
softintr_dispatch() at netbsd:softintr_dispatch+0x99
DDB lost frame for netbsd:Xsoftclock+0x2d, trying 0xffffffff8069bcd0
Xsoftclock() at netbsd:Xsoftclock+0x2d

        The tcp timers are armed/disarmed, but we don't check if the callback is
        being invoked. One of them at last can cause the panic I'm seeing in
        tcp_output():
                if (win == 0) {
                        TCP_TIMER_DISARM(tp, TCPT_REXMT);
                        tp->t_rxtshift = 0;
                        tp->snd_nxt = tp->snd_una;
                        if (TCP_TIMER_ISARMED(tp, TCPT_PERSIST) == 0)
                                tcp_setpersist(tp);
                }

        If TCPT_REXMT's callback is invoking at this point, it'll be run just
        after tcp_output() exists. tcp_timer_rexmt() will restart the
        TCPT_REXMT timer, so now we have both TCPT_REXMT and TCPT_PERSIST
        pending. When TCPT_PERSIST expires, we get the panic above.

        There are other places where we stop TCPT_REXMT, and potentially
        arm another timer in the same splsoftnet() block (e.g. tcp_input
        stops TCPT_REXMT and may call tcp_output, which may arm TCPT_PERSIST,
        tcp_sack does it too).

        There are also in tcp_output() possible paths where we disarm
        TCPT_PERSIST, and later TCPT_REXMT while TCPT_PERSIST may still be
        invoking. I think there could also cause the tcp_setpersist panic,
        as tcp_timer_persist can call tcp_setpersist().

        As TCP_TIMER_DISARM() will clear the pending and expirted states,
        I think a possible way to close this race is to check callout_expired()
        in the callout handler, and abort the handler if it's not expired.
        See attached patch.

        It looks like the same issue exists in netbsd-4 and current.

>How-To-Repeat:
        Code inspection.
        Run a system with TCP connections and be unlucky.
>Fix:
        I think this patch fixes it. I've been running it on a 3.1.1_PATCH
        server without ill effect. 

Index: tcp_timer.c
===================================================================
RCS file: /cvsroot/src/sys/netinet/tcp_timer.c,v
retrieving revision 1.71
diff -u -r1.71 tcp_timer.c
--- tcp_timer.c 2 Mar 2005 10:20:18 -0000       1.71
+++ tcp_timer.c 10 Oct 2008 16:16:15 -0000
@@ -232,6 +232,10 @@
                splx(s);
                return;
        }
+       if (!callout_expired(&tp->t_delack_ch)) {
+               splx(s);
+               return;
+       }
 
        tp->t_flags |= TF_ACKNOW;
        (void) tcp_output(tp);
@@ -293,6 +297,10 @@
                splx(s);
                return;
        }
+       if (!callout_expired(&tp->t_timer[TCPT_REXMT])) {
+               splx(s);
+               return;
+       }
 
 #ifdef TCP_DEBUG
 #ifdef INET
@@ -453,6 +461,10 @@
                splx(s);
                return;
        }
+       if (!callout_expired(&tp->t_timer[TCPT_PERSIST])) {
+               splx(s);
+               return;
+       }
 
 #ifdef TCP_DEBUG
 #ifdef INET
@@ -520,6 +532,10 @@
                splx(s);
                return;
        }
+       if (!callout_expired(&tp->t_timer[TCPT_KEEP])) {
+               splx(s);
+               return;
+       }
 
 #ifdef TCP_DEBUG
        ostate = tp->t_state;
@@ -607,6 +623,10 @@
                splx(s);
                return;
        }
+       if (!callout_expired(&tp->t_timer[TCPT_2MSL])) {
+               splx(s);
+               return;
+       }
 
        /*
         * 2 MSL timeout went off, clear the SACK scoreboard, reset

Prev by Date: Re: kern/32718
Next by Date: NetBSD Nightly Trouble Ticket Report
Previous by Thread: Re: kern/32718
Next by Thread: kern/39770: mount(2) doc enhancements
Indexes:

Home | Main Index | Thread Index | Old Index