Subject: FIN_WAIT_2's remaining in connection list
To: None <tech-net@netbsd.org>
From: Ryan Younce <ryan@manunkind.org>
List: tech-net
Date: 10/22/2000 15:58:59
I was perusing the FreeBSD bug reports page and discovered a bug marked
serious in which opening a connection to a remote Netware web server and
then closing it would leave the connection in the FIN_WAIT_2 state for
several days.  The submitter indicated that he had run a script to open
several thousand connections, and they all remained in the list for
several days before being cleared out.

I use FreeBSD as my primary desktop but have NetBSD on another box, and found
with further investigation that this "bug" (understand, I do not as of yet
know if this is a fault with BSD or Netware) existed not only on FreeBSD but
on NetBSD, OpenBSD, and Linux as well.  It does not seem to exist on Solaris
(and no, I don't know the version number for Solaris).  The others are the
latest releases.

For further information on this PR, check out:

	http://www.FreeBSD.org/cgi/query-pr.cgi?pr=21791

(BTW, due to a misconfig on my side, that message was repeated several times.
The tail of the audit log is the valid patch/message).
	
My question is:  Is this normal?  Should this happen?  Is this a BSD problem
or a problem with the remote server's TCP implementation?  It does not appear
to happen when connecting to the web server of www.freebsd.org or
www.netbsd.org or my personal BSD web servers.  I have also never seen this
pop up with any other servers, just the Netware one indicated.  And as far as
I can tell, the BSDs are doing what they are supposed to do (that is,
complying with the whole TCP sequence)--it's just that they are expecting
something from the remote server (a final FIN, I think) that they just aren't
getting.
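
To make the sequence concrete, here is a rough sketch in C of the states the
closing side passes through.  The enum names and the next_state() helper are
made up for illustration and are not the kernel's definitions; the sketch
also leaves out simultaneous close (CLOSING) and the passive-close states.
It only shows that, once in FIN_WAIT_2, nothing but the peer's FIN moves the
connection along.

```c
/* Simplified model of the active-close side of a TCP connection.
 * Illustrative only: these names are hypothetical, not the BSD TCPS_*
 * constants, and simultaneous close (CLOSING) is omitted. */
enum close_state {
    ST_ESTABLISHED,
    ST_FIN_WAIT_1,   /* we sent our FIN, waiting for its ACK */
    ST_FIN_WAIT_2,   /* our FIN was ACKed, waiting for the peer's FIN */
    ST_TIME_WAIT,    /* peer's FIN arrived, 2MSL wait in progress */
    ST_CLOSED
};

enum close_event {
    EV_LOCAL_CLOSE,  /* application closes the socket */
    EV_RECV_ACK,     /* peer ACKs our FIN */
    EV_RECV_FIN,     /* peer sends its own FIN */
    EV_2MSL_EXPIRED  /* the TIME_WAIT timer runs out */
};

/* Return the state that follows s when event e occurs.
 * Events that do not apply to s leave the state unchanged. */
enum close_state next_state(enum close_state s, enum close_event e)
{
    switch (s) {
    case ST_ESTABLISHED:
        return e == EV_LOCAL_CLOSE ? ST_FIN_WAIT_1 : s;
    case ST_FIN_WAIT_1:
        return e == EV_RECV_ACK ? ST_FIN_WAIT_2 : s;
    case ST_FIN_WAIT_2:
        /* Only the peer's FIN gets us out of here in this model. */
        return e == EV_RECV_FIN ? ST_TIME_WAIT : s;
    case ST_TIME_WAIT:
        return e == EV_2MSL_EXPIRED ? ST_CLOSED : s;
    default:
        return s;
    }
}
```

Feeding EV_LOCAL_CLOSE and then EV_RECV_ACK to a connection in
ST_ESTABLISHED leaves it in ST_FIN_WAIT_2, and it stays there until
EV_RECV_FIN arrives, which is the situation described above when the remote
server never sends its FIN.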

Upon further investigation, I found the "culprit" behind this alleged "bug"
in /sys/netinet/tcp_timer.c.  I found that adding FIN_WAIT_2 to the
conditional in the TCPT_2MSL case of the switch in the tcp_timers function
fixed this problem on FreeBSD (I have yet to find out on NetBSD, as I am
recompiling a kernel, and on a 386 I'll be having a few coffees first) and
would close out the connections in FIN_WAIT_2 after 10 minutes.

For your perusal, a context diff for /sys/netinet/tcp_timer.c follows:

*** tcp_timer.c.orig    Sun Oct 22 15:43:41 2000
--- tcp_timer.c Sun Oct 22 15:43:16 2000
***************
*** 223,228 ****
--- 223,229 ----
         */
        case TCPT_2MSL:
                if (tp->t_state != TCPS_TIME_WAIT &&
+                   tp->t_state != TCPS_FIN_WAIT_2 &&
                    ((tcp_maxidle == 0) || (tp->t_idle <= tcp_maxidle)))
                        TCP_TIMER_ARM(tp, TCPT_2MSL, tcp_keepintvl);
                else

This is from the tcp_timer.c file included in the source of NetBSD 1.4.2
release.  To be complete, here's the machine info:

	NetBSD protoasted 1.4.2 NetBSD 1.4.2 (PROTOASTED) #0: Sat Oct 21
	20:04:20 EDT 2000
	root@protoasted:/usr/src/sys/arch/i386/compile/PROTOASTED i386
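
For anyone who wants to poke at the logic without rebuilding a kernel, the
condition in the diff can be modeled in user space.  This is only a sketch:
the enum and the rearm_original()/rearm_patched() helpers are hypothetical
names, and the idle and maxidle arguments stand in for tp->t_idle and
tcp_maxidle.  It mirrors the boolean test and nothing else in tcp_timers.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the two states that matter to the test. */
enum model_state { M_FIN_WAIT_2, M_TIME_WAIT };

/* Original condition: true means the 2MSL timer is re-armed and the
 * connection is kept; false means the connection would be closed. */
bool rearm_original(enum model_state st, int idle, int maxidle)
{
    return st != M_TIME_WAIT &&
           (maxidle == 0 || idle <= maxidle);
}

/* Patched condition: FIN_WAIT_2 is treated like TIME_WAIT, so the
 * timer is never re-armed for it. */
bool rearm_patched(enum model_state st, int idle, int maxidle)
{
    return st != M_TIME_WAIT &&
           st != M_FIN_WAIT_2 &&
           (maxidle == 0 || idle <= maxidle);
}
```

With these, rearm_original(M_FIN_WAIT_2, 10, 100) is true while
rearm_patched(M_FIN_WAIT_2, 10, 100) is false.  In other words, under the
patched test a FIN_WAIT_2 connection is dropped the first time the timer
fires, whatever its idle time is.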

FreeBSD has yet to commit this, so I have no idea whether this is even
considered non-BSD behavior.  Would somebody with more networking knowledge
than I have know one way or the other?  Like I said, this fixes it, but I
don't really know if it is *supposed* to be fixed.

Thanks.

-- 
Ryan "Cheshire" Younce / ryan@manunkind.org / http://www.manunkind.org/~ryan/

        "As in certain cults it is possible to kill a process if you
         know its true name."  -- Ken Thompson and Dennis M. Ritchie