Subject: Re: FIN_WAIT_2's remaining in connection list
To: None <tech-net@netbsd.org>
From: Ryan Younce <ryan@manunkind.org>
List: tech-net
Date: 10/22/2000 21:27:31
Thus spake Bill Sommerfeld <sommerfeld@orchard.arlington.ma.us>:
> > I was perusing the FreeBSD bugs report page and discovered a bug marked
> > serious in which connections to a remote Netware web server (and then
> > closing the connection) would cause the connection to remain in the
> > FIN_WAIT_2 state for several days.  The submitter indicated that he had
> > run a script to open several thousand connections, and they all remained
> > in the list for several days before being cleared out.
> 
> Be careful.  FIN_WAIT_2 is a (potentially) long-term stable state in
> TCP (when one direction has closed and the other hasn't); simply
> deleting connections which have been in FIN_WAIT_2 state for 2*MSL may
> cause data loss, because the connection is still actually open in the
> inbound direction at that point!
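Bill's point, that the connection is still open inbound while one end sits in FIN_WAIT_2, can be seen with an ordinary half-close. Below is a minimal Python sketch over loopback TCP; the function name half_close_demo is made up for illustration. The client shuts down only its sending side, which sends a FIN. Once that FIN is ACKed the client end is in FIN_WAIT_2, but the server can still send data that the client reads.

```python
import socket

def half_close_demo(payload=b"still open inbound"):
    """Hypothetical demo: shut down our sending side, then keep receiving."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.bind(("127.0.0.1", 0))          # any free loopback port
        srv.listen(1)
        with socket.create_connection(srv.getsockname()) as cli:
            conn, _ = srv.accept()
            with conn:
                # Client sends its FIN. Once the server's stack ACKs it, the
                # client end sits in FIN_WAIT_2 and the server in CLOSE_WAIT.
                cli.shutdown(socket.SHUT_WR)
                eof_seen = conn.recv(1024) == b""   # server reads EOF
                # The inbound direction (server -> client) still works.
                conn.sendall(payload)
            # Leaving the 'with conn' block closes it, sending the server's FIN.
            received = b""
            while True:
                chunk = cli.recv(1024)
                if not chunk:               # server's FIN has arrived
                    break
                received += chunk
    return eof_seen, received
```

Once the server closes, the client's recv() returns b"" and its end moves on to TIME_WAIT. A peer that never closes its side would leave the client in FIN_WAIT_2, still able to receive, for as long as the stack allows.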

I suspect a "more" correct approach would require something beyond the
provided patch.  What distinguishes FIN_WAIT_2 states like these from the
ones Apache brought to light back in (1996?)?  I think it was then that
FreeBSD 2.x added FIN_WAIT_2 timeouts, whose value, as far as I know, the
TCP standards do not specify.

Somebody please correct me if I'm wrong, as my knowledge of TCP state
transitions is a bit flaky, but here's how I understand the state
transitions for a client-side close:

	The local end closes its end of the connection, sending a segment
	containing a FIN to the remote end.  The state is now FIN_WAIT_1.

	If the remote end sends back only an ACK, the local end waits for
	the remote end to close its end of the connection, at which point
	the remote end sends us a FIN.  The state is now FIN_WAIT_2.

	Only when the remote end has sent this FIN does the local end
	respond with an ACK and move the connection to the TIME_WAIT state
	(unless we intervene manually, as the patch does).
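The sequence above can be sketched as a small table-driven state machine. This is a simplified Python model of the active-close side of the RFC 793 state diagram; the event names ("close", "ack", "fin", ...) and the function run_active_close are made up for the sketch, and a real stack keys transitions off segment flags and timers rather than strings.

```python
# Simplified active-close transitions (RFC 793). Event names are hypothetical.
TRANSITIONS = {
    ("ESTABLISHED", "close"): "FIN_WAIT_1",   # app closes; we send FIN
    ("FIN_WAIT_1", "ack"): "FIN_WAIT_2",      # peer ACKs our FIN only
    ("FIN_WAIT_1", "fin"): "CLOSING",         # simultaneous close
    ("FIN_WAIT_1", "fin_ack"): "TIME_WAIT",   # peer ACKs and FINs together
    ("CLOSING", "ack"): "TIME_WAIT",
    ("FIN_WAIT_2", "fin"): "TIME_WAIT",       # peer finally closes; we ACK
    ("TIME_WAIT", "timeout_2msl"): "CLOSED",
}

def run_active_close(events, state="ESTABLISHED"):
    """Apply events in order; an event with no listed transition is ignored."""
    for event in events:
        state = TRANSITIONS.get((state, event), state)
    return state

print(run_active_close(["close", "ack", "fin"]))   # TIME_WAIT
print(run_active_close(["close", "ack"]))          # FIN_WAIT_2
```

In this table nothing moves a connection out of FIN_WAIT_2 except the peer's FIN. The FIN_WAIT_2 timeouts some BSDs add are an implementation extension and are deliberately not modeled here.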

When I tested this, the BSD servers went through the entire sequence,
correctly arriving at TIME_WAIT.  The server listed in the PR keeps the
connection in FIN_WAIT_2 for an extraordinarily long time, so I assume
the Netware machine never sends the final FIN back to the local end.

As best I can tell, the timer is set to twice the maximum segment
lifetime (2*MSL) in /sys/netinet/tcp_usrreq.c.  I don't know for certain
how long this is, but from what I can tell from the kernel source, I
believe it is 2 minutes.
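How long 2*MSL works out to depends on which MSL is meant and on the units the kernel counts in. The Python arithmetic below is only a sketch under stated assumptions: it assumes the 4.4BSD-derived definitions PR_SLOWHZ = 2 (slow-timer ticks per second) and TCPTV_MSL = 30*PR_SLOWHZ, which should be checked against the actual headers in the tree being read, and compares them with RFC 793, which "arbitrarily" defines MSL as 2 minutes. The helper ticks_to_seconds is hypothetical.

```python
# Assumed constants (4.4BSD-derived netinet headers; verify in your tree).
PR_SLOWHZ = 2                 # slow-timeout ticks per second
TCPTV_MSL = 30 * PR_SLOWHZ    # MSL in slow ticks (30 seconds)

def ticks_to_seconds(ticks, hz=PR_SLOWHZ):
    """Hypothetical helper: convert slow-timer ticks to seconds."""
    return ticks / hz

two_msl_bsd = ticks_to_seconds(2 * TCPTV_MSL)   # 120 ticks -> 60.0 seconds

RFC793_MSL = 120              # RFC 793: MSL "arbitrarily defined to be 2 minutes"
two_msl_rfc = 2 * RFC793_MSL  # 240 seconds

print(two_msl_bsd, two_msl_rfc)
```

Under these assumptions the two readings differ by a factor of four, so it is worth confirming which constant and which units the FIN_WAIT_2 timer actually uses.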

I think my biggest question is:  is this a problem with *BSD/Linux, or is
it just a caveat of TCP?  It seems like too unfortunate a consequence of
closing a BSD-to-Netware connection.

-- 
            Ryan Younce |"A language that doesn't have everything is actually
     ryan@manunkind.org | easier to program in than some that do."
www.manunkind.org/~ryan |                                -- Dennis M. Ritchie