Subject: kern/1260: sockets can get stuck in CLOSE_WAIT.
To: None <gnats-bugs@gnats.netbsd.org>
From: None <mrg@mame.mu.OZ.AU>
List: netbsd-bugs
Date: 07/23/1995 16:50:18
>Number:         1260
>Category:       kern
>Synopsis:       sockets can get stuck in CLOSE_WAIT.
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people (Kernel Bug People)
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sun Jul 23 03:05:03 1995
>Last-Modified:
>Originator:     matthew green
>Organization:
bozotic software foundation.
>Release:        14 july 1995
>Environment:
System: NetBSD splode.mame.mu.oz.au 1.0A NetBSD 1.0A (_splode_) #240: Wed Jul 19 18:08:36 EST 1995 mrg@splode.eterna.com.au:/orb/q/build/src/sys/arch/sparc/compile/_splode_ sparc


>Description:

	here's a message that rich stevens posted to comp.protocols.tcp-ip
	some time ago.


------- Start of forwarded message -------
Path: doc.ic.ac.uk!sunsite.doc.ic.ac.uk!uknet!EU.net!howland.reston.ans.net!gatech!nntp.msstate.edu!olivea!charnel.ecst.csuchico.edu!csusac!csus.edu!csulb.edu!nic-nac.CSU.net!news.Cerritos.edu!news.Arizona.EDU!CS.Arizona.EDU!noao!rstevens
From: rstevens@noao.edu (W. Richard Stevens)
Newsgroups: comp.protocols.tcp-ip
Subject: Re: Strange problem with TCP/IP
Date: 10 Apr 1995 23:19:26 GMT
Organization: National Optical Astronomy Observatories, Tucson, AZ, USA
Lines: 57
Message-ID: <3mcedu$1ek@noao.edu>
References: <3lh2vp$jgo@philos.philosys.de> <3m3mdv$s1g@noao.edu>
NNTP-Posting-Host: gemini.tuc.noao.edu

> Looking back at some old tcp-ip postings, people claim that this problem
> of sockets stuck in the CLOSE_WAIT state goes back to 1983 with BSD
> implementations.  What OS do you see this under?  I've never seen a
> complete explanation of why this occurs, so I can't say whether newer
> systems fix the problem or not.

I'll follow up to my own posting.  Found the bug.  Here's what I did ...

There's only one place in tcp_input where the state is set to CLOSE_WAIT:
towards the end when a FIN is processed.  The code looks like

        if (tiflags & TH_FIN) {
                if (TCPS_HAVERCVDFIN(tp->t_state) == 0) {
                        socantrcvmore(so);
                        tp->t_flags |= TF_ACKNOW;
                        tp->rcv_nxt++;
                }
                switch (tp->t_state) {
 
                /*
                 * In SYN_RECEIVED and ESTABLISHED STATES
                 * enter the CLOSE_WAIT state.
                 */
                case TCPS_SYN_RECEIVED:
                case TCPS_ESTABLISHED:
                        tp->t_state = TCPS_CLOSE_WAIT;
                        break;
 
The question is: how can you end up here in the SYN_RCVD state, when the
ACK processing was just performed, and a received ACK in this state takes
you to ESTABLISHED, or if the ACK flag wasn't on, the segment was dropped?
Look at the gotos :-)  Turns out you can end up here from step6, which
skips the ACK processing.

The culprit is a segment with a SYN *and* a FIN *without* an ACK.  When
this is processed for a listening socket the state first goes from LISTEN
to SYN_RCVD when the SYN is processed.  Then the jump to trimthenstep6,
then to step6, bypassing the ACK processing.  Then the state is set to
CLOSE_WAIT in the code snippet above and the socket is dead.  It will
sit on the socket's so_q0 queue forever, tying up an Internet PCB and a
TCP control block.  The segment sent in response is an ACK of both the
SYN and FIN (i.e., the arriving sequence number plus 2), which makes the
other end happy.

I wrote a test program to tickle this bug and watched the packets with
tcpdump and then checked the server's end point with netstat.  4.4BSD
still shows the bug, as do older BSD implementations (SunOS 4.1.3, SVR4,
and AIX 3.2.2).  Solaris seems to handle it properly: the server stays
in the SYN_RCVD state, and sends a SYN and only ACKs the client's SYN,
not the FIN.  Since the new socket stays in the SYN_RCVD state, expecting
an ACK of its SYN, it'll time out after a few minutes.

I think the fix is to ignore the FIN flag when in the SYN_RCVD state in
the code snippet shown above.  You definitely don't want to go into the
CLOSE_WAIT state.

	Rich Stevens
------- End of forwarded message -------

>How-To-Repeat:
>Fix:

here's my interpretation of rich's last paragraph.

Index: sys/netinet/tcp_input.c
===================================================================
RCS file: /local/cvs/src/sys/netinet/tcp_input.c,v
retrieving revision 1.1.1.5
diff -c -r1.1.1.5 tcp_input.c
*** tcp_input.c	1995/06/22 08:20:53	1.1.1.5
--- tcp_input.c	1995/07/23 06:45:52
***************
*** 1217,1227 ****
  		}
  		switch (tp->t_state) {
  
! 	 	/*
! 		 * In SYN_RECEIVED and ESTABLISHED STATES
! 		 * enter the CLOSE_WAIT state.
  		 */
  		case TCPS_SYN_RECEIVED:
  		case TCPS_ESTABLISHED:
  			tp->t_state = TCPS_CLOSE_WAIT;
  			break;
--- 1217,1232 ----
  		}
  		switch (tp->t_state) {
  
! 		/*
! 		 * To stop sockets getting stuck in CLOSE_WAIT, we ignore
! 		 * the FIN if we're in SYN_RECEIVED
  		 */
  		case TCPS_SYN_RECEIVED:
+ 			break;
+ 
+ 		/*
+ 		 * In the ESTABLISHED state, enter the CLOSE_WAIT state.
+ 		 */
  		case TCPS_ESTABLISHED:
  			tp->t_state = TCPS_CLOSE_WAIT;
  			break;
>Audit-Trail:
>Unformatted: