Subject: TCP hang on netbsd-2
To: None <tech-net@netbsd.org>
From: Manuel Bouyer <bouyer@antioche.eu.org>
List: tech-net
Date: 05/29/2005 19:55:24
Hi,
I'm getting hung ssh connections from a 2.0_RC2 box to a 2.0_STABLE box.
The ssh session carries a lot of traffic on stdout.
Here is the state on the 2.0_STABLE box:
Proto Recv-Q Send-Q  Local Address          Foreign Address        State
tcp        0  32144  pop.ssh                barder.53768           ESTABLISHED
on the 2.0_RC2 box:
tcp        0      0  barder.53768           pop.ssh                ESTABLISHED
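
For anyone who wants to watch for the same symptom, here is a minimal sketch (not part of the original report) that scans netstat-style rows and flags ESTABLISHED connections with a non-empty Send-Q. The function names and the six-column layout are assumptions based on the output pasted above; a large Send-Q alone is normal on a busy connection, it only matters if it stays put over time.

```python
def parse_netstat_line(line):
    """Split one netstat-style TCP row into a dict, or return None.

    Assumes the six-column layout shown above:
    Proto Recv-Q Send-Q Local Foreign State
    """
    fields = line.split()
    if len(fields) != 6 or not fields[1].isdigit() or not fields[2].isdigit():
        return None  # header line or a row in some other layout
    proto, recv_q, send_q, local, foreign, state = fields
    return {
        "proto": proto,
        "recv_q": int(recv_q),
        "send_q": int(send_q),
        "local": local,
        "foreign": foreign,
        "state": state,
    }


def stuck_candidates(text, min_send_q=1):
    """Return ESTABLISHED rows whose Send-Q is at least min_send_q bytes."""
    rows = (parse_netstat_line(line) for line in text.splitlines())
    return [
        row for row in rows
        if row and row["state"] == "ESTABLISHED" and row["send_q"] >= min_send_q
    ]


# The two rows from this report, preceded by the netstat header.
SAMPLE = """\
Proto Recv-Q Send-Q  Local Address          Foreign Address        State
tcp        0  32144  pop.ssh                barder.53768           ESTABLISHED
tcp        0      0  barder.53768           pop.ssh                ESTABLISHED
"""

if __name__ == "__main__":
    for row in stuck_candidates(SAMPLE):
        print(row["local"], "->", row["foreign"], "Send-Q:", row["send_q"])
```

Running it twice a few minutes apart and comparing the Send-Q values is what distinguishes a hung connection from a merely busy one.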

So stdout in the ssh session is blocked: there is data pending in the
socket on the sender's side, but it is never sent to the other end.
Here's the output of a tcpdump on the sender side:
tcpdump: listening on ex0
(wait a bit, nothing happens)
(hit enter in the ssh window)
19:38:58.772216 132.227.86.254.53768 > 132.227.63.49.22: P 2915848963:2915849011(48) ack 448885448 win 33580 <nop,nop,timestamp 42297 40418> [tos 0x10] 
19:38:58.966678 132.227.63.49.22 > 132.227.86.254.53768: . ack 48 win 33580 <nop,nop,timestamp 42247 42297> (DF) [tos 0x10] 
(hit enter again, 3 times)
19:44:59.772273 132.227.86.254.53768 > 132.227.63.49.22: P 48:96(48) ack 1 win 33580 <nop,nop,timestamp 43019 40418> [tos 0x10] 
19:44:59.952820 132.227.86.254.53768 > 132.227.63.49.22: P 96:144(48) ack 1 win 33580 <nop,nop,timestamp 43019 40418> [tos 0x10] 
19:44:59.952921 132.227.63.49.22 > 132.227.86.254.53768: . ack 144 win 33532 <nop,nop,timestamp 42969 43019> (DF) [tos 0x10] 
19:45:00.084218 132.227.86.254.53768 > 132.227.63.49.22: P 144:192(48) ack 1 win 33580 <nop,nop,timestamp 43020 40418> [tos 0x10] 
19:45:00.210887 132.227.86.254.53768 > 132.227.63.49.22: P 192:240(48) ack 1 win 33580 <nop,nop,timestamp 43020 40418> [tos 0x10] 
19:45:00.210983 132.227.63.49.22 > 132.227.86.254.53768: . ack 240 win 33532 <nop,nop,timestamp 42969 43020> (DF) [tos 0x10] 
(hit enter again, one time)
19:50:21.660036 132.227.86.254.53768 > 132.227.63.49.22: P 240:288(48) ack 1 win 33580 <nop,nop,timestamp 43662 40418> [tos 0x10] 
19:50:21.858248 132.227.63.49.22 > 132.227.86.254.53768: . ack 288 win 33580 <nop,nop,timestamp 43612 43662> (DF) [tos 0x10] 

So it seems that data from client -> server is still flowing, but the send
queue on the server never drains. Does anyone remember seeing/fixing this?
I think I've also been hit by this between the same 2.0_STABLE server and a
1.6.2 client, but I didn't collect any information at the time. I also think
I've only been seeing this since I updated this server from 2.0_BETA to
2.0_STABLE:
NetBSD pop.lip6.fr 2.0_STABLE NetBSD 2.0_STABLE (GENERIC.MP) #0: Sun Apr 10 15:46:42 CEST 2005  root@pop.lip6.fr:/local/pop1/bouyer/tmp/i386/obj/local/pop1/bouyer/netbsd-2-0-clean/src/sys/arch/i386/compile/GENERIC.MP i386
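
To double-check the reading of the capture (only the client sends payload, the server sends bare ACKs), here is a rough sketch that totals payload bytes per sender from tcpdump text lines. It is not part of the original report; the regular expression is an assumption that only covers the old-style line format shown in the capture above, and the function name is made up for illustration.

```python
import re
from collections import defaultdict

# Matches lines such as:
#   19:44:59.772273 A.port > B.port: P 48:96(48) ack 1 win 33580 <...>
#   19:44:59.952921 B.port > A.port: . ack 144 win 33532 <...>
# The "start:end(length)" part is absent on pure ACKs.
SEGMENT = re.compile(
    r"^(?P<time>\S+) (?P<src>\S+) > (?P<dst>\S+): (?P<flags>\S+) "
    r"(?:(?P<start>\d+):(?P<end>\d+)\((?P<length>\d+)\) )?"
    r"ack (?P<ack>\d+) win (?P<win>\d+)"
)


def payload_by_sender(capture):
    """Sum TCP payload bytes per source endpoint; pure ACKs count as 0."""
    totals = defaultdict(int)
    for line in capture.splitlines():
        match = SEGMENT.match(line.strip())
        if match is None:
            continue  # annotations like "(hit enter again)" are skipped
        totals[match["src"]] += int(match["length"] or 0)
    return dict(totals)


# Three lines copied from the capture above.
SAMPLE = """\
19:38:58.772216 132.227.86.254.53768 > 132.227.63.49.22: P 2915848963:2915849011(48) ack 448885448 win 33580 <nop,nop,timestamp 42297 40418> [tos 0x10]
19:38:58.966678 132.227.63.49.22 > 132.227.86.254.53768: . ack 48 win 33580 <nop,nop,timestamp 42247 42297> (DF) [tos 0x10]
19:44:59.772273 132.227.86.254.53768 > 132.227.63.49.22: P 48:96(48) ack 1 win 33580 <nop,nop,timestamp 43019 40418> [tos 0x10]
"""

if __name__ == "__main__":
    for endpoint, nbytes in payload_by_sender(SAMPLE).items():
        print(endpoint, nbytes)
```

A sender that shows zero payload bytes while its Send-Q is non-empty is the situation described here.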

-- 
Manuel Bouyer <bouyer@antioche.eu.org>
     NetBSD: 26 years of experience will always make the difference
--