Subject: Re: TCP hang on netbsd-2
To: None <tech-net@NetBSD.org>
From: Manuel Bouyer <bouyer@antioche.eu.org>
List: tech-net
Date: 05/29/2005 22:30:09
On Sun, May 29, 2005 at 07:55:24PM +0200, Manuel Bouyer wrote:
> Hi,
> I'm getting hung ssh connections from a 2.0_RC2 to a 2.0_STABLE box.
> This ssh has lots of traffic on stdout.
> Here is the state on the 2.0_STABLE box:
> Proto Recv-Q Send-Q  Local Address          Foreign Address        State
> tcp        0  32144  pop.ssh                barder.53768           ESTABLISHED
> on the 2.0_RC2 box:
> tcp        0      0  barder.53768           pop.ssh                ESTABLISHED
> 
> So stdout in the ssh session is stuck: there is data pending in the
> socket on the sender's side, but it is never sent to the other end.
> Here's the output of a tcpdump on the sender side:
> tcpdump: listening on ex0
> (wait a bit, nothing happens)
> (hit enter in the ssh window)
> 19:38:58.772216 132.227.86.254.53768 > 132.227.63.49.22: P 2915848963:2915849011(48) ack 448885448 win 33580 <nop,nop,timestamp 42297 40418> [tos 0x10] 
> 19:38:58.966678 132.227.63.49.22 > 132.227.86.254.53768: . ack 48 win 33580 <nop,nop,timestamp 42247 42297> (DF) [tos 0x10] 
> (hit enter again, 3 times)
> 19:44:59.772273 132.227.86.254.53768 > 132.227.63.49.22: P 48:96(48) ack 1 win 33580 <nop,nop,timestamp 43019 40418> [tos 0x10] 
> 19:44:59.952820 132.227.86.254.53768 > 132.227.63.49.22: P 96:144(48) ack 1 win 33580 <nop,nop,timestamp 43019 40418> [tos 0x10] 
> 19:44:59.952921 132.227.63.49.22 > 132.227.86.254.53768: . ack 144 win 33532 <nop,nop,timestamp 42969 43019> (DF) [tos 0x10] 
> 19:45:00.084218 132.227.86.254.53768 > 132.227.63.49.22: P 144:192(48) ack 1 win 33580 <nop,nop,timestamp 43020 40418> [tos 0x10] 
> 19:45:00.210887 132.227.86.254.53768 > 132.227.63.49.22: P 192:240(48) ack 1 win 33580 <nop,nop,timestamp 43020 40418> [tos 0x10] 
> 19:45:00.210983 132.227.63.49.22 > 132.227.86.254.53768: . ack 240 win 33532 <nop,nop,timestamp 42969 43020> (DF) [tos 0x10] 
> (hit enter again, one time)
> 19:50:21.660036 132.227.86.254.53768 > 132.227.63.49.22: P 240:288(48) ack 1 win 33580 <nop,nop,timestamp 43662 40418> [tos 0x10] 
> 19:50:21.858248 132.227.63.49.22 > 132.227.86.254.53768: . ack 288 win 33580 <nop,nop,timestamp 43612 43662> (DF) [tos 0x10] 
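
One thing the trace above lets us rule out is a zero-window stall. The sketch below is only a rough check, not a diagnosis: it pulls the advertised window out of a few lines copied from the trace for the receiving host (132.227.86.254). The regular expression and the helper name advertised_windows are made up for this illustration and assume the old tcpdump text format shown here.

```python
import re

# Lines copied from the trace above: packets sent by the host that should
# be receiving the pending data (options trimmed for brevity).
TRACE = """\
19:38:58.772216 132.227.86.254.53768 > 132.227.63.49.22: P 2915848963:2915849011(48) ack 448885448 win 33580
19:44:59.772273 132.227.86.254.53768 > 132.227.63.49.22: P 48:96(48) ack 1 win 33580
19:50:21.660036 132.227.86.254.53768 > 132.227.63.49.22: P 240:288(48) ack 1 win 33580
"""

# Hypothetical pattern for this tcpdump format: time, source, '>', dest, ..., 'win N'.
WIN_RE = re.compile(r"^\S+ (\S+) > \S+: .*\bwin (\d+)")


def advertised_windows(trace, src):
    """Return the window values advertised by `src` in the trace text."""
    wins = []
    for line in trace.splitlines():
        m = WIN_RE.match(line)
        if m and m.group(1) == src:
            wins.append(int(m.group(2)))
    return wins


windows = advertised_windows(TRACE, "132.227.86.254.53768")
print(windows, "zero window seen:", 0 in windows)
```

The receiver keeps advertising 33580 bytes, so the sender is not being held back by a closed window, at least as far as these packets show.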

It got unstuck 2 hours later, probably because of the TCP keepalive probe
(the first packet below):
21:50:21.786483 132.227.86.254.53768 > 132.227.63.49.22: . ack 1 win 33580
21:50:21.786571 132.227.63.49.22 > 132.227.86.254.53768: . 1:1449(1448) ack 288 win 33580 <nop,nop,timestamp 58003 43662> (DF) [tos 0x10] 
21:50:21.986556 132.227.86.254.53768 > 132.227.63.49.22: . ack 1449 win 33580 <nop,nop,timestamp 58063 58003> [tos 0x10] 
21:50:21.986644 132.227.63.49.22 > 132.227.86.254.53768: . 1449:2897(1448) ack 288 win 33580 <nop,nop,timestamp 58003 43662> (DF) [tos 0x10] 
21:50:22.186662 132.227.86.254.53768 > 132.227.63.49.22: . ack 2897 win 33580 <nop,nop,timestamp 58064 58003> [tos 0x10] 
[...]
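
To back up the keepalive guess, here is a small sketch that computes the idle time between the last ack in the earlier trace and the bare ack that restarted the transfer. It assumes both timestamps fall on the same day and that the hosts use the traditional BSD keepalive idle time of 2 hours (7200 s); I have not checked the actual setting on either box. The helper name gap_seconds is made up for this example.

```python
from datetime import datetime

# Assumed default keepalive idle time (traditional BSD value), in seconds.
ASSUMED_KEEPIDLE = 7200


def gap_seconds(earlier, later):
    """Seconds between two tcpdump timestamps taken on the same day."""
    fmt = "%H:%M:%S.%f"
    delta = datetime.strptime(later, fmt) - datetime.strptime(earlier, fmt)
    return delta.total_seconds()


last_ack = "19:50:21.858248"  # last packet before the connection went idle
probe = "21:50:21.786483"     # bare ack that got things moving again

gap = gap_seconds(last_ack, probe)
print(f"idle for {gap:.3f} s, assumed keepidle {ASSUMED_KEEPIDLE} s")
```

The gap comes out within a fraction of a second of 7200 s, which fits a keepalive probe much better than a retransmit timer.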

-- 
Manuel Bouyer <bouyer@antioche.eu.org>
     NetBSD: 26 years of experience will always make the difference
--