Subject: Re: kern/36097: http fetch stall in networking code
To: None <kern-bug-people@netbsd.org, gnats-admin@netbsd.org,>
From: Greg Oster <oster@cs.usask.ca>
List: netbsd-bugs
Date: 03/30/2007 16:50:05
The following reply was made to PR kern/36097; it has been noted by GNATS.

From: Greg Oster <oster@cs.usask.ca>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/36097: http fetch stall in networking code 
Date: Fri, 30 Mar 2007 10:47:04 -0600

 "Liam J. Foy" writes:
 > The following reply was made to PR kern/36097; it has been noted by GNATS.
 > 
 > From: "Liam J. Foy" <liamfoy@sepulcrum.org>
 > To: gnats-bugs@NetBSD.org
 > Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
 > 	netbsd-bugs@netbsd.org, root@garbled.net
 > Subject: Re: kern/36097: http fetch stall in networking code
 > Date: Fri, 30 Mar 2007 17:36:05 +0100
 > 
 >  On 30 Mar 2007, at 16:55, Tim Rightnour wrote:
 >  
 >  > The following reply was made to PR kern/36097; it has been noted by  
 >  > GNATS.
 >  >
 >  > From: Tim Rightnour <root@garbled.net>
 >  > To: gnats-bugs@NetBSD.org
 >  > Cc: netbsd-bugs@netbsd.org, gnats-admin@netbsd.org,
 >  > 	kern-bug-people@netbsd.org
 >  > Subject: Re: kern/36097: http fetch stall in networking code
 >  > Date: Fri, 30 Mar 2007 08:50:23 -0700 (MST)
 >  >
 >  >  On 30-Mar-2007 YAMAMOTO Takashi wrote:
 >  >>  i guess it's failing to transmit any packets with sack, or  
 >  >> something like
 >  >> that.
 >  >>  are you using any hw offloading?
 >  >
 >  >  I've tested this on 4.0/i386 with a vr0 (no hardware offload that  
 >  > I know of),
 >  >  and 4.0/prep with an fxp0 with cpusaver turned on.  Both acted  
 >  > identically.
 >  >
 >  >  At the time this was reported, a number of other people also  
 >  > verified the same
 >  >  behavior on their 4.0 machines.
 >  >
 >  
 >  You say this is fixed in current - we just need to find out who may have
 >  fixed this on purpose or by accident. Anyone know :-)?
 
 I'm not convinced this was fixed in current... On the 4.99.16 box I 
 was testing on, it worked sometimes, but not all the time...  If you 
 look at Tim's trace, you'll see:
 
 11:05:12.765774 IP muumi.lnet.lut.fi.www > polaris.64773: . 87870:89330(1460) ack 178 win 1728
 11:05:12.765780 IP polaris.64773 > muumi.lnet.lut.fi.www: . ack 89330 win 32120
 11:05:12.788019 IP muumi.lnet.lut.fi.www > polaris.64773: . 90790:92250(1460) ack 178 win 1728
 11:05:12.788029 IP polaris.64773 > muumi.lnet.lut.fi.www: . ack 89330 win 33580 <nop,nop,sack sack 1 {90790:92250} >
 
 What I'd like to know is where the 89330:90790 packet has gone.  In 
 every trace where this transfer failed, the cause was a "missing 
 packet", and in many cases it was *exactly* this packet that went 
 missing.  We correctly ack everything up to 89330 and SACK 
 90790:92250, but why the remote end doesn't retransmit the missing 
 89330:90790 packet like it's supposed to (or why it would go missing 
 again if retransmitted) I have no idea...
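 
 For anyone who wants to check their own traces for the same hole, 
 here is a rough sketch of how the gap could be found mechanically.  
 It assumes the old-style tcpdump text format shown above 
 (START:END(LEN) for data segments); the names SEGMENT_RE, find_gaps 
 and TRACE are made up for illustration and are not part of any tool.  
 It ignores sequence-number wraparound, and it only reports ranges 
 that never show up anywhere in the capture, so a segment that was 
 retransmitted later is not counted as missing.

```python
import re

# Old-style tcpdump data segment: "IP src > dst: FLAGS START:END(LEN) ..."
# (assumed format, taken from the trace lines quoted in this message)
SEGMENT_RE = re.compile(r"IP (\S+) > (\S+): \S+ (\d+):(\d+)\((\d+)\)")

# The four lines from Tim's trace quoted above.
TRACE = [
    "11:05:12.765774 IP muumi.lnet.lut.fi.www > polaris.64773: . 87870:89330(1460) ack 178 win 1728",
    "11:05:12.765780 IP polaris.64773 > muumi.lnet.lut.fi.www: . ack 89330 win 32120",
    "11:05:12.788019 IP muumi.lnet.lut.fi.www > polaris.64773: . 90790:92250(1460) ack 178 win 1728",
    "11:05:12.788029 IP polaris.64773 > muumi.lnet.lut.fi.www: . ack 89330 win 33580 <nop,nop,sack sack 1 {90790:92250} >",
]


def find_gaps(lines, sender):
    """Return (start, end) sequence ranges never seen from `sender`."""
    # Collect every data range the sender put on the wire, including
    # retransmissions, then look for holes between the sorted ranges.
    ranges = []
    for line in lines:
        m = SEGMENT_RE.search(line)
        if m and m.group(1) == sender:
            ranges.append((int(m.group(3)), int(m.group(4))))
    ranges.sort()

    gaps = []
    covered_to = None  # highest sequence number covered so far
    for start, end in ranges:
        if covered_to is not None and start > covered_to:
            gaps.append((covered_to, start))
        covered_to = end if covered_to is None else max(covered_to, end)
    return gaps


if __name__ == "__main__":
    print(find_gaps(TRACE, "muumi.lnet.lut.fi.www"))
```

 Run against the four lines above, it reports the single hole 
 89330:90790.  Note that this only says what the capturing host saw; 
 it can't tell whether the segment was never sent or was dropped 
 somewhere on the way.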
 
 Later...
 
 Greg Oster