Subject: Unexplained RSTs
To: None <tech-net@NetBSD.ORG>
From: der Mouse <mouse@Rodents.Montreal.QC.CA>
List: tech-net
Date: 08/08/1997 09:31:54

I was testing a potential new netlink, and was trying to transfer big
chunks of data between home (NetBSD/sun3) and machines on the other
end.  I can do this in either direction when the other end is
NetBSD/sparc, but talking to a SunOS machine I get an unexplained
"Connection reset by peer" out of the SunOS end.  I snooped on the
conversation (tcpdump on an uninvolved machine on the same physical net
as the NetBSD end) and here's what I see.  209.89.25.2 is the
NetBSD/sun3 machine at home; 132.206.4.24 is the SunOS machine on the
other end of the link.

08:52:18.353013 209.89.25.2.2070 > 132.206.4.24.18881: S 1995456000:1995456000(0) win 16384 <mss 1460,nop,wscale 0,nop,nop,timestamp 4324666 0>
08:52:18.497860 132.206.4.24.18881 > 209.89.25.2.2070: S 839286019:839286019(0) ack 1995456001 win 4096
08:52:18.499385 209.89.25.2.2070 > 132.206.4.24.18881: . ack 1 win 16384
08:52:18.563615 209.89.25.2.2070 > 132.206.4.24.18881: . 1:513(512) ack 1 win 16384
08:52:18.565323 209.89.25.2.2070 > 132.206.4.24.18881: . 513:1025(512) ack 1 win 16384
08:52:19.216134 209.89.25.2.2070 > 132.206.4.24.18881: . 1:513(512) ack 1 win 16384
08:52:19.738054 132.206.4.24.18881 > 209.89.25.2.2070: . ack 513 win 4096
08:52:19.740499 209.89.25.2.2070 > 132.206.4.24.18881: . 513:1025(512) ack 1 win 16384
08:52:19.741835 209.89.25.2.2070 > 132.206.4.24.18881: . 1025:1537(512) ack 1 win 16384
08:52:20.722652 209.89.25.2.2070 > 132.206.4.24.18881: . 513:1025(512) ack 1 win 16384
08:52:21.127797 132.206.4.24.18881 > 209.89.25.2.2070: . ack 1025 win 4096
08:52:21.130502 209.89.25.2.2070 > 132.206.4.24.18881: . 1025:1537(512) ack 1 win 16384
08:52:21.131872 209.89.25.2.2070 > 132.206.4.24.18881: . 1537:2049(512) ack 1 win 16384
08:52:21.736311 209.89.25.2.2070 > 132.206.4.24.18881: . 1025:1537(512) ack 1 win 16384
08:52:22.127888 132.206.4.24.18881 > 209.89.25.2.2070: . ack 1537 win 4096
08:52:22.130403 209.89.25.2.2070 > 132.206.4.24.18881: . 1537:2049(512) ack 1 win 16384
08:52:22.131772 209.89.25.2.2070 > 132.206.4.24.18881: . 2049:2561(512) ack 1 win 16384
08:52:22.738833 209.89.25.2.2070 > 132.206.4.24.18881: . 1537:2049(512) ack 1 win 16384
08:52:23.127792 132.206.4.24.18881 > 209.89.25.2.2070: . ack 2049 win 4096
08:52:23.130429 209.89.25.2.2070 > 132.206.4.24.18881: . 2049:2561(512) ack 1 win 16384
08:52:23.131820 209.89.25.2.2070 > 132.206.4.24.18881: . 2561:3073(512) ack 1 win 16384
08:52:23.736614 209.89.25.2.2070 > 132.206.4.24.18881: . 2049:2561(512) ack 1 win 16384
08:52:24.127782 132.206.4.24.18881 > 209.89.25.2.2070: . ack 2561 win 4096
08:52:24.130342 209.89.25.2.2070 > 132.206.4.24.18881: . 2561:3073(512) ack 1 win 16384
08:52:24.131693 209.89.25.2.2070 > 132.206.4.24.18881: . 3073:3585(512) ack 1 win 16384
08:52:24.736606 209.89.25.2.2070 > 132.206.4.24.18881: . 2561:3073(512) ack 1 win 16384
08:52:25.127996 132.206.4.24.18881 > 209.89.25.2.2070: . ack 3073 win 4096
08:52:25.130548 209.89.25.2.2070 > 132.206.4.24.18881: . 3073:3585(512) ack 1 win 16384
08:52:25.131894 209.89.25.2.2070 > 132.206.4.24.18881: . 3585:4097(512) ack 1 win 16384
08:52:25.736616 209.89.25.2.2070 > 132.206.4.24.18881: . 3073:3585(512) ack 1 win 16384
08:52:26.128048 132.206.4.24.18881 > 209.89.25.2.2070: . ack 3585 win 4096
08:52:26.130587 209.89.25.2.2070 > 132.206.4.24.18881: . 3585:4097(512) ack 1 win 16384
08:52:26.131970 209.89.25.2.2070 > 132.206.4.24.18881: . 4097:4609(512) ack 1 win 16384
08:52:26.736643 209.89.25.2.2070 > 132.206.4.24.18881: . 3585:4097(512) ack 1 win 16384
08:52:27.127975 132.206.4.24.18881 > 209.89.25.2.2070: . ack 4097 win 4096
08:52:27.130563 209.89.25.2.2070 > 132.206.4.24.18881: . 4097:4609(512) ack 1 win 16384
08:52:27.131914 209.89.25.2.2070 > 132.206.4.24.18881: . 4609:5121(512) ack 1 win 16384
08:52:27.736649 209.89.25.2.2070 > 132.206.4.24.18881: . 4097:4609(512) ack 1 win 16384
08:52:28.127966 132.206.4.24.18881 > 209.89.25.2.2070: . ack 4609 win 4096
08:52:28.130506 209.89.25.2.2070 > 132.206.4.24.18881: . 4609:5121(512) ack 1 win 16384
08:52:28.131865 209.89.25.2.2070 > 132.206.4.24.18881: . 5121:5633(512) ack 1 win 16384
08:52:28.736644 209.89.25.2.2070 > 132.206.4.24.18881: . 4609:5121(512) ack 1 win 16384
08:52:29.128011 132.206.4.24.18881 > 209.89.25.2.2070: . ack 5121 win 4096
08:52:29.130590 209.89.25.2.2070 > 132.206.4.24.18881: . 5121:5633(512) ack 1 win 16384
08:52:29.131941 209.89.25.2.2070 > 132.206.4.24.18881: . 5633:6145(512) ack 1 win 16384
08:52:29.736667 209.89.25.2.2070 > 132.206.4.24.18881: . 5121:5633(512) ack 1 win 16384
08:52:30.128175 132.206.4.24.18881 > 209.89.25.2.2070: . ack 5633 win 4096
08:52:30.130717 209.89.25.2.2070 > 132.206.4.24.18881: . 5633:6145(512) ack 1 win 16384
08:52:30.132087 209.89.25.2.2070 > 132.206.4.24.18881: . 6145:6657(512) ack 1 win 16384
08:52:30.736675 209.89.25.2.2070 > 132.206.4.24.18881: . 5633:6145(512) ack 1 win 16384
08:52:31.128069 132.206.4.24.18881 > 209.89.25.2.2070: . ack 6145 win 4096
08:52:31.130658 209.89.25.2.2070 > 132.206.4.24.18881: . 6145:6657(512) ack 1 win 16384
08:52:31.132017 209.89.25.2.2070 > 132.206.4.24.18881: . 6657:7169(512) ack 1 win 16384
08:52:31.735597 209.89.25.2.2070 > 132.206.4.24.18881: R 7169:7169(0) ack 1 win 16384

Does anyone have any idea (a) why NetBSD insists on sending back-to-back
pairs of packets, (b) what is going on that seems to consistently lose
said back-to-back packets, and, most especially, (c) why the NetBSD end
decides to RST the connection despite data flowing and being acked
(albeit somewhat inefficiently)?  Note also that the _entire exchange_
took only about 13 seconds.
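
For anyone who wants to tally the pattern rather than eyeball it, here
is a minimal Python sketch that counts how often each data segment shows
up in a trace and how long the trace spans.  It assumes the trace is
plain text in the tcpdump format shown above; TRACE, LINE_RE and
summarize are illustrative names, not part of any existing tool, and
the excerpt is just the first few data lines of the trace.

```python
import re
from datetime import datetime

# Excerpt copied from the trace above; paste the full trace to analyse all of it.
TRACE = """\
08:52:18.563615 209.89.25.2.2070 > 132.206.4.24.18881: . 1:513(512) ack 1 win 16384
08:52:18.565323 209.89.25.2.2070 > 132.206.4.24.18881: . 513:1025(512) ack 1 win 16384
08:52:19.216134 209.89.25.2.2070 > 132.206.4.24.18881: . 1:513(512) ack 1 win 16384
08:52:19.738054 132.206.4.24.18881 > 209.89.25.2.2070: . ack 513 win 4096
08:52:19.740499 209.89.25.2.2070 > 132.206.4.24.18881: . 513:1025(512) ack 1 win 16384
08:52:19.741835 209.89.25.2.2070 > 132.206.4.24.18881: . 1025:1537(512) ack 1 win 16384
"""

# timestamp, source, destination, flags, then an optional first:last(length) range
LINE_RE = re.compile(
    r"^(\d\d:\d\d:\d\d\.\d+) (\S+) > (\S+): (\S+)(?: (\d+):(\d+)\((\d+)\))?"
)


def summarize(trace):
    """Return (elapsed seconds, retransmission count, per-segment send counts)."""
    first = last = None
    sends = {}  # (source, first seq, last seq) -> number of times seen
    for line in trace.splitlines():
        m = LINE_RE.match(line)
        if m is None:
            continue
        ts = datetime.strptime(m.group(1), "%H:%M:%S.%f")
        if first is None:
            first = ts
        last = ts
        # Only segments that carry data count; pure ACKs, SYNs and RSTs do not.
        if m.group(7) is not None and int(m.group(7)) > 0:
            key = (m.group(2), int(m.group(5)), int(m.group(6)))
            sends[key] = sends.get(key, 0) + 1
    elapsed = (last - first).total_seconds() if first is not None else 0.0
    # Every sighting of a segment beyond the first is a retransmission.
    retransmits = sum(count - 1 for count in sends.values())
    return elapsed, retransmits, sends


if __name__ == "__main__":
    elapsed, retransmits, sends = summarize(TRACE)
    print(f"elapsed {elapsed:.3f}s, {retransmits} retransmitted segments")
    for (src, start, end), count in sorted(sends.items()):
        print(f"{src} {start}:{end} sent {count} time(s)")
```

The sketch treats any repeated first:last range from the same sender as
a retransmission and does not handle a trace that crosses midnight.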

I have the full trace including all packet contents, if anyone thinks
it might help diagnose the problem.

The NetBSD/sun3 machine is running slightly-post-1.2 (I don't normally
try to track -current on a Sun-3/260; those of you who have tried
building the world on a 68020/20 will understand why).  The SunOS end
is 4.1.something, but AFAICT from the tcpdump, it is not at fault.

The apparent back-to-back packet loss _could_ be related to the network
setup; the "new netlink" is a dialup PPP link out of a low-end 386, and
it may have a sufficiently stupid Ethernet to be incapable of handling
back-to-back packets, though it didn't gripe visibly during the above
experiment.  I'm not concerned about such problems on the SunOS box or
other gateways, because the PPP link will produce a relatively high
inter-packet delay which will affect all later hops.  (I'm also not
sure just how back-to-back the packets truly are; they seem to be about
0.0013 seconds apart.  At 10 megabits per second, that's 13000 bit
times, or 1625 bytes.  Even with TCP, IP, and Ethernet overhead, this
is at least half again what a 512-byte packet should take, probably
more than twice.)
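
The bit-time arithmetic can be checked with a short Python sketch.  The
overhead figures are assumptions: a 20-byte TCP header with no options
(the SYN-ACK in the trace carries none), a 20-byte IP header, and 26
bytes of Ethernet framing (14 header, 4 FCS, 8 preamble); the
inter-frame gap is ignored.  LINK_BPS, GAP_US and gap_vs_frame are
illustrative names.

```python
LINK_BPS = 10_000_000   # 10 Mbit/s Ethernet, as stated in the text
GAP_US = 1300           # observed spacing within a pair, about 0.0013 s


def gap_vs_frame(payload=512, tcp_hdr=20, ip_hdr=20, eth_overhead=14 + 4 + 8):
    """Compare the observed gap with the wire time of one data frame.

    Returns (gap in bit times, gap in bytes, frame size in bytes, ratio).
    The header sizes are assumptions, not measurements from the trace.
    """
    gap_bits = GAP_US * LINK_BPS // 1_000_000   # bit times in the gap
    gap_bytes = gap_bits // 8
    frame_bytes = payload + tcp_hdr + ip_hdr + eth_overhead
    return gap_bits, gap_bytes, frame_bytes, gap_bytes / frame_bytes


if __name__ == "__main__":
    bits, nbytes, frame, ratio = gap_vs_frame()
    print(f"gap: {bits} bit times = {nbytes} bytes")
    print(f"frame on the wire: {frame} bytes, gap/frame = {ratio:.2f}")
```

Under those assumptions a 512-byte segment occupies 578 bytes on the
wire, so the 1625-byte gap comes out at roughly 2.8 frame times.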

Is this a known problem in the 1.2 stack, in which case I should just
bite the bullet and go to a more recent kernel?  Or is it something
deeper?

					der Mouse

			       mouse@rodents.montreal.qc.ca
		     7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B