Subject: Help w/ slow-downs (tcpdump + cursory analysis included)
To: None <tech-net@netbsd.org>
From: Marc Tooley <sudog@sudog.com>
List: tech-net
Date: 09/13/2000 13:37:45
Having similar weirdnesses with two completely different machines with
only one thing in common--NetBSD-1.5_ALPHA2 cvs'd to most recent.

Machine #1: Pent3-600, 3Com 3c905B-TX (ex)
Machine #2: Celeron500-ish on-board Asus Intel i82557 (fxp)

Similar behaviour:

tcp de-sync takes inordinate amount of time to recover:

11:40:34.491982 123.123.123.123.53449 > 10.0.0.4.20: S 495659696:495659696(0) ack 2683428400 win 16384 <mss 1460,nop,wscale 0,nop,nop,timestamp[|tcp]>
11:40:34.492060 10.0.0.4.20 > 123.123.123.123.53449: . ack 1 win 17520 <nop,nop,timestamp 2913495 943156> [tos 0x8]

so looks to me like tcp window scaling/etc is refused by 1.4.2
(10.0.0.4). Okay fine. Transfer continues as normal..  data, ack, data
ack..

Get to our first de-sync:

...ack 111497 win 17520 <nop,nop,timestamp 2913497 943157> [tos 0x8]
...114393:115841(1448) ack 1 win 17520 <nop,nop,timestamp 943157 0>
...ack 111497 win 17520 <nop,nop,timestamp 2913497 943157> [tos 0x8]
...115841:117289(1448) ack 1 win 17520 <nop,nop,timestamp 943157 0>
[..etc until..]
...ack 111497 win 17520 <nop,nop,timestamp 2913497 943157> [tos 0x8]
...111497:112945(1448) ack 1 win 17520 <nop,nop,timestamp 943157 0>
ack 121633 win 7384 <nop,nop,timestamp 2913497 943157> [tos 0x8]
ack 121633 win 8408 <nop,nop,timestamp 2913497 943157> [tos 0x8]
ack 121633 win 9432 <nop,nop,timestamp 2913497 943157> [tos 0x8]
ack 121633 win 10456 <nop,nop,timestamp 2913497 943157> [tos 0x8]
ack 121633 win 11480 <nop,nop,timestamp 2913497 943157> [tos 0x8]
ack 121633 win 12504 <nop,nop,timestamp 2913497 943157> [tos 0x8]
ack 121633 win 13528 <nop,nop,timestamp 2913497 943157> [tos 0x8]
ack 121633 win 14552 <nop,nop,timestamp 2913497 943157> [tos 0x8]
ack 121633 win 15576 <nop,nop,timestamp 2913497 943157> [tos 0x8]
ack 121633 win 16600 <nop,nop,timestamp 2913497 943157> [tos 0x8]

so the client (1.4.2) looks like it's trying to do some sort of window
discovery? takes about 0.001 sec to do. (11:40:35.388796 -> 11:40:35.389830)

Yet? The answer from the 1.5_ALPHA2 server waits and doesn't respond
immediately. Check this out:

11:40:35.389830 10.0.0.4.20 > 123.123.123.123.53449: . ack 121633 win 16600
so this is the final "discovery" packet from the client. (1.4.2)

11:40:35.469515 123.123.123.123.53449 > 10.0.0.4.20: . 128873:130321(1448) ack 1
and this, ~0.1 seconds later, isn't even where the client's been
bitching about. In total, there's been no packets from the 1.5_ALPHA2
server for 0.13 secs.

In fact, server sends more PAST the ack:
130321:131769(1448) ack 1 win 17520 <nop,nop,timestamp 943158 0> [tos 0x8]

before finally smartening up with:
11:40:36.978284 123.123.123.123.53449 > 10.0.0.4.20: . 121633:123081(1448) ack 1 win 17520 <nop,nop,timestamp 943160 0> [tos 0x8]

total elapsed time so far for desync recovery: 36.978284 - 35.388796 =
1.6 seconds.

Hrm.. So we think the transfer is going to continue.. don't we?
*bzzzt*

back to some sort of tcp window discovery.
11:40:36.978414 10.0.0.4.20 > 123.123.123.123.53449: . ack 131769 win 7384 <nop,nop,timestamp 2913500 943160> [tos 0x8]
11:40:36.978642 10.0.0.4.20 > 123.123.123.123.53449: . ack 131769 win 8408 <nop,nop,timestamp 2913500 943160> [tos 0x8]
11:40:36.979051 10.0.0.4.20 > 123.123.123.123.53449: . ack 131769 win 9432 <nop,nop,timestamp 2913500 943160> [tos 0x8]
11:40:36.979122 10.0.0.4.20 > 123.123.123.123.53449: . ack 131769 win 10456 <nop,nop,timestamp 2913500 943160> [tos 0x8]
11:40:36.979194 10.0.0.4.20 > 123.123.123.123.53449: . ack 131769 win 11480 <nop,nop,timestamp 2913500 943160> [tos 0x8]
11:40:36.979271 10.0.0.4.20 > 123.123.123.123.53449: . ack 131769 win 12504 <nop,nop,timestamp 2913500 943160> [tos 0x8]
11:40:36.979343 10.0.0.4.20 > 123.123.123.123.53449: . ack 131769 win 13528 <nop,nop,timestamp 2913500 943160> [tos 0x8]
11:40:36.979418 10.0.0.4.20 > 123.123.123.123.53449: . ack 131769 win 14552 <nop,nop,timestamp 2913500 943160> [tos 0x8]
11:40:36.979520 10.0.0.4.20 > 123.123.123.123.53449: . ack 131769 win 15576 <nop,nop,timestamp 2913500 943160> [tos 0x8]
11:40:36.979618 10.0.0.4.20 > 123.123.123.123.53449: . ack 131769 win 16600 <nop,nop,timestamp 2913500 943160> [tos 0x8]

but it hasn't changed from that ol' black magic win 17096:
server sends back:

11:40:37.060259 123.123.123.123.53449 > 10.0.0.4.20: . 131769:133217(1448) ack 1 win 17520 <nop,nop,timestamp 943161 0> [tos 0x8]

Finally, data resumes normal operations..  so, by my comprehension of
what's going on here, over a total of:
11:40:37.060259 - 11:40:35.269883 = 1.79 seconds,
131769 - 111497 = 20,272 bytes has been sent through..  maybe a bit
less..

which is 11325 == 11.06k/sec (isn't it?) =]

normal throughput on the line is 250K/sec. Unfortunately this happens
at just about every de-sync..which is on average about once every
100k, or once every six seconds.. is there something I can tweak to
prevent this sort of behaviour? It's killing overall throughput, and
destroying the major web page I've got living on that server. :) If
there's something we can do to alleviate this, it'd be a huge plus in
favour of netbsd..(probably slashdot material) and I'd rather not
downgrade to 1.4.x. I do cvs regularly to the latest 1.5 and replace
what's changed.

Finally, I've got five Linux boxes, one with a 3Com, one with Intel
etherexpress, all sitting on the same segment as the netbsd servers,
and while I get de-sync's at similar intervals, the recovery is nearly
instantaneous and the overall throughput only drops by about 40K/sec
during these periods..  I can send in a second email with a
tcpdump/analysis of the linux traffic if we need to do a side-by-side.

It's exactly the same client--a 1.4.2 machine sitting on a remote
network.

Help appreciated, pointers welcome, etc.

Sincerely,
Marc Tooley