Subject: Re: weird network delays connecting to cable provider's servers...
To: Luke Mewburn <lukem@goanna.cs.rmit.edu.au>
From: Stefan Grefen <grefen@hprc.tandem.com>
List: tech-net
Date: 04/06/1999 14:11:39
In message <199904061033.UAA04279@wombat.cs.rmit.edu.au>  Luke Mewburn wrote:
> I wrote:
> > anyone seen any similar behaviour? if not, is there any tcpdumping et
> > al i can do of connections to borken vs non-borken hosts that someone
> > can analyse to help work out why.
> 
> avalon suggested I posted a tcpdump; here it is.
> 	24.192.22.163 is my cable address
> 	24.192.1.18 is the isp's ftp server
> this was a simple `ftp m3:/foo/bar/somefile'.
> 
> i let it `run' (crawl actually) for about 20 seconds before nuking the
> tcpdump.
> 
> any help?

First try increasing your receive window to 32k (which is what m3 uses). Try
sysctl -w net.inet.tcp.recvspace=32768 (or set it for the route in the routing table).
If that doesn't help, try reducing it to around 2900 bytes, to try to avoid the
retransmission problem below.
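
To put the two window sizes in terms of segments, here is a minimal Python sketch. The MSS of 1460 comes from the SYN packets in the trace; reading "around 2900 bytes" as 2920 (two full segments) is an assumption, as are the helper names. The second helper shows the per-socket equivalent of the sysctl default, using the standard SO_RCVBUF socket option; the kernel may round or clamp the requested value.

```python
import socket

MSS = 1460  # from the <mss 1460> option in the SYN packets of the trace


def segments_in_window(window_bytes, mss=MSS):
    """Number of full-sized segments that fit in a receive window."""
    return window_bytes // mss


def socket_with_rcvbuf(nbytes):
    """Create a TCP socket and request a receive buffer of nbytes.

    This only affects this one socket, unlike the system-wide sysctl.
    The kernel may adjust the value, so read it back with getsockopt().
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, nbytes)
    return s


if __name__ == "__main__":
    # 2920 is an assumed reading of "around 2900 bytes" (2 * MSS).
    for window in (32768, 2920):
        print(window, "bytes hold", segments_in_window(window), "full segments")
```

With a window of two segments the sender can have at most two segments outstanding, so each burst is covered by a single ACK.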

You can also try toggling:
net.inet.tcp.tcp_compat_42 
net.inet.tcp.init_win 
net.inet.tcp.win_scale
net.inet.tcp.timestamps 


Also, I think we ACK very enthusiastically (I know it is recommended by the
working group, ...).

Most of the options shouldn't matter, but they may influence our state machine
a little so that it sends fewer ACKs.

> 
> root@daedalus:~ 3# tcpdump -vn -i we1 host m3
> 20:30:01.834068 24.192.22.163.65470 > 24.192.1.18.4872: S 3664906231:3664906231(0) win 16384 <mss 1460,nop,wscale 0,nop,nop,timestamp[|tcp]> (ttl 64, id 17539)
> 20:30:01.842004 24.192.1.18.4872 > 24.192.22.163.65470: S [tcp sum ok] 1151232056:1151232056(0) ack 3664906232 win 32768 <mss 1460> (DF) (ttl 62, id 36279)
> 20:30:01.842530 24.192.22.163.65470 > 24.192.1.18.4872: . [tcp sum ok] ack 1 win 17520 (ttl 64, id 17540)
> 20:30:01.843271 24.192.22.163.65471 > 24.192.1.18.21: P 84:102(18) ack 278 win 17520 [tos 0x10] (ttl 64, id 17541)
> 20:30:01.850419 24.192.1.18.21 > 24.192.22.163.65471: P 278:353(75) ack 102 win 32768 (DF) (ttl 62, id 36280)
> 20:30:01.856235 24.192.1.18.4872 > 24.192.22.163.65470: . 1:1461(1460) ack 1 win 32768 (DF) (ttl 62, id 36281)
> 20:30:01.857353 24.192.1.18.4872 > 24.192.22.163.65470: . 1461:2921(1460) ack 1 win 32768 (DF) (ttl 62, id 36282)
> 20:30:01.857990 24.192.22.163.65470 > 24.192.1.18.4872: . [tcp sum ok] ack 2921 win 14600 [tos 0x8] (ttl 64, id 17542)
> 20:30:01.868024 24.192.1.18.4872 > 24.192.22.163.65470: . 2921:4381(1460) ack 1 win 32768 (DF) (ttl 62, id 36284)
> 20:30:01.869256 24.192.1.18.4872 > 24.192.22.163.65470: . 4381:5841(1460) ack 1 win 32768 (DF) (ttl 62, id 36285)
> 20:30:01.870255 24.192.1.18.4872 > 24.192.22.163.65470: . 5841:7301(1460) ack 1 win 32768 (DF) (ttl 62, id 36286)
> 20:30:01.870626 24.192.22.163.65470 > 24.192.1.18.4872: . [tcp sum ok] ack 5841 win 11680 [tos 0x8] (ttl 64, id 17543)
> 20:30:01.874192 24.192.22.163.65470 > 24.192.1.18.4872: . [tcp sum ok] ack 7301 win 17520 [tos 0x8] (ttl 64, id 17544)
> 20:30:02.728398 24.192.1.18.4872 > 24.192.22.163.65470: . 2921:4381(1460) ack 1 win 32768 (DF) (ttl 62, id 36357)

Hmm, your ACKs were lost; this is the first retransmission.

> 20:30:02.729084 24.192.22.163.65470 > 24.192.1.18.4872: . [tcp sum ok] ack 7301 win 17520 [tos 0x8] (ttl 64, id 17550)

This one got through.

> 20:30:02.740613 24.192.1.18.4872 > 24.192.22.163.65470: . 7301:8761(1460) ack 1 win 32768 (DF) (ttl 62, id 36359)
> 20:30:02.741619 24.192.1.18.4872 > 24.192.22.163.65470: . 8761:10221(1460) ack 1 win 32768 (DF) (ttl 62, id 36360)
> 20:30:02.742103 24.192.22.163.65470 > 24.192.1.18.4872: . [tcp sum ok] ack 10221 win 14600 [tos 0x8] (ttl 64, id 17552)
> 20:30:02.742638 24.192.22.163.65470 > 24.192.1.18.4872: . [tcp sum ok] ack 10221 win 17520 [tos 0x8] (ttl 64, id 17553)

These were lost too ???

> 20:30:04.768437 24.192.1.18.4872 > 24.192.22.163.65470: . 7301:8761(1460) ack 1 win 32768 (DF) (ttl 62, id 36520)
> 20:30:04.768930 24.192.22.163.65470 > 24.192.1.18.4872: . [tcp sum ok] ack 10221 win 17520 [tos 0x8] (ttl 64, id 17566)
> 20:30:04.777799 24.192.1.18.4872 > 24.192.22.163.65470: . 10221:11681(1460) ack 1 win 32768 (DF) (ttl 62, id 36521)
> 20:30:04.778845 24.192.1.18.4872 > 24.192.22.163.65470: . 11681:13141(1460) ack 1 win 32768 (DF) (ttl 62, id 36522)
> 20:30:04.779305 24.192.22.163.65470 > 24.192.1.18.4872: . [tcp sum ok] ack 13141 win 14600 [tos 0x8] (ttl 64, id 17567)
> 20:30:04.779918 24.192.22.163.65470 > 24.192.1.18.4872: . [tcp sum ok] ack 13141 win 17520 [tos 0x8] (ttl 64, id 17568)

and again ???

> 20:30:06.768540 24.192.1.18.4872 > 24.192.22.163.65470: . 10221:11681(1460) ack 1 win 32768 (DF) (ttl 62, id 36768)
> 20:30:06.769104 24.192.22.163.65470 > 24.192.1.18.4872: . [tcp sum ok] ack 13141 win 17520 [tos 0x8] (ttl 64, id 17572)

I think the TCP engine of m3 is broken and can't cope with our multiple ACKs
coming in. As this happens with 'closed' windows, they never open on m3 due to
the lost/discarded packets.
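
The retransmissions pointed out above can also be found mechanically. The following is a rough Python sketch, not a general tcpdump parser: it assumes the one-line output format seen in this trace, and the names (`SEGMENT_RE`, `find_retransmissions`) are made up for illustration. The sample lines are the server-to-client data segments from the trace, trimmed after the window field.

```python
import re
from datetime import datetime

# Data segments from the trace (m3 -> cable host), trimmed after "win".
TRACE = """\
20:30:01.868024 24.192.1.18.4872 > 24.192.22.163.65470: . 2921:4381(1460) ack 1 win 32768
20:30:01.869256 24.192.1.18.4872 > 24.192.22.163.65470: . 4381:5841(1460) ack 1 win 32768
20:30:01.870255 24.192.1.18.4872 > 24.192.22.163.65470: . 5841:7301(1460) ack 1 win 32768
20:30:02.728398 24.192.1.18.4872 > 24.192.22.163.65470: . 2921:4381(1460) ack 1 win 32768
20:30:02.740613 24.192.1.18.4872 > 24.192.22.163.65470: . 7301:8761(1460) ack 1 win 32768
20:30:02.741619 24.192.1.18.4872 > 24.192.22.163.65470: . 8761:10221(1460) ack 1 win 32768
20:30:04.768437 24.192.1.18.4872 > 24.192.22.163.65470: . 7301:8761(1460) ack 1 win 32768
20:30:04.777799 24.192.1.18.4872 > 24.192.22.163.65470: . 10221:11681(1460) ack 1 win 32768
20:30:04.778845 24.192.1.18.4872 > 24.192.22.163.65470: . 11681:13141(1460) ack 1 win 32768
20:30:06.768540 24.192.1.18.4872 > 24.192.22.163.65470: . 10221:11681(1460) ack 1 win 32768
"""

# timestamp, source, destination, flags, then first:last(length)
SEGMENT_RE = re.compile(
    r"(\d\d:\d\d:\d\d\.\d+) (\S+) > (\S+): \S+ (\d+):(\d+)\((\d+)\)"
)


def find_retransmissions(text):
    """Return (seq_start, seq_end, delay_seconds) for every data segment
    whose starting sequence number was already sent on the same flow."""
    last_sent = {}  # (src, dst, seq_start) -> time of the previous send
    found = []
    for line in text.splitlines():
        m = SEGMENT_RE.match(line.lstrip("> "))
        if not m or int(m.group(6)) == 0:
            continue  # not a data segment (pure ACK, SYN, other text)
        when = datetime.strptime(m.group(1), "%H:%M:%S.%f")
        key = (m.group(2), m.group(3), int(m.group(4)))
        if key in last_sent:
            delay = (when - last_sent[key]).total_seconds()
            found.append((key[2], int(m.group(5)), delay))
        last_sent[key] = when
    return found


if __name__ == "__main__":
    for start, end, delay in find_retransmissions(TRACE):
        print(f"retransmit {start}:{end} after {delay:.3f}s")
```

On the sample lines this flags the segments starting at 2921, 7301 and 10221, resent after roughly 0.86, 2.03 and 1.99 seconds, which matches the stalls visible in the timestamps.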

It would be OK to use the NetBSD box as a NAT/router, because the problem is at
the TCP level, not the IP level.

Stefan


--
Stefan Grefen                                Tandem Computers Europe Inc.
grefen@hprc.tandem.com                       High Performance Research Center
 --- Hacking's just another word for nothing left to kludge. ---