tech-net archive


Re: How to use TCP window autosizing?



Dave Huang <khym%azeotrope.org@localhost> writes:

> Hi, reading through
> https://wiki.netbsd.org/tutorials/tuning_netbsd_for_performance/#index3h2
> and http://proj.sunet.se/E2E/tcptune.html , my understanding is that
> the net.inet.tcp.recvbuf_auto and net.inet.tcp.sendbuf_auto
> enable/disable TCP window autosizing, and that the initial window size
> is net.inet.tcp.{recv,send}space and that it'll increase by
> net.inet.tcp.{recv,send}buf_inc up to net.inet.tcp.{recv,send}buf_max.

Yes, that's right.  It has worked for me.

> And that kern.sbmax also limits the maximum window size?

That makes sense for transmit, since unacked data sits in the socket
buffer, and for receive as well, since advertising a large window
implies a readiness to store data that arrives within it.
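As a quick check of the effective ceiling, you can ask the kernel what
receive buffer a socket actually got.  A minimal sketch; whether an
over-limit request is rejected or silently clamped is OS-specific (on
NetBSD the ceiling is kern.sbmax, and Linux reports the getsockopt value
doubled), so treat the exact behavior below as an assumption:

```python
import socket

# Request a large receive buffer and see what the kernel grants.
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 4 * 1024 * 1024)
except OSError as e:
    # On some systems a request above the global limit fails outright.
    print("setsockopt failed:", e)
granted = s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
print("effective SO_RCVBUF:", granted)
s.close()
```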


> If window autosizing is enabled, is it supposed to just work
> everywhere automatically, or does each program need to opt-in to it?
> Because I'm not seeing anything happening.

I have not actually read the code, but I am 99% sure that the window is
only increased if there are signs that it isn't big enough.  The details
are complicated, but the key point is that the connection has to be up
against the window, rather than against congestion.
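As I understand it (again, without having read the code closely), the
receive-side heuristic is roughly: count the bytes that arrived over the
last RTT, and if that is close to a full window's worth, grow the buffer
by recvbuf_inc up to recvbuf_max.  A hedged sketch of that logic -- the
names and the 7/8 threshold are my own illustration, not the kernel's:

```python
def autosize_recvbuf(bytes_last_rtt, cur_size, inc, max_size):
    """Grow the receive buffer when the sender delivered most of the
    current window within one RTT, i.e. the transfer appears to be
    window-limited rather than congestion-limited."""
    if bytes_last_rtt > (cur_size * 7) // 8 and cur_size < max_size:
        return min(cur_size + inc, max_size)
    return cur_size

# 32K buffer, a full window's worth arrived in one RTT:
# window-limited, so grow by one increment (recvbuf_inc = 16384).
print(autosize_recvbuf(32768, 32768, 16384, 4194304))  # 49152
# Only a trickle arrived: no reason to grow.
print(autosize_recvbuf(1000, 32768, 16384, 4194304))   # 32768
```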

> I have a 100Mbps internet connection, and need to transfer files from
> a server on the other side of the world. Round trip ping times are in
> the 250ms range. So, according to the formula on the NetBSD wiki,
> buffer size = RTT * bandwidth = 250ms * 100Mbps = 3.125MB.

That's big, and perhaps you will fill that pipe, but I would be very
surprised if you got 100 Mbps with no loss between you and the other
end.
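For reference, your bandwidth-delay arithmetic checks out:

```python
rtt = 0.250        # seconds, round trip
bandwidth = 100e6  # bits per second
bdp_bytes = bandwidth / 8 * rtt
print(bdp_bytes)   # 3125000.0 bytes, i.e. about 3.125 MB
```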

> I'm running NetBSD-alpha/7.0_RC2, with a kernel compiled with
> NMBCLUSTERS=16384. The sysctls mentioned in those two webpages about TCP
> tuning are set as:
>
> kern.mbuf.nmbclusters = 16384
> kern.somaxkva = 16777216
> kern.sbmax = 4194304
> net.inet.tcp.rfc1323 = 1
> net.inet.tcp.recvspace = 32768
> net.inet.tcp.sendspace = 32768
> net.inet.tcp.recvbuf_auto = 1
> net.inet.tcp.recvbuf_inc = 16384
> net.inet.tcp.recvbuf_max = 4194304
> net.inet.tcp.sendbuf_auto = 1
> net.inet.tcp.sendbuf_inc = 8192
> net.inet.tcp.sendbuf_max = 4194304
>
> A tcpdump of scp from the remote machine (running Linux) to the local
> NetBSD 7.0_RC2 shows:
>
> 00:02:32.693344 IP linux.36692 > netbsd.22: Flags [S], seq 2376757141, win 26883, options [mss 8961,sackOK,TS val 17840090 ecr 0,nop,wscale 7], length 0
> 00:02:32.693595 IP netbsd.22 > linux.36692: Flags [S.], seq 2458802765, ack 2376757142, win 32768, options [mss 1460,nop,wscale 7,nop,nop,TS val 1 ecr 17840090,sackOK,nop,nop], length 0
> 00:02:32.935663 IP linux.36692 > netbsd.22: Flags [.], ack 1, win 211, options [nop,nop,TS val 17840150 ecr 1], length 0
>
> So it looks like NetBSD starts with an initial window size of 32768,
> which I guess is expected given net.inet.tcp.recvspace = 32768? But
> when does the autosizing come into play?
>
> I let it run for 20 seconds, hoping to see the window size increase,
> but in the ACKs from NetBSD to Linux, I never see the "win" reported
> by tcpdump go above 262 (which I guess with a scaling factor of 2^7 is
> 262*128 = 33536), and the throughput is around 125kB/s (which is what
> I'd expect; 32768 bytes/250 ms = 131kB/s). There doesn't seem to be
> any packet loss. The remote side sends a burst of about 32K worth of
> data, then there's a pause of about 250ms, then another burst of 32K,
> etc.

Hmm.  I was expecting to get to this point, suspect packet loss, and
tell you to run "xplot" (in pkgsrc), which lets you visualize the
ack/etc. behavior.

> 00:02:56.012112 IP linux.36692 > netbsd.22: Flags [.], seq 2183886:2185334, ack 5360, win 269, options [nop,nop,TS val 17845919 ecr 47], length 1448
> 00:02:56.012233 IP linux.36692 > netbsd.22: Flags [.], seq 2185334:2186782, ack 5360, win 269, options [nop,nop,TS val 17845919 ecr 47], length 1448
> 00:02:56.012312 IP linux.36692 > netbsd.22: Flags [P.], seq 2186782:2187774, ack 5360, win 269, options [nop,nop,TS val 17845919 ecr 47], length 992
> 00:02:56.012488 IP netbsd.22 > linux.36692: Flags [.], ack 2185334, win 152, options [nop,nop,TS val 48 ecr 17845919], length 0
> 00:02:56.012589 IP netbsd.22 > linux.36692: Flags [.], ack 2187774, win 133, options [nop,nop,TS val 48 ecr 17845919], length 0
> 00:02:56.013013 IP netbsd.22 > linux.36692: Flags [.], ack 2187774, win 261, options [nop,nop,TS val 48 ecr 17845919], length 0
> 00:02:56.022967 IP netbsd.22 > linux.36692: Flags [P.], seq 5360:5400, ack 2187774, win 262, options [nop,nop,TS val 48 ecr 17845919], length 40
>   [ I think the "win 262" in the previous packet shows that NetBSD has
>     not increased its window size over about 32K... NetBSD has
>     consumed all the data in its buffer and is waiting for more, but
>     the remote Linux is waiting to get its ACKs before sending more ]
> 00:02:56.264399 IP linux.36692 > netbsd.22: Flags [.], seq 2187774:2189222, ack 5400, win 269, options [nop,nop,TS val 17845982 ecr 48], length 1448
> 00:02:56.264490 IP linux.36692 > netbsd.22: Flags [.], seq 2189222:2190670, ack 5400, win 269, options [nop,nop,TS val 17845982 ecr 48], length 1448
>
> If I increase net.inet.tcp.recvspace to 4194304, the scp connects and
> does the ssh protocol handshake (according to "scp -v"), but the data
> transfer never actually starts... no idea what that means. If I set

I wonder if there is a 32-bit bug someplace.  I would try a number that
fits in 31 bits.
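Also worth checking: a 4194304-byte window needs an RFC 1323 window
scale of 7, which NetBSD did offer in the SYN above, so the scale
negotiation itself shouldn't be the limit.  A sketch of the arithmetic
(my own helper, not kernel code):

```python
def needed_wscale(window):
    """Smallest RFC 1323 shift such that window >> shift fits in the
    16-bit TCP window field (the option caps the shift at 14)."""
    shift = 0
    while window >> shift > 65535 and shift < 14:
        shift += 1
    return shift

print(needed_wscale(4194304))  # 7
print(needed_wscale(32768))    # 0
```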

> recvspace to 3145728, scp reports about 3MB/s throughput when it first
> starts, but that gradually decreases to around 600kB/s.

Here, you should use xplot, which will plot transmitted packets, the ack
line, the window, and sacks.  Read the README, and use tcpdump2xplot.
It will take you an hour the first time, but then you'll wonder how
anybody can pore over numbers in tcpdump output to understand TCP
ack/window/congestion behavior.

> So, what's going on, and what can I do to get a decent transfer rate?
> If I scp from Windows to the same remote Linux box, the throughput
> slowly increases, and after 20 seconds, it's up to about 3.7MB/s, and
> it continues to increase very slowly--after 2 minutes, the throughput
> is about 4.1MB/s. The network connection is definitely capable of
> doing better than 120kB/s or 600kB/s. Of course, the hardware is
> completely different... I'm not running the Alpha edition of Windows
> NT :) But scp between the Alpha and another machine on the LAN can do
> about 2MB/s while maxing out the Alpha's CPU. If needed, I can do some
> testing on a NetBSD machine with a modern/fast amd64 CPU, but I'm
> pretty sure the Alpha should be able to do better than what I'm
> currently seeing.

My guess is that the code that decides to open the receive window is not
firing.  From the receiver's viewpoint, the receive window is too small
if the traffic arrives bunched up, which is much harder to articulate
precisely than the sender-side condition: we are allowed to send data to
the other side, the buffer is full, and we have no unsent data.
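Your numbers are consistent with a transfer stuck at the window limit:

```python
window = 32768  # bytes, the advertised window that never grows
rtt = 0.250     # seconds
throughput = window / rtt
print(throughput)  # 131072.0 bytes/s, matching the ~125 kB/s you measured
```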


> P.S. The NetBSD wiki mentions, "The automatic setting for sendbuf and
> recvbuf is disabled in the default installation." However, it looks
> like it was enabled by default since NetBSD 6.0. It also says, "The
> initial value for maximal send buffer and receive buffer is both 256k,
> which is very tiny," which is still the case. Is there a reason to
> keep it so tiny?

This is a balance between supporting a large number of connections and
high speed on a small number, depending on available memory.  Your case
is somewhat unusual, and there are a lot of machines with only a
gigabyte or so of memory.  Arguably what would be nice is autosizing of
the socket buffer limits too.

