Subject: Re: very slow TCP connection on localhost
To: Greg Troxel <firstname.lastname@example.org>
From: Bill Studenmund <email@example.com>
Date: 09/07/2006 12:43:08
Content-Type: text/plain; charset=us-ascii
On Thu, Sep 07, 2006 at 09:30:05AM -0400, Greg Troxel wrote:
> So I see several problems:
> * receiver window is way too small. This could be because recvspace
> is small, or because the reader doesn't take data out until there's
> enough, and 'enough' is large relative to recvspace.
There is another issue in this area I've seen. There is no way for the
socket layer to tell the tcp layer that it has taken data out of the
buffer. So if we get into strange traffic patterns like this, we can end
up with a closed or limited window on the sender side even though the
window (on the receiver side) is open. Well the crux is that it'd be open
if the receive side checked it out.
This turns into an issue at high speed, and especially for things like
iSCSI which are very bursty and sensitive to latency.
I don't have a solution, but we seem to be having a bit of a "things that
can go wrong on loopback" session. :-)
> * sender is not respecting the window. This seems to be the real
Yep, and it doesn't make sense to me.
> Have you increased your buffer sizes? I'm running amanda locally
> localhost, but I'm using the real IP address (so kerberos works). I
> have with the following settings:
> net.inet.tcp.sendspace = 131072
> net.inet.tcp.recvspace = 131072
I've seen a related issue on loopback. I have an app that sends a lot of
unanswered data over loopback as part of one test (the data happen to be
iSCSI NOP-OUT packets that don't generate a response). The code was using
64k socket sizes, and got DEAD slow.
In digging around, I think the problem is with the following in
* Never send more than half a buffer full. This insures that we c
* always keep 2 packets on the wire, no matter what SO_SNDBUF is,
* therefore acks will never be delayed unless we run out of data to
*txsegsizep = min(so->so_snd.sb_hiwat >> 1, *txsegsizep);
I think that should be:
*txsegsizep = min((so->so_snd.sb_hiwat - so->so_snd.sb_lowat)
    >> 1, *txsegsizep);
As it's the difference between the high and low water marks that throttles i/o.
I haven't made this change in the stack as I'm not sure if it's correct.
Also, I think sb_lowat is often 1, so it doesn't always matter.
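
To make the difference between the two clamps concrete, here is a minimal user-space sketch, not kernel code. The struct and function names (sndbuf_marks, clamp_current, clamp_proposed) are hypothetical stand-ins for so->so_snd.sb_hiwat, so->so_snd.sb_lowat and the two expressions above; only the arithmetic is taken from the message.

```c
#include <assert.h>

/* Hypothetical stand-in for the two socket-buffer fields discussed above;
 * this is not the kernel's struct sockbuf. */
struct sndbuf_marks {
    unsigned long hiwat;   /* models so->so_snd.sb_hiwat */
    unsigned long lowat;   /* models so->so_snd.sb_lowat */
};

static unsigned long min_ul(unsigned long a, unsigned long b)
{
    return a < b ? a : b;
}

/* Existing clamp: never more than half of the high-water mark. */
unsigned long clamp_current(const struct sndbuf_marks *sb,
                            unsigned long txsegsize)
{
    return min_ul(sb->hiwat >> 1, txsegsize);
}

/* Proposed clamp: never more than half of the gap between the
 * high and low water marks. */
unsigned long clamp_proposed(const struct sndbuf_marks *sb,
                             unsigned long txsegsize)
{
    assert(sb->hiwat >= sb->lowat);   /* assumed precondition of this sketch */
    return min_ul((sb->hiwat - sb->lowat) >> 1, txsegsize);
}
```

With hiwat = 65536 and lowat = 1, the two clamps differ by a single byte (32768 versus 32767) for any segment size above that, which matches the remark that the change often doesn't matter. The gap only becomes noticeable when the low-water mark is raised well above 1.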
Also, the code now uses 256k buffers, so my immediate pain is gone.
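
For reference, an application can ask for larger per-socket buffers with the standard setsockopt() options SO_SNDBUF and SO_RCVBUF. This is only a sketch: set_socket_buffers is a hypothetical helper, not the code from the test app, and the kernel may clamp or reject the requested size depending on its configured limits.

```c
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper: request the same size for both the send and
 * receive buffers of a socket. Returns 0 on success, or -1 on failure
 * with errno set by setsockopt(). */
int set_socket_buffers(int fd, int bytes)
{
    if (setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &bytes, sizeof(bytes)) == -1)
        return -1;
    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof(bytes)) == -1)
        return -1;
    return 0;
}
```

A caller would do something like set_socket_buffers(fd, 256 * 1024) right after socket() and before connect() or listen(), then check the result with getsockopt() if the effective size matters.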