Subject: Re: very slow TCP connection on localhost
To: Greg Troxel <gdt@ir.bbn.com>
From: Bill Studenmund <wrstuden@netbsd.org>
List: tech-net
Date: 09/07/2006 12:43:08
On Thu, Sep 07, 2006 at 09:30:05AM -0400, Greg Troxel wrote:
>
> So I see several problems:
>
> * receiver window is way too small. This could be because recvspace
> is small, or because the reader doesn't take data out until there's
> enough, and 'enough' is large relative to recvspace.
There is another issue in this area I've seen. There is no way for the
socket layer to tell the tcp layer that it has taken data out of the
buffer. So if we get into strange traffic patterns like this, we can end
up with a closed or limited window on the sender side even though the
window (on the receiver side) is open. Well, the crux is that it'd be open
if the receive side checked it out.
This turns into an issue at high speed, and especially for things like
iSCSI which are very bursty and sensitive to latency.
I don't have a solution, but we seem to be having a bit of a "things that
can go wrong on loopback" session. :-)
> * sender is not respecting the window. This seems to be the real
> issue.
Yep, and it doesn't make sense to me.
> Have you increased your buffer sizes? I'm running amanda locally
> localhost, but I'm using the real IP address (so kerberos works). I
> have with the following settings:
>
> net.inet.tcp.sendspace = 131072
> net.inet.tcp.recvspace = 131072
I've seen a related issue on loopback. I have an app that sends a lot of
unanswered data over loopback as part of one test (the data happen to be
iSCSI NOP-OUT packets that don't generate a response). The code was using
64k socket sizes, and got DEAD slow.
In digging around, I think the problem is with the following in
tcp_output.c:tcp_segsize():
        /*
         * Never send more than half a buffer full.  This insures that we can
         * always keep 2 packets on the wire, no matter what SO_SNDBUF is, and
         * therefore acks will never be delayed unless we run out of data to
         * transmit.
         */
        if (so)
                *txsegsizep = min(so->so_snd.sb_hiwat >> 1, *txsegsizep);
I think that should be:
                *txsegsizep = min((so->so_snd.sb_hiwat - so->so_snd.sb_lowat) >> 1,
                    *txsegsizep);
As it's the difference between the high and low water marks that throttles i/o.
I haven't made this change in the stack as I'm not sure if it's correct.
Also, I think sb_lowat is often 1, so it doesn't always matter.
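To put numbers on the two clamps discussed above, here is a small stand-alone sketch. It is not kernel code: struct sockbuf_marks, min_ul(), clamp_current() and clamp_proposed() are made-up names for illustration, and the buffer and low-water values mentioned afterwards are assumptions, not measurements.

```c
#include <assert.h>

/* Hypothetical stand-in for the two sockbuf fields discussed above. */
struct sockbuf_marks {
	unsigned long sb_hiwat;	/* high-water mark */
	unsigned long sb_lowat;	/* low-water mark */
};

static unsigned long
min_ul(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

/* Clamp as in the quoted code: at most half of sb_hiwat. */
unsigned long
clamp_current(const struct sockbuf_marks *sb, unsigned long txsegsize)
{
	return min_ul(sb->sb_hiwat >> 1, txsegsize);
}

/* Clamp as proposed above: at most half of (sb_hiwat - sb_lowat). */
unsigned long
clamp_proposed(const struct sockbuf_marks *sb, unsigned long txsegsize)
{
	/* Guard against unsigned underflow if the marks are inconsistent. */
	assert(sb->sb_hiwat >= sb->sb_lowat);
	return min_ul((sb->sb_hiwat - sb->sb_lowat) >> 1, txsegsize);
}
```

With sb_hiwat = 65536 and sb_lowat = 1, a 40000-byte candidate segment is clamped to 32768 by the current expression and to 32767 by the proposed one, which fits the remark that the change hardly matters when sb_lowat is 1. With an assumed sb_lowat of 2048 the proposed clamp drops to 31744.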
Also, the code now uses 256k buffers, so my immediate pain is gone.
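For anyone wanting to try the same workaround from an application, a minimal sketch of requesting larger per-socket buffers follows. set_socket_buffers() is a hypothetical helper name, and this is only one way to do it; the message does not say how the app in question sets its sizes.

```c
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: ask for `bytes` of send and receive buffer on fd.
 * Returns 0 on success, -1 if either setsockopt() call fails.
 */
int
set_socket_buffers(int fd, int bytes)
{
	if (setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &bytes, sizeof(bytes)) == -1)
		return -1;
	if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof(bytes)) == -1)
		return -1;
	return 0;
}
```

A call such as set_socket_buffers(fd, 256 * 1024) would normally go before connect() or listen(). The system may refuse or adjust the requested size, so getsockopt() is the way to see what was actually granted.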
Take care,
Bill