Subject: Re: very slow TCP connection on localhost
To: Greg Troxel <gdt@ir.bbn.com>
From: Bill Studenmund <wrstuden@netbsd.org>
List: tech-net
Date: 09/07/2006 12:43:08

On Thu, Sep 07, 2006 at 09:30:05AM -0400, Greg Troxel wrote:
>
> So I see several problems:
>
> * receiver window is way too small.  This could be because recvspace
>   is small, or because the reader doesn't take data out until there's
>   enough, and 'enough' is large relative to recvspace.

There is another issue in this area I've seen. There is no way for the
socket layer to tell the tcp layer that it has taken data out of the
buffer. So if we get into strange traffic patterns like this, we can end
up with a closed or limited window on the sender side even though the
window (on the receiver side) is open. Well, the crux is that it'd be open
if the receive side checked it out.

This turns into an issue at high speed, and especially for things like
iSCSI which are very bursty and sensitive to latency.

I don't have a solution, but we seem to be having a bit of a "things that
can go wrong on loopback" session. :-)

> * sender is not respecting the window.  This seems to be the real
>   issue.

Yep, and it doesn't make sense to me.

> Have you increased your buffer sizes?  I'm running amanda locally
> localhost, but I'm using the real IP address (so kerberos works).  I
> have with the following settings:
>
> net.inet.tcp.sendspace = 131072
> net.inet.tcp.recvspace = 131072

I've seen a related issue on loopback. I have an app that sends a lot of
unanswered data over loopback as part of one test (the data happen to be
iSCSI NOP-OUT packets that don't generate a response). The code was using
64k socket sizes, and got DEAD slow.

In digging around, I think the problem is with the following in
tcp_output.c:tcp_segsize():

        /*
         * Never send more than half a buffer full.  This insures that we can
         * always keep 2 packets on the wire, no matter what SO_SNDBUF is, and
         * therefore acks will never be delayed unless we run out of data to
         * transmit.
         */
        if (so)
                *txsegsizep = min(so->so_snd.sb_hiwat >> 1, *txsegsizep);

I think that should be:

		*txsegsizep = min((so->so_snd.sb_hiwat - so->so_snd.sb_lowat)
				>> 1, *txsegsizep);

As it's the difference between the high and low water marks that throttles
i/o. I haven't made this change in the stack as I'm not sure if it's
correct. Also, I think sb_lowat is often 1, so it doesn't always matter.
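
To make the arithmetic concrete, here is a minimal userland sketch (not
kernel code) that compares the cap quoted above with the proposed one. The
helper names and the input values in the usage note are made up for
illustration; only the two expressions come from the text above.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for the kernel's min(): smaller of two values. */
static unsigned long
min_ul(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

/* Cap as quoted from tcp_segsize(): half of the send buffer's hiwat. */
static unsigned long
cap_current(unsigned long hiwat, unsigned long txsegsize)
{
	return min_ul(hiwat >> 1, txsegsize);
}

/* Cap as proposed: half of the space between hiwat and lowat. */
static unsigned long
cap_proposed(unsigned long hiwat, unsigned long lowat,
    unsigned long txsegsize)
{
	assert(hiwat >= lowat);	/* assumption: lowat never exceeds hiwat */
	return min_ul((hiwat - lowat) >> 1, txsegsize);
}

/* Print both caps for one set of illustrative inputs. */
static void
show_caps(unsigned long hiwat, unsigned long lowat, unsigned long txsegsize)
{
	printf("hiwat=%lu lowat=%lu seg=%lu: current=%lu proposed=%lu\n",
	    hiwat, lowat, txsegsize,
	    cap_current(hiwat, txsegsize),
	    cap_proposed(hiwat, lowat, txsegsize));
}
```

Calling show_caps(65536, 1, 33000) would show the two caps differing by a
single byte, which fits the remark that the change doesn't always matter
when sb_lowat is 1. The gap only grows when sb_lowat is a meaningful
fraction of the buffer, e.g. show_caps(65536, 2048, 33000).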

Also, the code now uses 256k buffers, so my immediate pain is gone.
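
For reference, an application would typically request larger buffers with
setsockopt(2). The following is only a sketch of that approach, not the
actual code from the app mentioned above; the helper names are
hypothetical, and the kernel may clamp or reject the requested size
depending on system limits.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <assert.h>

/*
 * Hypothetical helper: ask for `bytes` of send and receive buffer space
 * on socket `fd`.  Returns 0 on success, -1 if either call fails.
 */
static int
set_socket_buffers(int fd, int bytes)
{
	assert(bytes > 0);
	if (setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &bytes, sizeof(bytes)) == -1)
		return -1;
	if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof(bytes)) == -1)
		return -1;
	return 0;
}

/*
 * Hypothetical helper: read back the send buffer size the kernel actually
 * granted, or -1 on error.  The value may differ from what was requested.
 */
static int
get_send_buffer(int fd)
{
	int bytes = 0;
	socklen_t len = sizeof(bytes);

	if (getsockopt(fd, SOL_SOCKET, SO_SNDBUF, &bytes, &len) == -1)
		return -1;
	return bytes;
}
```

A caller would use something like set_socket_buffers(fd, 256 * 1024) right
after socket(2) and before connecting, then check get_send_buffer(fd) to
see what was really granted.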

Take care,

Bill
