Subject: Re: very slow TCP connection on localhost
To: Manuel Bouyer <bouyer@antioche.eu.org>
From: Greg Troxel <gdt@ir.bbn.com>
List: tech-net
Date: 09/07/2006 09:30:05


I recommend using xplot to look at this.  Build
pkgsrc/graphics/xplot-devel and take your tcpdump file and run

tcpdump -r FILE -S -tt tcp | tcpdump2xplot

and you'll get xplot files in the current directory:

-rw-r--r--  1 gdt  ir  175859 Sep  7 08:09 localhost.65287-localhost.65288.xplot
-rw-r--r--  1 gdt  ir  155744 Sep  7 08:09 localhost.65288-localhost.65287.xplot

read /usr/pkg/share/doc/xplot/README and then run xplot on them.

After a timeout, which I suspect is RTO rather than persist, it sends
a single 32k packet into a 16k window.  This is acked, and the window
advances by 32k so that it is again 16k open.  Then two more 32k
packets are sent, and one is acked, but the window is not opened.
After 8 ms the sender sends another 32k packet.  The delay is perhaps
caused by data arriving at the sender with a cwnd that's open enough
for it.  This provokes a duplicate ack (in only 8 us), followed by the
window opening 16k (300+ us later).  The sender is then in timeout
again.

So I see several problems:

* receiver window is way too small.  This could be because recvspace
  is small, or because the reader doesn't take data out until there's
  enough, and 'enough' is large relative to recvspace.

* sender is not respecting the window.  This seems to be the real
  issue.


Have you increased your buffer sizes?  I'm running amanda locally, but
using the real IP address rather than localhost (so kerberos works).
I have the following settings:

net.inet.tcp.sendspace = 131072
net.inet.tcp.recvspace = 131072
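
If changing the system-wide defaults is not an option, the same sizes
can be requested for a single socket.  This is only a sketch:
set_socket_buffers is a made-up helper name, not something from amanda
or the kernel, and the kernel may clamp or round the value it is given.

```c
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Hypothetical helper (an assumption for illustration): ask for `bytes`
 * of send and receive buffer space on one socket.
 * Returns 0 on success, -1 if either setsockopt() call fails.
 */
int
set_socket_buffers(int fd, int bytes)
{
	if (setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &bytes, sizeof(bytes)) == -1)
		return -1;
	if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof(bytes)) == -1)
		return -1;
	return 0;
}
```

Calling set_socket_buffers(fd, 131072) would mirror the sysctl values
above for that one connection.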

Is this dumper (dump|gzip) to holding disk, or direct to tape?

In my tests with tcpblast (open to discard, write 1k blocks, exit),
the window never got full, and I got 1.6 Gb/s data throughput.

Are you using SACK?

Is the socket somehow locked while tcp is looking at it?  MP system?

Here's some code that seems relevant in tcp_output:


	off = tp->snd_nxt - tp->snd_una;

so this will be 64k, since we've sent 2 packets that haven't been acked.

	if (!TCP_SACK_ENABLED(tp)) {
		if (win < so->so_snd.sb_cc) {
			len = win - off;
			flags &= ~TH_FIN;
		} else
			len = so->so_snd.sb_cc - off;

win is 16k, sb_cc is large, so len might go negative.  Apparently this
is handled a bit farther down where a persist timer is set.  But the
real issue is sending 32k into a 16k window, and more seriously
sending a second 32k packet shortly thereafter.  I say more serious
because at least the first too-large packet has some in-window data
and should cause a moved ack.
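
To make the arithmetic concrete, here is a stand-alone sketch of just
the two assignments quoted above.  quoted_len is a made-up name, and
the sb_cc value used below (256k) is an assumption standing in for
"large"; nothing else from tcp_output is modelled.

```c
/*
 * Mirrors only the two quoted assignments to len; the rest of
 * tcp_output (persist handling, SACK, flags) is deliberately omitted.
 */
long
quoted_len(long win, long sb_cc, long off)
{
	if (win < sb_cc)
		return win - off;	/* limited by the window */
	return sb_cc - off;		/* limited by what is buffered */
}
```

With win = 16384, off = 65536 and an assumed sb_cc of 262144, this
gives 16384 - 65536 = -49152, i.e. len is negative.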

So you might try to add debug code late in tcp_output that checks if
the data packet being sent goes beyond the window.
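
The check itself could look something like the following.  It is
written as a stand-alone function so it can be tried outside the
kernel; window_overrun and seq_diff are made-up names, and I am
assuming the inputs would come from tp->snd_una, tp->snd_wnd, the
sequence number of the segment and its len at the point where the
segment is built.

```c
#include <stdint.h>

/* Signed difference of two TCP sequence numbers, safe across wraparound. */
static int32_t
seq_diff(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b);
}

/*
 * Hypothetical debug check: number of bytes by which the segment
 * [seq, seq + len) extends past the right edge of the send window
 * (snd_una + snd_wnd), or 0 if the segment fits.
 */
uint32_t
window_overrun(uint32_t snd_una, uint32_t snd_wnd, uint32_t seq, uint32_t len)
{
	uint32_t right_edge = snd_una + snd_wnd;
	uint32_t seg_end = seq + len;
	int32_t over = seq_diff(seg_end, right_edge);

	return over > 0 ? (uint32_t)over : 0;
}
```

For the case in the trace, a 32k segment starting at snd_una with a
16k window overruns by 16k, and a second 32k segment right behind it
overruns by 48k.  In the kernel one would log whenever the result is
nonzero.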

-- 
    Greg Troxel <gdt@ir.bbn.com>
