Re: NetBSD 5.1 TCP performance issue (lots of ACK)



Manuel Bouyer <bouyer%antioche.eu.org@localhost> writes:

> On Wed, Oct 26, 2011 at 08:15:44PM -0400, Greg Troxel wrote:

> Yes, between 40 and 50MB/s

ok, that matches what I see in the trace.

>> What is between these two devices?  Is this just a gigabit switch, or
>> anything more complicated?
>
> they're all (the 2 NetBSD hosts and the linux host) connected to a cisco 3750
> gigabit switch. I also tested with a single crossover cable, this doesn't
> change anything.

OK - I've just seen enough things that are supposed to be transparent
and aren't.

> that's easy. And yes, I get better performance: 77MB/s instead of < 50.

And does gluster then match ttcp, as in both 77?

> So it looks like we have something wrong with TSO.
> The traces are still at ftp://asim.lip6.fr/outgoing/bouyer/
> (netbsd-{client,server}-notso.pcap.gz).
>
> Did you see the reordering in the ttcp trace too ?

There was some reordering, but it doesn't seem severe enough to cause real
problems.  As long as TCP does fast recovery and doesn't go into timeout,
things work well enough that it's really hard to notice.

> But, that still doesn't explain why I get good performance when one
> of the hosts is linux. NetBSD used tso as well, and it didn't seem to cause
> problems for linux ...

Sure, but TCP performance is subtle and there are all sorts of ways
things can line up to provoke or not provoke latent bugs.  It seems
likely that whatever bad behavior the tso option causes either doesn't
bother the linux receiver in terms of the acks it sends, or the
congestion window doesn't get big enough to trigger the tso bugs, or
something else like that.  You can't conclude much from linux/netbsd
working well other than that things are mostly ok.

> BTW, how is TSO working ? does the adapter get a single data block of
> a full window size ? if so, maybe the transmit ring just isn't big
> enough ...

I have no idea.  Also, is there receive offload?  The receiver has
packets arriving all together whereas they are showing up more spread
out at the transmitter.  It may be that reordering happens in the
controller, or it may be that it happens at the receiver when the
packets are regenerated from the large buffer (and then injected out of
order).
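
If you want to rule the offload features in or out one at a time, NetBSD
lets you toggle them per interface.  A minimal sketch, assuming a wm(4)
interface named wm0 (substitute whatever driver you actually have):

# show which offload capabilities the driver supports and which are enabled
ifconfig wm0

# turn off IPv4 TCP segmentation offload only, leaving checksum offload alone
ifconfig wm0 -tso4

# turn it back on after the test
ifconfig wm0 tso4

Comparing runs with just one capability flipped should narrow down which
offload is actually the problem.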

One thing to keep in mind is that the tcpdump timestamps are not when
the packet actually hits the wire.  They are the system time when the bpf
tap is called, which in many drivers is when the packet's pointers are
loaded into the transmit ring.

>> thrashing.  What happens if you change gluster to have smaller buffers

I would do this experiment; that may avoid the problem.  I'm not
suggesting that you run this way forever, but it will help us understand
what's wrong.

>> (I don't understand why it's ok to have the FS change the tcp socket
>> buffer options from system default)?
>
> Because it knows the size of its packets, or its internal receive buffers ?

This is TCP, so gluster can have a large buffer in user space
independently of what the TCP socket buffer is.  People set TCP socket
buffers to control the advertised window and to balance throughput on
long fat pipes against memory usage.  In your case the RTT is only a few
ms even under load, so the bandwidth-delay product is small (at gigabit
rates, roughly 125 MB/s * 0.003 s, or about 375 KB), and it wouldn't seem
that huge buffers are necessary.

Do you have actual problems if gluster doesn't force the buffer to be
large?
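
For comparison, the system-wide defaults that gluster is overriding
(presumably via SO_SNDBUF/SO_RCVBUF on its sockets) can be read with
sysctl; a quick check, nothing fancy:

# default TCP send and receive socket buffer sizes
sysctl net.inet.tcp.sendspace
sysctl net.inet.tcp.recvspace

If those defaults are too small to fill the pipe, raising them system-wide
(or using the auto-tuning mentioned below) seems cleaner than having one
application hardcode a size.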

That said, having buffers large enough to allow streaming is generally
good.  But if you need that, it's not really about one user of TCP.  I
have been turning on

net.inet.tcp.recvbuf_auto = 1
net.inet.tcp.sendbuf_auto = 1
net.inet6.tcp6.recvbuf_auto = 1
net.inet6.tcp6.sendbuf_auto = 1

to let buffers grow when TCP would otherwise be blocked by the socket buffer.
In 5.1, that seems to lead to running out of mbuf clusters rather than
reclaiming them (when there are lots of connections), but I'm hoping this
is better in -current (or rather, I'm deferring looking into it until I
jump to -current).
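
For reference, this is just the usual mechanics for flipping those on (the
sysctl names above are the real ones; nothing here is specific to this
problem):

# enable at runtime
sysctl -w net.inet.tcp.recvbuf_auto=1
sysctl -w net.inet.tcp.sendbuf_auto=1

# persist across reboots (likewise for the inet6 variants)
echo net.inet.tcp.recvbuf_auto=1 >> /etc/sysctl.conf
echo net.inet.tcp.sendbuf_auto=1 >> /etc/sysctl.conf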

If you can get ttcp to show the same performance problems (by setting
buffer sizes, perhaps), then we can debug this without gluster, which
would help.
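
Something along these lines, assuming the classic ttcp flags (-b sets the
socket buffer size; the 1 MB figure is just a guess at what gluster asks for):

# on the receiving host: sink data into a large socket buffer
ttcp -r -s -b 1048576

# on the sending host: source a test pattern at the receiver
ttcp -t -s -b 1048576 receiver-host

If plain ttcp with the same buffer sizes reproduces the ack storm, we can
leave gluster out of the picture entirely.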

Also, it would be nice to have a third machine on the switch and run
tcpdump (without any funky offload behavior) and see what the packets on
the wire really look like.  With the tso behavior I am not confident
that either trace is exactly what's on the wire.
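
Something like this on the third box, assuming the 3750 can mirror (SPAN)
the relevant port to it; the interface name and hostnames are placeholders:

# capture whole packets, no name resolution, save for offline analysis
tcpdump -i bge0 -s 0 -n -w wire.pcap host server and host client

Comparing that capture against the end-host traces should show whether the
reordering is real or just an artifact of where bpf taps the packets.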

Have you seen: http://gnats.netbsd.org/42323
