Subject: Re: Networking speed
To: Daniel Carosone <dan@geek.com.au>
From: Mark White <mjw@celos.net>
List: tech-net
Date: 10/01/2004 16:02:10
Daniel Carosone writes:
> On Thu, Sep 30, 2004 at 11:36:31PM +0100, Mark White wrote:
> > TSG recorded on NetBSD for MacOS->NetBSD transfers shows
> > approx 10ms bursts of traffic (a few dozen segments),
> > separated by gaps of a second or two.  Typical structure: at
> > the end of a burst I see segments 1-2-6-7, followed by a
> > second or so gap with no ACKs, then 3-4-5 out of order, and
> > resends of 6-7.  Then a few ms of good traffic, and repeat.
> 
> This is definitely some kind of packet loss; the out-of-order packets
> are ones that were not received the first time.  

Ok, that makes sense.  Thanks for the detailed reply; I'm
learning quite a bit about TCP here, too... :-)

> The pauses are Nagle's algorithm; the sender doing the congestion
> avoidance backoff referred to previously. You may also notice that the
> first few segments after a restart are more widely spaced.

Yep, I can see the slower restart as well.

> It's hard to say exactly without looking at it, but the lack of *any*
> ack's is a little confusing there. You should see at least one more
> re-ack when the out-of-order segment comes in. (just a little green
> tick on the line)

Hmm, you're right; on closer inspection all cases have a few
repeat ACK ticks after the first lost packet.  I must not
have zoomed in far enough on the time axis earlier.  Sorry
about that.

I can put some captures on the web if you want to look...

> Does setting net.inet.tcp.delack_ticks=0 on netbsd help at all?

Sadly not.  This would be decreasing the delay before ACKing?

> If you run the capture on OS/X, you should see them being
> sent. [...] That is more like what i'd expect to see.
> Comparing plots from captures taken at sender and receiver
> for the same session will be interesting.

Ok -- the packets are indeed being sent in order by OSX, and
going missing before NetBSD sees them.  OSX sees the
repeated ACKs, and apparently responds correctly by backing
off and resending.
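
For anyone wanting to repeat the sender/receiver comparison,
here is a minimal sketch of the idea in Python.  It assumes
the segment sequence numbers have already been pulled out of
the two captures into plain lists; the function name and the
small segment numbers are made up for illustration and are
not taken from my traces.

```python
from collections import Counter


def lost_in_transit(sent_seqs, received_seqs):
    """Return segments that left the sender more often than they arrived.

    Both arguments are lists of segment identifiers in capture order,
    one entry per transmission (retransmissions appear again).
    """
    sent = Counter(sent_seqs)
    received = Counter(received_seqs)
    # A segment was lost at least once if the sender's capture shows
    # more copies of it than the receiver's capture does.
    return sorted(seq for seq, count in sent.items() if count > received[seq])


if __name__ == "__main__":
    # Illustrative pattern only: sender emits 1..7 in order, then
    # resends 3..7; receiver sees 1-2-6-7, then 3-4-5 and 6-7 again.
    sender_view = [1, 2, 3, 4, 5, 6, 7, 3, 4, 5, 6, 7]
    receiver_view = [1, 2, 6, 7, 3, 4, 5, 6, 7]
    print(lost_in_transit(sender_view, receiver_view))
```

Counting copies rather than just checking membership matters
here, because the lost segments do eventually arrive as
resends.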

> You may possibly have a duplex issue, mac's do something slightly odd
> with respect to Nway (Apple's wishful thinking of how the standard
> should have been written, rather than how it was).  It would be worth
> confirming that all ports are autonegotiating 100/full, both as
> reported by the box and the switch, just in case.

I can't see any problems: both machines report 100baseTX
full-duplex, and the switch FDX lights are lit.

> But more likely you're just seeing the ste fail to handle
> a fast chain of packets and dropping some; it's a well
> known problem, at least for some quad cards that put four
> of them behind a ppb.  Is it hard to try another NIC?

Easy enough.  I've just done more experiments with a vr
(also D-Link card, but different chipset so probably ok).
Basically the same: transfer rates very low, and the TSG
shows short bursts separated by some missing packets, a 1-2s
wait, and some resends.  Sometimes it's ok to start with.

Slight differences: with the vr it sometimes gets stuck
*completely*; connection stalls, trace shows missing packet,
nothing else comes in.  Sometimes the link needs sending
down & up before it starts listening again at all.

> Another experiment to try: set the ste to 10 rather than 100. You may
> actually get better throughput by avoiding nagle and slow-start.

Yes, this helps: I get about 1MB/s which is all you'd expect
from a 10T connection.  The trace shows the packets arriving
in order, fairly evenly.  Setting the sending end to 10
instead has the same effect.
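
As a rough check on what to expect, here is a small Python
sketch of the best-case TCP payload rate over Ethernet.  The
assumptions are mine: a 1500-byte MTU, 40 bytes of IP+TCP
header with no options, 38 bytes of per-frame Ethernet
overhead (preamble, header, FCS and inter-frame gap), and
MB meaning 10^6 bytes.  The function name is hypothetical.

```python
def payload_ceiling_mb_per_s(link_mbit, mtu=1500, ip_tcp_headers=40,
                             ethernet_overhead=38):
    """Best-case TCP payload throughput in MB/s (10^6 bytes per second)."""
    payload_bytes = mtu - ip_tcp_headers      # user data per full-size frame
    wire_bytes = mtu + ethernet_overhead      # bytes of wire time per frame
    raw_mb_per_s = link_mbit / 8              # link rate converted to MB/s
    return raw_mb_per_s * payload_bytes / wire_bytes


if __name__ == "__main__":
    for rate in (10, 100):
        print(f"{rate} Mbit/s link: {payload_ceiling_mb_per_s(rate):.2f} MB/s")
```

Under those assumptions the ceiling comes out a little under
1.2MB/s at 10 and a little under 12MB/s at 100.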

When replugging/ifconfiging, it occasionally starts behaving
again for a bit.  Last time this happened I did a capture,
and discovered that it dropped from 11MB/s to 50kB/s *while
running tcpdump* (with transmission pauses visible on
traffic LEDs and in the trace) then was ok again afterwards.
Odd.  I've tried both CPU and disk loading during transfer,
but neither produces the same effect.

Same thing with different OSX machines sending.  The other
NetBSD machine I have here (which also has an ste) doesn't
have any problems as sender or receiver, btw; it's old,
though, and doesn't talk at full network speed anyway.

Thanks -- sorry for the long reply,
Mark <><