NetBSD-Users archive


Re: Network performance issues



On Thu, 26 Aug 2010 09:13:57 -0400
Steven Bellovin <smb%cs.columbia.edu@localhost> wrote:

> 
> On Aug 26, 2010, at 8:59:33 AM, Sad Clouds wrote:
> 
> > Hi, I'm testing some networking code I've written that uses I/O
> > multiplexing with kqueue and poll. The test consists of the
> > following (a rough sketch of the client loop follows):
> > 
> > Client opens 100 simultaneous connections and sends N bytes of data
> > on each connection.
> > 
> > Server accepts connections and replies N bytes of data back to the
> > client.
> > 
> > When the client's send and receive counters both reach N bytes for
> > a connection, it closes that connection.
> > 
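> > In outline, the client looks roughly like the sketch below. This is
> > a simplified illustration, not the actual test code: the address,
> > port, blocking connect and one-event-at-a-time kevent loop are
> > placeholders to keep it short.
> > 
> > #include <sys/types.h>
> > #include <sys/event.h>
> > #include <sys/socket.h>
> > #include <netinet/in.h>
> > #include <arpa/inet.h>
> > #include <err.h>
> > #include <string.h>
> > #include <unistd.h>
> > 
> > #define NCONN	100	/* simultaneous connections */
> > #define NBYTES	1000	/* bytes to send and receive per connection */
> > #define MAXFD	1024	/* crude fd-indexed counter tables */
> > 
> > int
> > main(void)
> > {
> > 	struct sockaddr_in sin;
> > 	struct kevent ev;
> > 	char buf[NBYTES];
> > 	size_t sent[MAXFD] = { 0 }, rcvd[MAXFD] = { 0 };
> > 	ssize_t n;
> > 	int i, kq, fd, open_conns = NCONN;
> > 
> > 	if ((kq = kqueue()) == -1)
> > 		err(1, "kqueue");
> > 
> > 	memset(&sin, 0, sizeof(sin));
> > 	sin.sin_family = AF_INET;
> > 	sin.sin_port = htons(12345);			/* placeholder */
> > 	sin.sin_addr.s_addr = inet_addr("192.168.0.2");	/* placeholder */
> > 	memset(buf, 'x', sizeof(buf));
> > 
> > 	for (i = 0; i < NCONN; i++) {
> > 		/* the real code would use non-blocking connect(2) */
> > 		if ((fd = socket(AF_INET, SOCK_STREAM, 0)) == -1)
> > 			err(1, "socket");
> > 		if (connect(fd, (struct sockaddr *)&sin, sizeof(sin)) == -1)
> > 			err(1, "connect");
> > 		/* watch each socket for readability and writability */
> > 		EV_SET(&ev, fd, EVFILT_READ, EV_ADD, 0, 0, NULL);
> > 		if (kevent(kq, &ev, 1, NULL, 0, NULL) == -1)
> > 			err(1, "kevent: add read");
> > 		EV_SET(&ev, fd, EVFILT_WRITE, EV_ADD, 0, 0, NULL);
> > 		if (kevent(kq, &ev, 1, NULL, 0, NULL) == -1)
> > 			err(1, "kevent: add write");
> > 	}
> > 
> > 	while (open_conns > 0) {
> > 		if (kevent(kq, NULL, 0, &ev, 1, NULL) == -1)
> > 			err(1, "kevent: wait");
> > 		fd = (int)ev.ident;
> > 		if (ev.filter == EVFILT_WRITE) {
> > 			if (sent[fd] < NBYTES) {
> > 				n = write(fd, buf, NBYTES - sent[fd]);
> > 				if (n > 0)
> > 					sent[fd] += (size_t)n;
> > 			}
> > 			if (sent[fd] >= NBYTES) {
> > 				/* done sending: drop the write filter */
> > 				EV_SET(&ev, fd, EVFILT_WRITE, EV_DELETE,
> > 				    0, 0, NULL);
> > 				(void)kevent(kq, &ev, 1, NULL, 0, NULL);
> > 			}
> > 		} else if (ev.filter == EVFILT_READ) {
> > 			n = read(fd, buf, sizeof(buf));
> > 			if (n > 0)
> > 				rcvd[fd] += (size_t)n;
> > 		}
> > 		/* both counters full: this connection is finished */
> > 		if (sent[fd] >= NBYTES && rcvd[fd] >= NBYTES) {
> > 			close(fd);	/* close() removes its filters */
> > 			open_conns--;
> > 		}
> > 	}
> > 	return 0;
> > }
> > 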
> > I'm seeing some performance issues with larger data segments:
> > 
> > The time for the client to open 100 connections and send and
> > receive 500 bytes on each connection ranges from 0.07 to 1.54
> > seconds.
> > 
> > The time to open 100 connections and send and receive 1000 bytes
> > on each connection is 31.05 seconds.
> > 
> > The hardware is two fast x86 machines, both using hme0 network
> > interfaces. The machines are connected to a 100 Mbps Ethernet
> > switch. The client is running NetBSD 5.0.2, the server NetBSD
> > 5.1_RC3.
> > 
> > I've gone through my code and can't see any problems. The hardware
> > has plenty of bandwidth, so that shouldn't slow things down. What I
> > find strange is that going from 500-byte data segments to 1000-byte
> > ones increases the total wait time by so much. If I had bugs in my
> > code, surely they would cause the same issues with both 500- and
> > 1000-byte data segments. I tested this with both kqueue and poll
> > and got similar results.
> > 
> > Can this be a kernel issue? What sysctl tunable parameters could
> > have influence on this?
> 
> I suspect that the underlying problem is packet loss.  If a segment
> gets dropped and retransmitted, the sending TCP will slow down.  But
> that's connection-specific; other connections won't be affected.  You
> can verify that that's happening by running tcpdump on both ends,
> though I confess I don't know of a good analysis tool.  The question
> is why packets are getting lost.
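> 
> For example, something along these lines on each end should capture
> enough headers to spot retransmissions (the interface name and port
> here are placeholders for whatever you're actually using):
> 
>     tcpdump -i hme0 -s 96 -w /tmp/test.pcap tcp port 12345
> 
> Reading the traces back with "tcpdump -r" and looking for repeated
> sequence numbers is crude, but it works.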
> 
> Several possibilities occur to me.  First, and most likely: you're
> exceeding the buffer capacity of the switch.  I don't think the
> problem is on the sending side, since as I recall the queue length
> limit on NetBSD is in terms of packets, not bytes, but the switch may
> be different.  Alternatively, it may be the receiving NIC -- it may
> have buffer management problems, too.
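> 
> If I remember right, NetBSD keeps IP input-queue counters under
> sysctl, so something like this on the receiver (together with the
> interface error columns from netstat) should show whether that host
> itself is dropping packets:
> 
>     sysctl net.inet.ip.ifq.maxlen net.inet.ip.ifq.drops
>     netstat -i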
> 
> Second: perhaps there's a cable problem -- I've often seen long
> packets fail under such circumstances.  Try using ping at various
> packet sizes and see if there's greater packet loss.
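> 
> For example (the sizes are arbitrary; the point is small versus
> near-MTU, and "server" stands for the other machine's address):
> 
>     ping -c 100 -s 56 server
>     ping -c 100 -s 1400 server
> 
> If the large pings lose noticeably more packets than the small ones,
> the cabling is the likely culprit.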
> 

Hi Steven, I've tracked down the issue to the Sun quad-port hme PCI
network card I've been using. The other test machine has a similar,
but single-port, hme card, and it doesn't appear to have any problems.

It is only this quad-port hme card that gives erratic results. I've
switched to a cheap Tulip (tlp0) card and the problems have
disappeared.
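
In case it helps anyone else chasing something similar: a quick sanity
check is to watch the interface error counters while the test runs,
e.g.

    netstat -i

and compare the Ierrs/Oerrs columns for hme0 and tlp0 (column names
from memory). A bad card will often rack up errors there that a good
one doesn't, though a card can also misbehave without bumping the
counters.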

So either my quad-port hme card is faulty, or NetBSD has a buggy hme
driver. I now remember having similar issues on a sparc64 machine with
the same card; I think I posted about miserable NFS performance around
two years ago.

When I have time, I'll boot into Solaris; if Solaris has the same
problems with this card, then it's definitely a hardware issue.

