Subject: Re: packet capturing
To: Manuel Bouyer <bouyer@antioche.lip6.fr>
From: Thor Lancelot Simon <tls@rek.tjls.com>
List: tech-kern
Date: 01/23/2004 16:30:36
On Fri, Jan 23, 2004 at 09:33:15PM +0100, Manuel Bouyer wrote:
> On Thu, Jan 22, 2004 at 09:05:32AM -0800, Jason Thorpe wrote:
> > [...]
> > Regarding 4-port 10/100 boards, D-Link may still have the DFE-570TX 
> 
> No, it was replaced with the DFE-580TX, which uses Sundance ST-201 chips
> instead of the Tulip. These are crap: the PCI-PCI bridge can't handle even
> moderate load on 2 of the 4 interfaces at the same time. The receive buffer
> on the chip is only 2KB, which means it can't hold more than one full
> Ethernet frame at once.

It's my impression that there are two general design philosophies for
network interface chips:

1) "Buffer many packets, trickle them out to memory as necessary"

2) "Buffer almost nothing, rely on bursting packets into host memory before
   the buffer fills".  These often even start DMA before they've got the
   entire packet off the wire, or try to anyway.
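
Both philosophies usually come down to one knob: the receive-DMA start
threshold.  A toy illustration of the difference (every name below is a
hypothetical placeholder, not any particular chip's register layout):

#include <sys/types.h>

#define RXDMA_THRESH_REG	0x40	/* hypothetical threshold register */
#define ETHER_MAX_FRAME		1518	/* max Ethernet frame, bytes */

static void chip_write(uint32_t, uint32_t);	/* placeholder MMIO write */

static void
rx_set_philosophy(int type)
{

	if (type == 1) {
		/* Type 1: hold a complete frame on chip, then trickle
		 * it out to host memory as the bus allows. */
		chip_write(RXDMA_THRESH_REG, ETHER_MAX_FRAME);
	} else {
		/* Type 2: start DMA after a few dozen bytes and bet
		 * that the burst stays ahead of the wire. */
		chip_write(RXDMA_THRESH_REG, 64);
	}
}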

The thing is, *if the assumptions about bus contention made by the chip's
designers hold*, designs of type 2 can actually perform better than
designs of type 1.  Unfortunately, they seldom hold.  In particular,
with early PCI-PCI bridges, the devices behind them can't burst, so one
of the key assumptions about how fast data can get to host memory is
false.  Or the driver may not configure as many DMA descriptors as the
designers assumed when they modelled the chip's performance, causing
stalls for that reason.  Or a device with a higher latency-timer value
may suck up the whole bus.  Or... well, you get the idea.
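
For reference, the knobs in question live in PCI config space.  A minimal
sketch against NetBSD's pci(9): pci_conf_read(), PCI_BHLC_REG, and
PCI_LATTIMER() are the real interfaces, while the "not on bus 0" test is
only a crude stand-in for actually walking the bridge hierarchy.

#include <dev/pci/pcireg.h>
#include <dev/pci/pcivar.h>

/* Crude heuristic: bus 0 hangs directly off the host bridge, so a
 * device on any other bus is (usually) behind at least one PCI-PCI
 * bridge.  Real code would walk the bridge chain. */
static int
xx_behind_ppb(const struct pci_attach_args *pa)
{

	return (pa->pa_bus != 0);
}

/* The latency timer shares the BHLC register with the cache-line
 * size, header type, and BIST fields. */
static int
xx_latency_timer(const struct pci_attach_args *pa)
{
	pcireg_t bhlc;

	bhlc = pci_conf_read(pa->pa_pc, pa->pa_tag, PCI_BHLC_REG);
	return (PCI_LATTIMER(bhlc));
}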

It seems to me that if you're going to put a pile of network interface chips
behind a PCI-PCI bridge (meaning they all share one latency-timer's worth
of bursting on the primary bus, as I understand it, or sometimes can't
burst at all) you'd better make sure they're type 1 ("buffer a lot of
stuff and trickle it out") and that their buffering is set up 
appropriately.  For example, the current crop of Marvell "sk" chips can
be configured for "store and forward" or for "cut-through", even though
they do have a semi-decent amount of buffering on them (128K).  If you
configure them for cut-through, and they're behind a PCI-PCI bridge,
they perform *terribly*.  This isn't a badly-designed chip, but it _is_
a chip whose design requires the driver to set it up differently depending
on some facts about the bus it's on.
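
Concretely, the attach-time decision I'm describing would look something
like the sketch below.  The register name, the mode bits, and the
sk_write_4() helper are all made-up placeholders for illustration, not
the actual sk(4) register layout.

#include <sys/types.h>

struct sk_softc;				/* the real driver's softc */
static void sk_write_4(struct sk_softc *, uint32_t, uint32_t);
						/* placeholder MMIO write */

#define SK_RX_MODE_REG		0x0048		/* hypothetical register */
#define SK_RXMODE_CUT_THROUGH	0x0001		/* hypothetical mode bit */
#define SK_RXMODE_STORE_FWD	0x0002		/* hypothetical mode bit */

static void
sk_choose_rx_mode(struct sk_softc *sc, int behind_ppb)
{

	/*
	 * Cut-through only wins if bursts to host memory can stay
	 * ahead of the wire.  Behind a PCI-PCI bridge that assumption
	 * fails, so fall back to store-and-forward and let the 128K
	 * on-chip buffer absorb the latency.
	 */
	if (behind_ppb)
		sk_write_4(sc, SK_RX_MODE_REG, SK_RXMODE_STORE_FWD);
	else
		sk_write_4(sc, SK_RX_MODE_REG, SK_RXMODE_CUT_THROUGH);
}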

You can see something similar with the Intel gig chips if you drop them
onto a bus that's not the one they're tuned for by default.  Put a PCI-X
Intel chip on a 33MHz bus, and note that its performance will be _far_
worse than that of the previous-generation straight-PCI part, even if the
bus is lightly loaded.  I suspect this is one reason the "em" driver does
better than our "wm" driver in some cases: it tries to tune the chip
according to some facts about the PCI bus.  Generally, this is a good
thing to take into account, even though in an ideal world one should not
have to; our drivers don't really do this at all right now.
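
What that tuning could look like on our side, again sketched against
pci(9): pci_get_capability() and the status-register read are the real
interfaces, while xx_tune_dma() and its burst settings are invented
placeholders for whatever chip-specific knobs apply.

#include <dev/pci/pcireg.h>
#include <dev/pci/pcivar.h>

enum xx_burst { XX_SHORT_BURSTS, XX_MEDIUM_BURSTS, XX_LONG_BURSTS };
static void xx_tune_dma(enum xx_burst);		/* hypothetical tuning hook */

static void
xx_tune_for_bus(const struct pci_attach_args *pa)
{
	pcireg_t csr;
	int off;

	/* PCI-X parts advertise a PCI-X capability in config space. */
	if (pci_get_capability(pa->pa_pc, pa->pa_tag, PCI_CAP_PCIX,
	    &off, NULL)) {
		xx_tune_dma(XX_LONG_BURSTS);
		return;
	}

	/*
	 * Otherwise check the 66MHz-capable status bit.  (This only
	 * says what the function supports; the actual bus clock also
	 * depends on what else shares the bus.)
	 */
	csr = pci_conf_read(pa->pa_pc, pa->pa_tag, PCI_COMMAND_STATUS_REG);
	if (csr & PCI_STATUS_66MHZ_SUPPORT)
		xx_tune_dma(XX_MEDIUM_BURSTS);
	else
		xx_tune_dma(XX_SHORT_BURSTS);	/* plain 33MHz PCI */
}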

Thor