Subject: Re: NetBSD and large pps
To: Mihai CHELARU <kefren@netbastards.org>
From: Thor Lancelot Simon <tls@rek.tjls.com>
List: tech-net
Date: 12/03/2004 10:03:50
On Fri, Dec 03, 2004 at 11:02:22AM +0200, Mihai CHELARU wrote:
> 
> Software tweaks:
> 	- HZ 1000
> 
> So, I have 4000 IRQs/sec generated by scheduler. Rest of IRQs/sec is 

I'm a little confused by this.  If you've set HZ=1000 (which is a very
bad thing to set it to; the table for doing quick time computations based
on HZ has an entry for 1024, but not for 1000), why are you getting *4000*
interrupts per second?

> NAPI means RX polling, meaning that not for every packet received there 
> is an IRQ generated. Don't know more, this is what I understood from 

This is a perfect example of why it's a really bad idea to use marketing
terms in technical discussion.  By invoking the amorphous "NAPI", you
have confused two different techniques, interrupt pacing (a.k.a. interrupt
"coalescing") and strict polling.  Both trade latency for throughput.  The
latter requires moderately painful support in the kernel but will work with
any network card; the former requires hardware support in the interface
card but little support in the kernel.

Interrupt pacing or coalescing just means that the card buffers packets
internally and only generates one interrupt every N packets, usually with
a timer so that it generates an interrupt at least every T microseconds
(this puts an upper bound on latency).  The wm and bge hardware supports
this (for that matter, so do tlp and lots of other older cards) but the
tricky thing is setting the interrupt threshold and maximum-latency timer
correctly for your application.  Jonathan has just given you suggestions
on how to set these thresholds better for what you're doing.
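To make the two knobs concrete, here is a minimal sketch of interrupt
coalescing as a pure simulation.  It is not driver code for wm, bge, or any
real card; the function name, the parameter names (pkt_threshold,
max_delay_us), and the rule that the timer starts at the oldest buffered
packet are all assumptions made for illustration.

```python
def coalesced_interrupts(arrivals_us, pkt_threshold, max_delay_us):
    """Return the times (in microseconds) at which a hypothetical card
    raises an interrupt, given sorted packet arrival times.

    Assumed model: the card interrupts when pkt_threshold packets are
    buffered, or when the oldest buffered packet has waited max_delay_us,
    whichever comes first.
    """
    interrupts = []
    pending = 0           # packets buffered since the last interrupt
    first_pending = None  # arrival time of the oldest buffered packet
    for t in arrivals_us:
        # The latency timer may have expired before this packet arrived.
        if pending and t - first_pending >= max_delay_us:
            interrupts.append(first_pending + max_delay_us)
            pending = 0
        if pending == 0:
            first_pending = t
        pending += 1
        # The packet-count threshold fires immediately on the Nth packet.
        if pending >= pkt_threshold:
            interrupts.append(t)
            pending = 0
    # Any leftover packets are flushed by the latency timer.
    if pending:
        interrupts.append(first_pending + max_delay_us)
    return interrupts


# A burst of four packets, then a lone straggler.
print(coalesced_interrupts([0, 10, 20, 30, 500], 4, 100))  # [30, 600]
```

In this model the burst is delivered as soon as the threshold is reached (at
30 us), and only the lone packet pays the full timer delay.  That is why the
timer acts as an upper bound on latency rather than the typical case.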

Polling just ignores network interrupts completely, and enforces a strict
latency/throughput trade-off by reading from the network device according
to a timer.  This avoids interrupt-service overhead, at the expense of
significant software complexity and of always making the _worst-case_
latency decision, rather than treating the increased latency as an upper
bound.
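The same kind of toy model shows what "always making the worst-case
decision" means for polling.  This is a sketch under stated assumptions, not
how any particular kernel implements polling: the function name is
hypothetical, and the poll timer is assumed to tick at exact multiples of
poll_interval_us starting from time 0.

```python
def polled_pickup_times(arrivals_us, poll_interval_us):
    """Return the time at which each packet is read from the device if the
    driver ignores interrupts and reads only on timer ticks at
    0, poll_interval_us, 2 * poll_interval_us, ...
    """
    pickups = []
    for t in arrivals_us:
        ticks = -(-t // poll_interval_us)  # ceiling division on integers
        pickups.append(ticks * poll_interval_us)
    return pickups


arrivals = [0, 10, 20, 30, 500]
pickups = polled_pickup_times(arrivals, 100)
waits = [p - t for p, t in zip(pickups, arrivals)]
print(pickups)  # [0, 100, 100, 100, 500]
print(waits)    # [0, 90, 80, 70, 0]
```

However fast packets arrive, nothing is read before the next tick, so a
packet that just misses a tick waits almost the whole interval.  No packet
count can cut the wait short.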

Basically, if we knew how to set the coalescing thresholds and timers
automatically, and we could get our interrupt code efficient enough,
the first approach would always win, given cards that support it.  But
we don't, and one advantage of polling (which, mind you, a stock NetBSD
kernel can't do) is that at least there's only one value to adjust, the
timer value according to which we read from the card.
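As a rough illustration of why a single knob is easier to reason about, the
arithmetic below maps one polling interval to the two quantities it fixes.
The function name is hypothetical and the numbers are illustrative, not
measurements from any system.

```python
def polling_tradeoff(poll_interval_us):
    """Return (polls per second, worst-case added wait in microseconds)
    for a given polling interval.  Both follow from the one timer value."""
    polls_per_second = 1_000_000 / poll_interval_us
    worst_case_wait_us = poll_interval_us
    return polls_per_second, worst_case_wait_us


for interval in (100, 500, 1000):
    rate, wait = polling_tradeoff(interval)
    print(f"interval={interval}us  polls/sec={rate:.0f}  worst-case wait={wait}us")
```

With coalescing there are two values to choose, the packet threshold and the
maximum-latency timer, and the best pair depends on the traffic pattern.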

I still don't know what "NAPI" is, but hopefully this will help you
understand the actual technical issues at work here.