Subject: Re: Interrupt, interrupt threads, continuations, and kernel lwps
To: Andrew Doran <ad@netbsd.org>
From: None <jonathan@dsg.stanford.edu>
List: tech-kern
Date: 02/22/2007 09:13:36
In message <20070222161653.GA29516@hairylemon.org>, Andrew Doran writes:
>Hi Jonathan,
>
>On Wed, Feb 21, 2007 at 05:57:41PM -0800, jonathan@dsg.stanford.edu wrote:
>
>> So I assume I'm missing something; can you clue me in?
>
>Yes - you're describing roughly how packet processing works now. 
>My changes make this neither worse nor better.

Hi Andrew,

I truly don't know what to say to that. 
Here's my example case again:

        (...new IPsec'd packet comes in, asserts NIC interrupt...)
1. switch from user to NIC interrupt thread
        (... NIC driver calls ether_input, which demuxes the packet,
              enqueues it for the protocol input routine, requests
              softint processing, blocks ...)
2. switch from NIC interrupt thread to softint thread
        (... OCF submits job, blocks ...)
3. switch from softint thread to user
        (... crypto hardware finishes, requests interrupt ...)
4. switch from user to crypto interrupt thread
        (... crypto driver calls OCF, which wakes up softint processing ...)
5. switch to softint thread, process cleartext packet
        (... done with local kernel packet processing, softint thread blocks ...)
6. switch back to user.

And here's an equivalent monolithic-kernel+biglock scenario,
recognizable from 4.3BSD(-Tahoe) through NetBSD-3:

        (...new IPsec'd packet comes in, asserts NIC interrupt...)
1. Kernel takes interrupt, calls into NIC device interrupt handler
   in the currently active context.  Note: no context switch.  [1]
        (... NIC driver calls ether_input, which demuxes the packet,
              enqueues it for the protocol input routine, requests
              softint processing, returns ...)

2. After return from hardware interrupt handler, but before returning
   to the pre-interrupt state, the kernel checks for pending software
   interrupts.  Here, we run softints (assuming they weren't active
   at the time we took the interrupt).

        (... IP calls into FAST_IPSEC, which calls OCF; OCF submits job, returns ...)

3. Continue returning from softint to user.  Note: no context switch.
        (... crypto hardware finishes, requests interrupt...)

4. Kernel takes hardware interrupt.  Note: no context switch.
    (... crypto driver calls OCF, which calls the FAST_IPSEC
    continuation, which requests further softint processing via
    schednetisr() ...)

5. On return from the hardware interrupt, the kernel checks for
    pending softints.  If softint processing was not already active,
    the kernel runs the software-interrupt handlers.

6. Continue returning from the interrupt back to the pre-interrupt
    user code.

The first scenario has several context switches.  It also takes
hardware interrupt traps, each of which we have to turn into a
scheduler event to wake up the corresponding interrupt thread, plus
the returns from those traps.

The second scenario doesn't have context switches; interrupt
processing is done in the context of the interrupt traps.
That's the difference, and on most CPU architectures, it's a
significant difference.

Compare the TCP rates (multiple ttcp sessions on multiple NICs) that
uniprocessor NetBSD-3.1 can sustain with what FreeBSD-6 can sustain on
the same hardware, even *with* SMP and multiple CPUs, and even with
netisr processing handled in the context of the hardware-interrupt
thread (as Jason mentioned earlier).

And just personally, I still don't think NIC interrupt mitigation is
an on-point response.  First, crypto cards don't support significant
interrupt mitigation (the bcm5823/586x has a work queue that's 4 deep;
the hardware can handle multiple packets in each of those slots, but
our driver doesn't support that).

If you want to discuss interrupt mitigation, then let's consider a
non-IPsec case with 350,000 packets/sec at 25,000 interrupts/sec.
(That's a good approximation for filling four 1GbE links,
inbound only, with large sends.)


[1] Or on a special interrupt stack, on VAX or other machines
    which have them; let's leave dedicated interrupt modes aside for now.