Subject: Re: Interrupt, interrupt threads, continuations, and kernel lwps
To: Bill Studenmund <wrstuden@netbsd.org>
From: Andrew Doran <ad@netbsd.org>
List: tech-kern
Date: 02/22/2007 20:54:30
On Thu, Feb 22, 2007 at 09:46:42AM -0800, Bill Studenmund wrote:

> On Thu, Feb 22, 2007 at 09:13:36AM -0800, jonathan@dsg.stanford.edu wrote:
> > 
> > In message <20070222161653.GA29516@hairylemon.org> Andrew Doran writes:
> > >Hi Jonathan,
> > >
> > >On Wed, Feb 21, 2007 at 05:57:41PM -0800, jonathan@dsg.stanford.edu wrote:
> > >
> > >My changes make this neither worse nor better.
> > 
> > Hi Andrew,
> > 
> > I truly don't know what to say to that. 
> > Here's my example case again:
> > 
> >         (...new IPsec'd packet comes in, asserts NIC interrupt)
> > 1. switch from user to NIC interrupt thread
> >         (...  NIC calls ether_input which demuxes packet, enqueues on
> >               protocol input routine, requests softint processing, blocks)
> > 2. switch from NIC hardware thread to softint thread
> >         (... OCF submits job, blocks ...)
> > 3. switch from softint to user  
> >         (... crypto hardware finishes, requests interrupt...)
> > 4. switch from user to crypto-interrupt thread
> >         (... crypto driver calls OCF which wakes up softint processing...)
> > 5. switch to softint thread, process cleartext packet
> >         (... done with local kernel packet processing, softint thread ...)
> > 6. switch back to user. 
> > 
> > And here's an equivalent monolithic-kernel+biglock scenario,
> > recognizable for 4.3BSD(-Tahoe) to NetBSD-3:
> > 
> >         (...new IPsec'd packet comes in, asserts NIC interrupt)
> > 1. kernel takes interrupt, calls into NIC device interrupt handler
> >    in currently active context.  Note no context switch.  [1]
> >         (...  NIC calls ether_input which demuxes packet, enqueues on
> >               protocol input routine, requests softint processing, returns)
> > 
> > 2. After return from hardware interrupt handler, but before returning
> >    to the pre-interrupt state, the kernel checks for pending software
> >    interrupts.  Here, we run softints (assuming they weren't active
> >    at the time we took the interrupt).
> >    
> >         (... IP calls to FAST_IPSEC, to OCF, OCF submits job, returns ...)
> > 
> > 3. Continue returning from softint to user.  Note no context switch.
> >         (... crypto hardware finishes, requests interrupt...)
> > 
> > 4. Kernel takes hardware interrupt.  Note no context switch.
> >     (... crypto driver calls OCF, which calls FAST_IPSEC
> >     continuation, which requests further softint processing via
> >     schednetisr() ...)
> > 
> > 5. On return from hardware interrupt, the kernel checks for
> >     pending softints. If softint processing was not already active,
> >     the kernel does software-interrupt callouts.
> > 
> > 6. Continue returning from interrupt back to the pre-interrupt
> >     user code.
> > 
> > The first scenario has several context switches. (It also has hardware
> > interrupt traps, which we have to take and turn into scheduler events
> > to wake up the corresponding thread; plus returns from those traps).
> 
> I think the problem is you've assumed an implementation, and specifically 
> you've assumed one other than what Andy was suggesting.
> 
> My understanding is that Andy has figured out a way to have, at least on 
> x86, the interrupt handler borrow the context of the interrupted thread. 
> So the interrupt context switch is also the context switch to the thread. 
> That's why he said it was the same as what we do now.

Exactly. We already context switch for interrupts, but it is not the same as
mi_switch. What I want to do is give the interrupt handler enough context
(curlwp, stack) that it can block briefly and be restarted later. There are
two outcomes: the interrupt runs to completion, or the handler blocks. In
both of those cases, we return back to the interrupted LWP just as we do
now.
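
To make the two outcomes concrete, here is a minimal user-space sketch, not
kernel code. Everything in it (the cut-down struct lwp, intr_dispatch(),
enum intr_result, the example handlers) is hypothetical and only models the
idea: the handler is given its own curlwp without going through mi_switch(),
and the interrupted LWP stays pinned underneath and is resumed whether the
handler completes or blocks. A real implementation would also have to keep
the blocked handler's stack so it can be restarted later; the model only
records a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified model; not NetBSD's struct lwp. */
struct lwp {
    const char *name;
    bool blocked;            /* interrupt LWP parked waiting on a lock */
};

enum intr_result { INTR_COMPLETED, INTR_BLOCKED };

static struct lwp *curlwp;   /* LWP currently on this (single) CPU */

/*
 * Run a handler with its own LWP context (curlwp; in a real kernel, also a
 * stack) without calling mi_switch(). The interrupted LWP is not put on a
 * run queue; it is simply resumed afterwards in both outcomes.
 */
static struct lwp *intr_dispatch(struct lwp *intr_lwp,
                                 enum intr_result (*handler)(void))
{
    assert(curlwp != NULL);           /* interrupts arrive on top of some LWP */
    struct lwp *pinned = curlwp;      /* remember the interrupted LWP */
    curlwp = intr_lwp;                /* handler gets a curlwp of its own */
    intr_lwp->blocked = (handler() == INTR_BLOCKED);
    curlwp = pinned;                  /* completed or blocked: resume it */
    return curlwp;
}

/* Example handlers for the two possible outcomes. */
static enum intr_result handler_completes(void) { return INTR_COMPLETED; }
static enum intr_result handler_blocks(void)    { return INTR_BLOCKED; }
```

In both calls the return value is the interrupted LWP; only the interrupt
LWP's blocked flag differs.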

The handlers would be permitted to block only in order to acquire a mutex or
RW lock. Calling cv_wait(), lockmgr(), pool_get(..., PR_WAITOK), etc. from
the handler's context would panic the machine.
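
The rule can be pictured as a check that each sleeping primitive performs
before going to sleep. This is a hypothetical sketch: enum wait_kind,
block_allowed() and sleep_check() are illustrative names, not existing
kernel interfaces, and assert() stands in for the panic.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical classification of the ways a kernel context might sleep. */
enum wait_kind {
    WAIT_MUTEX,      /* mutex acquisition */
    WAIT_RWLOCK,     /* RW lock acquisition */
    WAIT_CV,         /* cv_wait() */
    WAIT_LOCKMGR,    /* lockmgr() */
    WAIT_POOL        /* pool_get(..., PR_WAITOK) */
};

/* Interrupt handlers may block only to acquire a mutex or RW lock. */
static bool block_allowed(bool in_intr, enum wait_kind kind)
{
    if (!in_intr)
        return true;          /* ordinary LWPs may sleep in any of these */
    return kind == WAIT_MUTEX || kind == WAIT_RWLOCK;
}

/* Stand-in for the panic: abort if the rule is violated. */
static void sleep_check(bool in_intr, enum wait_kind kind)
{
    assert(block_allowed(in_intr, kind) && "illegal sleep in interrupt handler");
}
```

Keeping the predicate separate from the "panic" makes the policy easy to
state in one place; each primitive would call the check with its own kind.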

When the lock a handler is waiting on is released, we end up in sleepq_wake.
The interrupt handler gets marked runnable and put onto a per-CPU run queue.
Just before returning, sleepq_wake notices that a high-priority LWP
(above system or kernel priority) is waiting to run and calls preempt. The
interrupt handler is the highest priority item in the run queue, so it gets
picked and put back on the CPU.
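
A rough model of that wake-up path, under stated assumptions: a single
CPU, a tiny array standing in for the per-CPU run queue, and a priority
scale where a larger number is more urgent. PRI_SYSTEM_TOP,
sleepq_wake_model(), preempt_model() and runq_pick() are hypothetical names;
they do not reproduce the real sleepq_wake() or preempt() signatures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define RUNQ_MAX       8
#define PRI_SYSTEM_TOP 127    /* hypothetical: top of system/kernel range */

struct lwp {
    const char *name;
    int prio;                 /* hypothetical scale: larger = more urgent */
};

struct runq {                 /* stand-in for the per-CPU run queue */
    struct lwp *q[RUNQ_MAX];
    size_t n;
};

static struct lwp *curlwp;

/* Remove and return the highest-priority runnable LWP. */
static struct lwp *runq_pick(struct runq *rq)
{
    assert(rq->n > 0);
    size_t best = 0;
    for (size_t i = 1; i < rq->n; i++)
        if (rq->q[i]->prio > rq->q[best]->prio)
            best = i;
    struct lwp *l = rq->q[best];
    rq->q[best] = rq->q[--rq->n];     /* fill the hole with the last entry */
    return l;
}

/* Put the best runnable LWP on the CPU if it beats the current one. */
static void preempt_model(struct runq *rq)
{
    struct lwp *next = runq_pick(rq);
    if (next->prio > curlwp->prio) {
        rq->q[rq->n++] = curlwp;      /* preempted LWP goes back on the queue */
        curlwp = next;
    } else {
        rq->q[rq->n++] = next;        /* not better: leave it queued */
    }
}

/* Lock released: make the waiter runnable, then preempt if urgent. */
static void sleepq_wake_model(struct runq *rq, struct lwp *woken)
{
    assert(rq->n < RUNQ_MAX);
    rq->q[rq->n++] = woken;
    bool urgent = false;
    for (size_t i = 0; i < rq->n; i++)
        if (rq->q[i]->prio > PRI_SYSTEM_TOP)
            urgent = true;
    if (urgent)
        preempt_model(rq);
}
```

When the woken LWP is an interrupt handler above PRI_SYSTEM_TOP, it is
picked immediately and the LWP that released the lock goes back on the
queue; an ordinary waiter is just queued.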

What Jonathan is describing is roughly how FreeBSD works, I think. When the
interrupt comes in, mi_switch() is called to dispatch it. The thread that
was running when the interrupt came in gets kicked off the CPU.

Andrew