Subject: Re: Interrupt, interrupt threads, continuations, and kernel lwps
To: None <tech-kern@netbsd.org>
From: Andrew Doran <ad@netbsd.org>
List: tech-kern
Date: 06/14/2007 14:11:17
On Sat, May 05, 2007 at 02:16:17PM +0100, Andrew Doran wrote:
> On Wed, Feb 21, 2007 at 10:09:00AM +0000, Andrew Doran wrote:
> 
> > On Wed, Feb 21, 2007 at 12:08:36AM -0800, Matt Thomas wrote:
> 
> > > I think that hard interrupts should simply invoke the handler (at the  
> > > appropriate IPL), the handler will disable the cause of the  
> > > interrupt, optionally it may use a SPIN mutex to gain control over a  
> > > shared section, do something to the device, release the mutex, or it  
> > > may just schedule a continuation to run (either via software  
> > > interrupt or via a kernel lwp workqueue, the former can't sleep/ 
> > > block, the latter can).
> >
> > To reiterate, there are two reasons I want to use LWPs to handle interrupts:
> > significantly cheaper locking primitives on MP systems, and the ability to
> > eliminate the nasty deadlocks associated with interrupts/MP and interrupt
> > priority levels. The intent is *not* to rely heavily on blocking as the main
> > synchronization mechanism between the top and bottom halves.
> 
> So I've given this more thought, and I now think that a hybrid approach is
> the way to go. I think that minimizing the level of change to interrupt
> handling on the various platforms is important, since it's particularly
> tricky. Here's my updated proposal:
> 
> => hardware interrupts
> 
> Hardware interrupts would function as Matt describes, but with work still
> being handed down via soft interrupt. We would need to reduce the amount of
> work done in interrupt handlers. For example, calls to biodone() from
> interrupt handlers would be replaced by biointr(), and a soft interrupt
> handler would call biodone().
> 
> => software interrupts
> 
> Where it's possible, software interrupts would work as I described before.
> They borrow the interrupted thread's VM context, and are able to block.
> Where the machine is modal, or for bringup, or where there is a lack of
> time or interest, software interrupts can be implemented in an MI way using
> kthreads.
> 
> Soft interrupt handlers would be per-CPU. If a soft interrupt is triggered
> on a CPU, it must occur on that CPU. On x86 at least, it is currently
> possible for another CPU to snarf it and clear the pending status. In the
> long run, we may want the ability to direct soft interrupts to other CPUs,
> but only if the driver asks for it. Each LWP dedicated to handling a soft
> interrupt would be bound to its home CPU, so if it blocks and needs to run
> again, it would only run there.
> 
> The per-CPU requirement means it would be possible to hand work down to the
> soft interrupt handler without using locks.
> 
> Software interrupts would not be able to:
> 
> - sleep using condition variables
> - use lockmgr()
> - wait for memory to become available (eg: KM_SLEEP, PR_WAITOK, ...)
> 
> => primitives
> 
> Document mutex_spin_enter/mutex_spin_exit for device drivers; these avoid
> a costly trip through mutex_enter/mutex_exit for spin mutexes. All interrupt
> levels become able to use mutexes, which is not currently possible for
> serial interrupts or IPIs on x86. What is now called IPL_LOCK is replaced
> by IPL_HIGH; it's mostly used by the lockdebug and lockstat code.
> 
> => spl hierarchy and facilities
> 
> The soft interrupt levels would cease to exist - at least in the long run.
> There are a few places we may still want the ability to block softnet until
> we can fix the concurrency issues.
> 
> I propose that we then flatten the hierarchy to look like this:
> 
> o IPL_NONE
> 
>   Usual state of the system, no interrupts blocked.
> 
> o IPL_LOW
> 
>   Blocks all "low priority" hardware interrupts. Mostly equivalent to
>   splvm/splimp, but with the additional guarantee that it will block
>   anything that can take the kernel lock. By its nature, blocks soft
>   interrupts from occurring.
> 
>   What interrupts at this level can do is restricted further. It would not
>   be possible for them to send signals to processes or inspect any process
>   state. That all needs to be deferred to a software interrupt. It would be
>   possible to wake LWPs using cv_broadcast()/cv_signal().
> 
>   The VM system would run at this level, so it's still possible to
>   allocate/free memory. Longer term I think it may be worthwhile restricting
>   interrupt handlers' view of the VM system to eg: pool_get, pool_put.
> 
> o IPL_MID
> 
>   Blocks mid level interrupts, like the clock or (for example) audio
>   interrupts, and also blocks everything at IPL_LOW. Similar to what
>   IPL_SCHED does now.
> 
>   Handlers at this level would have essentially the same capabilities as
>   IPL_LOW, but would not be able to make use of the VM system, and would not
>   be able to take the kernel lock. The scheduler would run at this level.
> 
> o IPL_HIGH
> 
>   Blocks all high level interrupts, like: statclock, IPIs (x86), serial. 
>   Also blocks everything at lower levels.
> 
>   Handlers at this level would be even further restricted in what they can
>   do. The synchronization mechanisms available to them would be: scheduling
>   a soft interrupt, using spin mutexes, and using the spl calls. They could
>   not call e.g. cv_broadcast(), or acquire the kernel lock. By extension, it
>   would not be possible for LWPs to sleep at IPL_HIGH.

I plan to implement this over the next couple of months. Some of the changes
involved:

o Add a cpu_intr_p() that returns true if currently handling a hardware
  interrupt. This would be used so (for example) biodone knows whether
  or not to defer processing to a soft interrupt.

o Increase the number of available priority levels to 256 as discussed
  earlier. The priority space is expanded to include real time and
  (soft) interrupt threads.

o Pull in the less invasive changes from the vmlocking branch: those that
  do not touch the vm or vfs system. A couple of these are: the ability
  to create bound kthreads, and kthreads running as lwps in proc0.

o Flatten the spl hierarchy as described above.

For x86 I'll implement the fast path where soft interrupts run on top of the
currently executing thread, borrowing its VM context. I have a generic
implementation for sys/kern, which has some disadvantages when compared
to the "fast path":

o It uses the dispatcher and thus increases overhead. It will, however,
  be cheaper than using a kthread and mutex/cv pair to dispatch work:
  the threads transition in and out of LSIDL and are CPU-local, so the
  need for synchronization is reduced.
o It requires the currently running thread to leave the CPU before the
  software interrupt can begin executing. This will introduce latency.
o It increases the penalty for taking a software interrupt, because it
  means a context switch for each interrupt taken (but not for each
  soft interrupt triggered; if the system is busy there's not a 1:1
  correlation).
o It requires splsched == splhigh (or going forward, splmid == splhigh).

The advantage to using the generic mechanism is:

o It works everywhere.

To transition I want to introduce a new soft interrupt interface, with the
methods prefixed with 'softint_'. It will look just about the same as the
old, but with some extra parameters. This is so that the existing MD soft
interrupt code can continue to exist for some months, allowing other
architectures to be more easily modified to use the fast implementation like
x86. After the dust has settled I plan to remove the existing soft interrupt
interface, where it still exists.

Andrew