Subject: Re: Interrupt, interrupt threads, continuations, and kernel lwps
To: None <tech-kern@netbsd.org>
From: Andrew Doran <ad@netbsd.org>
List: tech-kern
Date: 05/05/2007 14:16:17
On Wed, Feb 21, 2007 at 10:09:00AM +0000, Andrew Doran wrote:

> On Wed, Feb 21, 2007 at 12:08:36AM -0800, Matt Thomas wrote:

> > I think that hard interrupts should simply invoke the handler (at the  
> > appropriate IPL), the handler will disable the cause of the  
> > interrupt, optionally it may use a SPIN mutex to gain control over a  
> > shared section, do something to the device, release the mutex, or it  
> > may just schedule a continuation to run (either via software  
> > interrupt or via a kernel lwp workqueue, the former can't sleep/ 
> > block, the latter can).
>
> To reiterate, there are two reasons I want to use LWPs to handle interrupts:
> significantly cheaper locking primitives on MP systems, and the ability to
> eliminate the nasty deadlocks associated with interrupts/MP and interrupt
> priority levels. The intent is *not* to rely heavily on blocking as the main
> synchronization mechanism between the top and bottom halves.

So I've given this more thought, and I now think that a hybrid approach is
the way to go. I think that minimizing the level of change to interrupt
handling on the various platforms is important, since it's particularly
tricky. Here's my updated proposal:

=> hardware interrupts

Hardware interrupts would function as Matt describes, but with work still
being handed down via soft interrupt. We would need to reduce the amount of
work done in interrupt handlers. For example, calls to biodone() from
interrupt handlers would be replaced by biointr(), and a soft interrupt
handler would call biodone().

=> software interrupts

Where it's possible, software interrupts would work as I described before.
They borrow the interrupted thread's VM context, and are able to block.
Where the machine is modal, or for bringup, or where there is a lack of time
or interest, software interrupts can be implemented in an MI way using
kthreads.

Soft interrupt handlers would be per-CPU. If a soft interrupt is triggered
on a CPU, it must occur on that CPU. On x86 at least, it is currently
possible for another CPU to snarf it and clear the pending status. In the
long run, we may want the ability to direct soft interrupts to other CPUs,
but only if the driver asks for it. Each LWP dedicated to handling a soft
interrupt would be bound to its home CPU, so if it blocks and needs to run
again, it would only run there.

The per-CPU requirement means it would be possible to hand work down to the
soft interrupt handler without using locks.

Software interrupts would not be able to:

- sleep using condition variables
- use lockmgr()
- wait for memory to become available (eg: KM_SLEEP, PR_WAITOK, ...)

=> primitives

Document mutex_spin_enter/mutex_spin_exit for device drivers, which avoid
a costly trip through mutex_enter/mutex_exit for spin locks. All interrupt
levels would become able to use mutexes; that is not currently possible for
serial interrupts or IPIs on x86. What is now called IPL_LOCK, which is
mostly used by the lockdebug and lockstat code, would be replaced by
IPL_HIGH.

=> spl hierarchy and facilities

The soft interrupt levels would cease to exist - at least in the long run.
There are a few places where we may still want the ability to block softnet
until the concurrency issues there can be fixed.

I propose that we then flatten the hierarchy to look like this:

o IPL_NONE

  Usual state of the system, no interrupts blocked.

o IPL_LOW

  Blocks all "low priority" hardware interrupts. Mostly equivalent to
  splvm/splimp, but with the additional guarantee that it will block
  anything that can take the kernel lock. By its nature, blocks soft
  interrupts from occurring.

  What interrupts at this level can do is restricted further. It would not
  be possible for them to send signals to processes or inspect any process
  state. That all needs to be deferred to a software interrupt. It would be
  possible to wake LWPs using cv_broadcast()/cv_signal().

  The VM system would run at this level, so it's still possible to
  allocate/free memory. Longer term I think it may be worthwhile restricting
  interrupt handlers' view of the VM system to eg: pool_get, pool_put.

o IPL_MID

  Blocks mid-level interrupts, such as clock or audio interrupts, and also
  blocks everything at IPL_LOW. Similar to what IPL_SCHED does now.

  Handlers at this level would have essentially the same capabilities as
  IPL_LOW, but would not be able to make use of the VM system, and would not
  be able to take the kernel lock. The scheduler would run at this level.

o IPL_HIGH

  Blocks all high-level interrupts, such as statclock, IPIs (x86), and serial.
  Also blocks everything at lower levels.

  Handlers at this level would be even further restricted in what they can
  do. The synchronization mechanisms available to them would be: scheduling
  a soft interrupt, using spin mutexes, and using the spl calls. They could
  not call e.g. cv_broadcast(), or acquire the kernel lock. By extension, it
  would not be possible for LWPs to sleep at IPL_HIGH.

Thoughts?

Thanks,
Andrew