Subject: Re: Interrupts as threads
To: None <tech-kern@netbsd.org, tech-smp@netbsd.org>
From: Andrew Doran <ad@netbsd.org>
List: tech-kern
Date: 12/03/2006 22:36:30
On Sat, Dec 02, 2006 at 10:25:42AM +0000, David Laight wrote:

> On Fri, Dec 01, 2006 at 11:31:19PM +0000, Andrew Doran wrote:
> > 
> > o There is no easy solution to the lock order problem with the kernel_lock
> >   when using spin locks.
> 
> My brief perusal of the biglock code didn't make it clear how it worked
> when an IRQ came in while the 2nd CPU had the biglock...

The problem is that, at the moment, acquiring any given spin lock is a
deadlock waiting to happen unless you are at ~IPL_SCHED or above when you
take it, or are certain that it will only ever be taken with the
kernel lock unheld. Ensuring that the kernel lock is always held when you
acquire the spin lock means that the spin lock is useless :-).
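
To make the ordering problem concrete, it's the classic AB-BA reversal,
with an interrupt sneaking in between. A sketch (mutex_spin_enter() and
foo_lock are illustrative names, not necessarily the final interface):

    CPU 0                                 CPU 1
    -----                                 -----
    mutex_spin_enter(&foo_lock);          KERNEL_LOCK(1, NULL);
    /* IRQ arrives; the handler           mutex_spin_enter(&foo_lock);
       isn't MP-safe, so it does: */        /* spins: CPU 0 holds it */
    KERNEL_LOCK(1, NULL);
      /* spins: CPU 1 holds it, and
         foo_lock is never dropped */

CPU 0 can't return from the interrupt to release foo_lock, so both CPUs
spin forever. Holding ~IPL_SCHED keeps the interrupt out and breaks the
cycle, which is where the restriction above comes from.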

It would not be a major undertaking to push that down to IPL_VM by making
the other interrupt handlers MP-safe; the bulk of those are audio drivers, I
think. Once you move beyond those drivers, things start to get more
difficult, and I'm not convinced that we have the resources to tackle the
issue that way.

> > o Using spin locks we will have to keep the SPL above IPL_NONE for longer
> >   than before, or accept (in non-trivial cases) the undesirable cost of
> >   having both interrupt and process context locks around some objects.
> 
> SMP code does tend to need mutex protection for longer than the non-SMP
> code required IPL protection anyway...

Yup.
 
> > o Raising and lowering the SPL is expensive, especially on machines that
> >   need to talk with the hardware on SPL operations. The spin lock path also
> >   has more test+branch pairs / conditional moves and memory references
> >   involved than process locks. For a process context lock, the minimum we
> >   can get away with on entry and exit is one test+branch and two cache line
> >   references.
> 
> Can we do deferred SPL changes on all archs?
> i.e. don't frob the hardware, but assume it won't raise an IRQ. If an IRQ
> does occur, mask it then and return, re-enabling it in the splx call.

We probably could, but it's still expensive to acquire spin locks that
modify the SPL. Have a look at arch/i386/i386/lock_stubs.S on the newlock2
branch and compare the paths for spin vs. adaptive mutexes. (I know that at
least _lock_set_waiters() is broken. Also, I haven't optimised x86 assembly
in a long time - please don't laugh. ;-)
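
In C-ish pseudocode the two fast paths compare roughly like this (a
sketch only; the field names are illustrative):

    /* Spin mutex: frob the SPL on both sides. */
    void
    mutex_spin_enter(kmutex_t *mtx)
    {
            int s = splraise(mtx->mtx_ipl); /* may touch hardware */

            while (atomic_swap(&mtx->mtx_lock, 1) != 0)
                    /* spin */;
            mtx->mtx_oldspl = s;
    }

    /* Adaptive mutex: the uncontended case is one compare-and-swap
     * on the owner field - one test+branch and two cache lines (the
     * mutex and curlwp). */
    void
    mutex_enter(kmutex_t *mtx)
    {
            if (atomic_cas(&mtx->mtx_owner, 0, (uintptr_t)curlwp) != 0)
                    mutex_vector_enter(mtx); /* contended slow path */
    }

Even with deferred SPL changes - which i386 largely has already, noting
logically-masked interrupts as pending and replaying them at splx() time -
the spin variant keeps the extra bookkeeping and memory references.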

> > o Every spin lock / unlock pair denotes a critical section where threads
> >   running in the kernel can not be preempted. That's not currently an issue
> >   but if we move to support real time threads it could become one; I'm not
> >   sure.
> 
> Once we have a proper SMP kernel, making processes pre-emptable in the kernel
> (while not holding a lock) becomes ~free - whereas the non-SMP kernel code
> will make assumptions that prevent pre-emption.
> However you probably want to be able to disable pre-emption without
> holding a mutex.

Yup, agreed.
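
Something like a per-LWP counter would do for that; a sketch (none of
these names exist yet):

    void
    kpreempt_disable(void)
    {
            curlwp->l_nopreempt++;  /* no lock needed: per-LWP */
    }

    void
    kpreempt_enable(void)
    {
            /* If a preemption was deferred while the count was
             * raised, take it now. */
            if (--curlwp->l_nopreempt == 0 && curlwp->l_dopreempt)
                    preempt();
    }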
 
> > The cleanest way to deal with these issues that I can see is to use
> > lightweight threads to handle interrupts
> 
> That just makes it worse! Every hardware ISR would have to disable the
> IRQ itself, then the 'low level' ISR would need to re-enable it.

I don't see why we would have to do that. The way I envisioned this working,
for a typical interrupt path that doesn't have to block on any locks, the
prologue and epilogue for the interrupt itself (not the individual ISRs)
would have a few additional steps. Something like:

entry:		pick an idle interrupt LWP from curcpu
		note which LWP we have preempted
		set curlwp
		switch onto new stack

exit:		restore curlwp
		switch back to old stack
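
In C-ish terms, leaving the actual stack switch to assembly glue (all of
the names here are illustrative):

    struct lwp *
    intr_enter(struct cpu_info *ci, int level)
    {
            struct lwp *il = ci->ci_ilwp[level]; /* idle interrupt LWP */

            il->l_interrupted = ci->ci_curlwp;   /* note the preempted LWP */
            ci->ci_curlwp = il;                  /* set curlwp */
            return il;            /* caller switches to il's stack */
    }

    void
    intr_exit(struct cpu_info *ci, struct lwp *il)
    {
            ci->ci_curlwp = il->l_interrupted;   /* restore curlwp */
            /* caller switches back to the preempted LWP's stack */
    }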

If one of the ISRs needs to block, then we would have to switch back to the
LWP that was preempted and hold the SPL at the ISR's base level until the
ISR acquires the lock. So in effect, things like splx() would become a
request to alter the SPL, but wouldn't be able to drop it below the base
level for the CPU (which would typically be IPL_NONE).
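
The clamp itself would be trivial; a sketch (ci_baseipl is a made-up
field holding the CPU's current base level):

    void
    splx(int ncpl)
    {
            struct cpu_info *ci = curcpu();

            /* Honour the request, but never drop below the base
             * level that a blocked ISR is still holding. */
            ci->ci_cpl = (ncpl > ci->ci_baseipl) ? ncpl : ci->ci_baseipl;
    }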
 
> > My initial thought is to have one
> > thread per level, per CPU. These would be able to preempt already running
> > threads, and would hold preempted threads in-situ until the interrupt thread
> > returns or switches away. In most cases, SPL operations would be replaced by
> > locks.
> 
> You'd have to look very closely at priority inversion problems....

True. The turnstile mechanism would handle this for locks, but that's not
currently implemented.
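
In a nutshell, when a thread blocks on an owned lock the turnstile lends
the blocker's priority to the owner; a sketch (lwp_lendpri() is an
illustrative name):

    void
    turnstile_block(lwp_t *owner)
    {
            lwp_t *l = curlwp;

            /* Priority inheritance: boost the lock owner so that
             * mid-priority threads can't starve it while we wait. */
            if (owner->l_priority < l->l_priority)
                    lwp_lendpri(owner, l->l_priority);
            /* ...then sleep until the owner releases the lock and
             * the lent priority is given back. */
    }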
  
> > Blocking would no longer be prohibited, but strongly discouraged - so
> > doing something like pool_get(foo, PR_WAITOK) should likely trigger an
> > assertion.
> 
> If you allow blocking, then people will use it because it 'appears to work'
> then you find that one of your ISR threads is busy

I considered that, but I feel that it's really a documentation and code
review issue. There are plenty of places we can put assertions, though: for
instance around memory allocation, or direct use of the tsleep() / cv_wait()
interfaces. In any case there are plenty of inadvisable things that you can
already do from an interrupt handler, and we have drivers that do them (I
wrote one of them :-).
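
Concretely (a sketch; cpu_intr_p() stands in for whatever predicate
tells us we're running on an interrupt LWP):

    /* At the top of pool_get() and the other allocators: */
    KASSERT((flags & PR_WAITOK) == 0 || !cpu_intr_p());

    /* And at the top of tsleep() / cv_wait(): */
    KASSERT(!cpu_intr_p());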

> > Assuming you subscribe to handling interrupts with threads, it raises the
> > question: where to draw the line between threaded and 'traditional'. It
> > certainly makes sense to run soft interrupts this way, and I would draw the
> > line at higher priority ISRs like network, audio, serial and clock.
> 
> It may make sense to use kernel threads (running through the scheduler)
> for some driver activity, and quite probably some of the code scheduled
> via 'softint' is in this category.

Sure, but there is a cutoff point: going through the scheduler introduces
its full overhead on top of taking the interrupt. If the softint handler is
an interrupt thread, that cost isn't paid unless it actually needs to block.

Thanks,
Andrew