tech-kern: Re: Nitty-gritty questions on locking and interrupts

Subject: Re: Nitty-gritty questions on locking and interrupts
To: None <kpneal@pobox.com, tech-kern@netbsd.org>
From: Gary Thorpe <gathorpe79@yahoo.com>
List: tech-kern
Date: 01/01/2003 15:01:47
--- kpneal@pobox.com wrote:
> Ok, so I've got some work I want to do in the kernel. It involves
> code that lives in the top and bottom halves, and so runs in either
> a process context (right?) or an interrupt context. 
> 
> Now, what does this top/bottom split mean in an SMP world? Different
> processors in the same system can at any point in time have different
> spl levels/masks, right? Are the spl calls going to go away?

I would guess (having no real knowledge of the SMP code in NetBSD)
that spl calls will still be needed to keep lower-priority interrupts
from preempting an interrupt handler running on a processor. I would
also guess that different processors would have different spl
levels/masks.

> 
> What is the correct way to write code that lives on the boundary
> between top and bottom but is still SMP-safe?
> 
> The simple_lock man page says this on struct simplelock:
> 
>               Simplelocks are usually used only by the high-level lock
>               manager and to protect short, critical sections of code.
>               Simplelocks are the only locks that can be used inside an
>               interrupt handler.  For a simplelock to be used in an
>               interrupt handler, care must be taken to disable the
>               interrupt, acquire the lock, do any processing, release
>               the simplelock and re-enable the interrupt.  This
>               procedure is necessary to avoid deadlock between the
>               interrupt handler and other threads executing on the same
>               processor.
> 
> Does this mean that an interrupt that uses a simplelock must be
> prevented from running while code in a process context has the lock
> locked? If so, who is the lock meant to coordinate between? Interrupt
> handlers running on different CPUs? Because otherwise interrupt
> handlers can just assume they have the necessary locks already locked
> (and never bother to unlock them) -- code in a process context cannot
> have locks locked that may be used by interrupt handlers.

I think that any code which must create a critical section to modify
data structures that an interrupt handler may also modify must do the
following in this order on entry:

a) splwhatever() to make sure this processor will not be preempted by
the interrupt.

b) use simple_lock to prevent processes/interrupt handlers on other
cpus from executing concurrently in the same critical section

And on exit, do them in the reverse order (unlock using simple_unlock,
then splx()). This would prevent multiple cpus from executing processes
concurrently with interrupt handlers or each other in the same critical
section. As far as I can tell, if you don't do this then you
cannot prevent an interrupt handler from concurrently modifying a
device's configuration structure, which will lead to subtle errors.
Using simplelock with lockmgr would only help the processes case unless
the interrupt handler also uses the same simplelock when it executes.
Processes must do splwhatever in order to prevent them grabbing the
lock, being preempted, and the interrupt handler executing and then
spinning waiting on the same lock.
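
The entry/exit order described above can be sketched as a small userland
model. This is only an illustration of the ordering, not kernel code:
model_splraise(), model_splx(), model_lock_acquire(),
model_lock_release(), struct foo and IPL_FOO are all made-up stand-ins
for splwhatever(), splx(), simple_lock() and simple_unlock(), and the
spin lock is built on C11 atomic_flag purely for the sake of the example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

enum { IPL_NONE = 0, IPL_FOO = 4 };	/* hypothetical priority levels */

static int cur_ipl = IPL_NONE;		/* models this CPU's current spl level */

/* Models splwhatever(): raise the level, return the previous one. */
static int model_splraise(int level)
{
	int old = cur_ipl;

	if (level > cur_ipl)
		cur_ipl = level;
	return old;
}

/* Models splx(): restore a previously saved level. */
static void model_splx(int old)
{
	assert(old <= cur_ipl);
	cur_ipl = old;
}

/* Models a simplelock as a bare spin lock. */
struct model_lock { atomic_flag held; };

static void model_lock_init(struct model_lock *l)
{
	atomic_flag_clear(&l->held);
}

static void model_lock_acquire(struct model_lock *l)
{
	/* Spin until the flag was previously clear (another CPU let go). */
	while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
		continue;
}

static void model_lock_release(struct model_lock *l)
{
	atomic_flag_clear_explicit(&l->held, memory_order_release);
}

struct foo { struct model_lock lock; void *bar; int updates; };

static void foo_init(struct foo *f)
{
	model_lock_init(&f->lock);
	f->bar = NULL;
	f->updates = 0;
}

static void foo_update(struct foo *f)
{
	int s = model_splraise(IPL_FOO);	/* a) block the interrupt locally */
	model_lock_acquire(&f->lock);		/* b) keep other CPUs out */

	if (f->bar == NULL) {
		f->bar = f;			/* placeholder for the real work */
		f->updates++;
	}

	model_lock_release(&f->lock);		/* exit in the reverse order */
	model_splx(s);
}
```

Calling foo_update() twice on a freshly initialised struct foo should
perform the update once and leave the modelled spl level where it
started.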

As a side note: I notice that in at least one driver (lpt, I think) the
device functions such as read() and write() make no attempt to block
interrupts, even though they and the interrupt handler modify the same
structure. Why does this work (does it really work???)?


> 
> If an interrupt handler tries to (simple)lock something already
> locked by code in a process context, does that mean the interrupt
> handler would run forever spinning?

I think so. The process would have to do splwhatever. If the interrupt
is running on another cpu, then it won't hold up the process and can
spin wait on the other cpu.
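
To make the single-CPU case concrete, here is a toy model of one
processor. Everything in it (struct cpu_model, dev_intr(), raise_irq(),
spl_dev(), spl_x(), IPL_DEV) is invented for illustration. Instead of
really spinning, the modelled handler just counts the cases where it
would have spun forever.

```c
#include <assert.h>
#include <stdbool.h>

enum { IPL_NONE = 0, IPL_DEV = 4 };	/* hypothetical priority levels */

struct cpu_model {
	int  ipl;		/* current spl level of this CPU */
	bool pending;		/* device interrupt held off by spl */
	bool lock_held;		/* the shared lock, as seen on this CPU */
	int  handled;		/* interrupts that completed normally */
	int  would_deadlock;	/* interrupts that would have spun forever */
};

/* The device interrupt handler: it needs the shared lock. */
static void dev_intr(struct cpu_model *c)
{
	if (c->lock_held) {
		/* A real handler would spin here; the holder never runs again. */
		c->would_deadlock++;
		return;
	}
	c->lock_held = true;
	c->handled++;
	c->lock_held = false;
}

/* The device raises its interrupt line. */
static void raise_irq(struct cpu_model *c)
{
	if (c->ipl < IPL_DEV)
		dev_intr(c);		/* not blocked: preempts immediately */
	else
		c->pending = true;	/* blocked: deferred until splx */
}

static int spl_dev(struct cpu_model *c)
{
	int old = c->ipl;

	if (c->ipl < IPL_DEV)
		c->ipl = IPL_DEV;
	return old;
}

static void spl_x(struct cpu_model *c, int old)
{
	c->ipl = old;
	if (c->pending && c->ipl < IPL_DEV) {
		c->pending = false;
		dev_intr(c);		/* deliver the deferred interrupt */
	}
}

/* Process context: take the lock; the device interrupts in the middle. */
static void process_section(struct cpu_model *c, bool block_first)
{
	int s = 0;

	if (block_first)
		s = spl_dev(c);
	c->lock_held = true;
	raise_irq(c);
	c->lock_held = false;
	if (block_first)
		spl_x(c, s);
}
```

With block_first false the handler finds the lock held on its own CPU
(the deadlock case); with block_first true the interrupt is deferred and
runs cleanly once the lock has been released and the level restored.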

> 
> What's the difference between a thread and an interrupt handler?
> Well, an interrupt handler is scheduled because of an interrupt (or
> some code in a process context requested it). Interrupt handlers do
> not have their own address spaces, either, and do not show up in ps.
> Also, interrupt handlers cannot do some forms of sleep. Is that about
> it?

I have no clue. It should probably be written as a kernel thread (as a
nice way to think about it), but it probably isn't done that way.

> 
> Can an interrupt handler schedule an interrupt handler with
> softintr_schedule()? Should it? Is this, along with simple_lock_try(),
> a way to share locks between interrupts and non-interrupts safely?
> Probably not, because the interrupt might never ever actually be
> able to lock the lock?

You have to do the two step splwhatever() then simple_lock(): this is
what I gleaned from the man pages. Whether I got it right or not will
be apparent when I finally get my device to do something real like
write out data. Scheduling soft interrupts is fine: they won't run
until after the current handler exits (unless the current code already
runs at the soft interrupt priority, in which case splsoft() was in
effect before it was entered, and that blocks the newly scheduled
interrupt anyway).
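
My understanding of that deferral, as a toy model: scheduling only marks
the soft interrupt pending, and it runs when the priority level drops
below the soft level. The names here (struct softintr_model,
model_softintr_schedule(), model_lower_ipl(), hard_intr(), the IPL_*
values) are made up for the sketch and are not the real softintr(9)
interface.

```c
#include <assert.h>
#include <stdbool.h>

enum { IPL_NONE = 0, IPL_SOFT = 1, IPL_DEV = 4 };	/* hypothetical levels */

struct softintr_model {
	int  ipl;		/* current priority level */
	bool soft_pending;	/* soft interrupt scheduled but not yet run */
	char trace[8];		/* order of events, for inspection */
	int  ntrace;
};

static void note(struct softintr_model *m, char ev)
{
	assert(m->ntrace < (int)sizeof(m->trace) - 1);
	m->trace[m->ntrace++] = ev;
	m->trace[m->ntrace] = '\0';
}

static void soft_handler(struct softintr_model *m)
{
	note(m, 's');
}

/* Scheduling only marks the handler pending; it does not run it. */
static void model_softintr_schedule(struct softintr_model *m)
{
	m->soft_pending = true;
}

/* Lowering the level is where a pending soft interrupt gets to run. */
static void model_lower_ipl(struct softintr_model *m, int new_ipl)
{
	m->ipl = new_ipl;
	if (m->soft_pending && m->ipl < IPL_SOFT) {
		m->soft_pending = false;
		m->ipl = IPL_SOFT;	/* soft handler runs at its own level */
		soft_handler(m);
		m->ipl = new_ipl;
	}
}

/* A hardware interrupt handler that schedules a soft interrupt. */
static void hard_intr(struct softintr_model *m)
{
	int old = m->ipl;

	m->ipl = IPL_DEV;
	note(m, 'H');			/* hard handler starts */
	model_softintr_schedule(m);
	note(m, 'h');			/* hard handler finishes */
	model_lower_ipl(m, old);	/* return from interrupt */
}
```

If the interrupted code was at IPL_NONE, the soft handler runs right
after the hard handler returns. If it was already at IPL_SOFT, the soft
handler stays pending until that code lowers the level itself.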

> 
> Do we have any platforms that do not __HAVE_GENERIC_SOFT_INTERRUPTS?
> If not, will we ever?

i386 doesn't have it as far as I know. I think sparc does. I don't know
if/when all archs will get it.

> 
> Sanity check to make sure I'm not on crack - this code here is wrong,
> correct?
> 
>   if (foo->bar == NULL) {
>      int l = splwhatever();
> 
>      something_operating_on_bar(foo);
> 
>      splx(l);
>   }
> 
> What's the preferred way to correct this?
> 
> #1:
> 
>   int l = splwhatever();
>   if (foo->bar == NULL) {
>      something_operating_on_bar(foo);
>   }
>   splx(l);
> 
> Or #2: 
> 
>   if (foo->bar == NULL) {
>      int l = splwhatever();
> 
>      if (foo->bar == NULL) 
>         something_operating_on_bar(foo);
> 
>      splx(l);
>   }
> 
> Choice #2 seems more efficient if foo->bar == NULL only occasionally,
> but #1 is smaller and seems less error prone. 

How about:

#3
int l = splwhatever();
simple_lock(&lock);
if (foo->bar == NULL) {
	something_operating_on_bar(foo);
}
simple_unlock(&lock);
splx(l);

I would also like to know what is the correct way.....

> 
> Is this a correct way to make code SMP-safe?
> 
>   int l = splwhatever();   
> #ifdef MULTIPROCESSOR
>   simple_lock(&foo->foolock);
> #endif
>   if (foo->bar == NULL) {
>      something_operating_on_bar(foo);
>   }  
> #ifdef MULTIPROCESSOR
>   simple_unlock(&foo->foolock);
> #endif
>   splx(l);   

I think so, but we will have to wait for someone who knows what the SMP
code does and will eventually do to comment. Since processes can
preempt each other, you may need all of this even for a
uniprocessor.... Interestingly, on 1.5.x, simple_lock() doesn't
actually do anything (null macro) without LOCKDEBUG.
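
One way to keep the #ifdef MULTIPROCESSOR clutter out of the call sites
is to hide it behind a pair of macros that compile to nothing on a
uniprocessor build. The sketch below is a userland model under that
assumption: FOO_LOCK(), FOO_UNLOCK(), model_simple_lock(),
model_simple_unlock() and foo_touch() are names I made up, the "lock" is
a plain int rather than a real simplelock, and the spl bracket is left
out for brevity.

```c
#include <assert.h>
#include <stddef.h>

static int lock_calls;	/* counts real lock operations, for inspection */

/* Stand-ins for simple_lock()/simple_unlock(); the lock is just an int. */
static inline void model_simple_lock(int *l)
{
	assert(*l == 0);
	*l = 1;
	lock_calls++;
}

static inline void model_simple_unlock(int *l)
{
	assert(*l == 1);
	*l = 0;
}

struct foo {
	int   foolock;
	void *bar;
};

#ifdef MULTIPROCESSOR
#define FOO_LOCK(f)	model_simple_lock(&(f)->foolock)
#define FOO_UNLOCK(f)	model_simple_unlock(&(f)->foolock)
#else
/* Uniprocessor build: the lock operations vanish at compile time. */
#define FOO_LOCK(f)	((void)(f))
#define FOO_UNLOCK(f)	((void)(f))
#endif

static inline void foo_touch(struct foo *f)
{
	FOO_LOCK(f);
	if (f->bar == NULL)
		f->bar = f;	/* placeholder for the real work */
	FOO_UNLOCK(f);
}
```

Built without MULTIPROCESSOR defined, foo_touch() never touches the
lock; built with -DMULTIPROCESSOR, each call takes and releases it.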

Additional question: can processes use lockmgr while interrupt handlers
use a simplelock, provided lockmgr uses that same simplelock for its
critical sections? Would this let processes sleep when they compete for
a lock, and only require spinning when an interrupt handler is
competing?

I guess I would also like to know more about this in NetBSD, and I
would appreciate some illumination on this.

> 
> Thanks for the help, and happy new year!
> -- 
> "A method for inducing cats to exercise consists of directing a beam
> of
> invisible light produced by a hand-held laser apparatus onto the
> floor ...
> in the vicinity of the cat, then moving the laser ... in an irregular
> way
> fascinating to cats,..." -- US patent 5443036, "Method of exercising
> a cat" 
