tech-kern archive


Re: bug in softint_execute() ?



On Fri, Apr 11, 2008 at 12:23:26AM +0100, Andrew Doran wrote:
> On Thu, Apr 10, 2008 at 03:54:55PM -0700, Tim Rightnour wrote:
> 
> > I'm hitting the following diagnostic panic in softint_execute() on
> > relatively recent HEAD, on my ofppc SMP box:
> > recent HEAD, on my ofppc SMP box:
> > 
> > kernel diagnostic assertion "si->si_cpu == curcpu()" failed: file
> > "/usr/src/hackathon/sys/kern/kern_softint.c", line 524
> > 
> > 0xaa2dfe00: at panic+0x228
> > 0xaa2dfe50: at __kernassert+0x4c
> > 0xaa2dfe60: at softint_overlay+0x8a0
> > 0xaa2dfec0: at lwp_userret+0x168
> > 0xaa2dfee0: at syscall_plain+0x2b4
> > 0xaa2dff40: user SC trap #74 by 0xefffa93c: srr1=0xd032
> >             r1=0xffffcfe0 cr=0x44000048 xer=0x20000000 ctr=0
> > 
> > A few printfs thrown into the code before each of the KASSERTs at the
> > top and bottom of that function show:
> > 
> > si_cpu = 0x510838 cur=0x510838
> > si_cpu = 0x510838 cur=0x511540
> > 
> > And, just FYI:
> > 
> > cpu2 started ci=0x5110e8
> > cpu3 started ci=0x511540
> > cpu1 started ci=0x510c90
> > 
> > So obviously curcpu is somehow changing in that function. Maybe after
> > the lock is released?
> 
> It's a bound kthread and should never migrate to another CPU. Are you using
> SCHED_M2? Having read the code recently I don't know how that could happen.
> Could you add KASSERTs on entry and exit that check for LW_BOUND set in
> l_flag? It may also be a good idea to add assertions that check curlwp is
> set correctly (IIRC, matches si->si_lwp).
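A minimal sketch of the entry/exit checks suggested above, written as user-space C so it can run outside the kernel. Everything here is a hypothetical stand-in and not the kern_softint.c code: the structures, the simulated sim_curcpu/sim_curlwp globals, the LW_BOUND value and the helper names. Only the field names l_flag, si_cpu and si_lwp come from this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag value for this sketch only. */
#define LW_BOUND 0x1u

/* Hypothetical stand-ins for the kernel structures. */
struct cpu_info { int ci_index; };
struct lwp { unsigned l_flag; };
struct softint {
    struct cpu_info *si_cpu;   /* CPU that owns this soft interrupt */
    struct lwp *si_lwp;        /* LWP dedicated to running it */
};

/* Simulated per-CPU state; in the kernel these would be curcpu()/curlwp. */
static struct cpu_info *sim_curcpu;
static struct lwp *sim_curlwp;

/* The invariants proposed in the reply: we are running on the softint's own
 * LWP, that LWP is bound, and we are on the softint's CPU. */
bool softint_invariants_hold(const struct softint *si)
{
    return sim_curlwp == si->si_lwp &&
           (sim_curlwp->l_flag & LW_BOUND) != 0 &&
           si->si_cpu == sim_curcpu;
}

/* Sketch of softint_execute() with the checks on entry and on exit.
 * The handler stands in for the work done with the lock released. */
void softint_execute_checked(struct softint *si, void (*handler)(void))
{
    assert(softint_invariants_hold(si));   /* entry */
    if (handler != NULL)
        handler();
    assert(softint_invariants_hold(si));   /* exit */
}
```

A hijacked user LWP without the bound flag, or a switch to another CPU between entry and exit, makes softint_invariants_hold() return false, so the exit assertion would fire at the point of migration instead of later in the function.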

Looking at the trace above, it occurred to me that softint_overlay() (part
of the slow path code) is hijacking a user LWP and so it's very unlikely
to be bound to a CPU, let alone bound to the correct one.

It's not possible to simply OR the LW_BOUND flag into l_flag because that
would require locking curlwp twice on every soft interrupt, which is too
expensive. I think we could move the bound flag into l_pflag, the "thread
private" flag word. LW_BOUND is only modified or inspected by curlwp, or
when the LWP is known to be in a quiescent state, e.g. being created or awoken
from sleep. So there is no danger of modifications being lost or going out of
sync.
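The trade-off described here can be sketched in user-space C. This is an illustration under stated assumptions, not the eventual kernel change: the LP_BOUND name, both flag values, the l_lock_acquisitions counter and the lock stubs are hypothetical, and the stubs provide no real mutual exclusion (they only count how often the LWP lock would be taken). The first function keeps the flag in l_flag and pays two lock round trips per soft interrupt; the second keeps it in l_pflag, which only curlwp touches, and takes no lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag values for this sketch only. */
#define LW_BOUND 0x1u   /* in l_flag: must hold the LWP lock to change */
#define LP_BOUND 0x1u   /* in l_pflag: touched only by curlwp itself */

struct lwp {
    unsigned l_flag;               /* shared flags, protected by the LWP lock */
    unsigned l_pflag;              /* "thread private" flags, owned by curlwp */
    unsigned l_lock_acquisitions;  /* hypothetical: counts lock round trips */
};

/* Hypothetical stand-ins for lwp_lock()/lwp_unlock(); single-threaded,
 * so they only record how often the lock would be taken. */
static void lwp_lock_stub(struct lwp *l) { l->l_lock_acquisitions++; }
static void lwp_unlock_stub(struct lwp *l) { (void)l; }

/* Rejected approach: OR LW_BOUND into l_flag around the soft interrupt.
 * Setting and restoring the bit each need the lock. */
void overlay_with_l_flag(struct lwp *l, void (*handler)(void))
{
    lwp_lock_stub(l);
    unsigned saved = l->l_flag & LW_BOUND;
    l->l_flag |= LW_BOUND;
    lwp_unlock_stub(l);

    if (handler != NULL)
        handler();

    lwp_lock_stub(l);
    l->l_flag = (l->l_flag & ~LW_BOUND) | saved;
    lwp_unlock_stub(l);
}

/* Proposed approach: keep the bound flag in l_pflag. Only the running LWP
 * reads or writes it, so no lock is needed. */
void overlay_with_l_pflag(struct lwp *l, void (*handler)(void))
{
    unsigned saved = l->l_pflag & LP_BOUND;  /* may already be bound */
    l->l_pflag |= LP_BOUND;

    if (handler != NULL)
        handler();

    l->l_pflag = (l->l_pflag & ~LP_BOUND) | saved;  /* restore prior state */
}
```

Saving and restoring the previous bit, rather than clearing it unconditionally, keeps an LWP that was already bound (a kthread, for instance) bound after the soft interrupt returns.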

Andrew

