Subject: Re: yamt-idlelwp fallout for mips/cobalt?
To: Michael L. Hitch <mhitch@lightning.msu.montana.edu>
From: Andrew Doran <ad@NetBSD.org>
List: port-cobalt
Date: 06/14/2007 21:37:48
On Thu, Jun 14, 2007 at 11:54:39AM -0600, Michael L. Hitch wrote:

> On Fri, 25 May 2007, Izumi Tsutsui wrote:
> 
> >The following patch makes a LOCKDEBUG kernel work,
> >but I don't know if it's really correct.
> 
>   I don't think it's correct.  This was changed several years ago
> (starting with revision 1.175) because kernel threads would start
> and run with interrupts disabled.  I had the same problem with my
> amiga (m68k) not all that long ago because I had added a raidframe
> drive and was getting lots of clock skew when parity rebuilding was
> going on.  I finally figured out that the raidframe thread was running
> with interrupts blocked, and started looking at several other ports
> to see how they started kthread processes and found that they had
> the same problem, but had fixed for several years.

> >Index: arch/mips/mips/vm_machdep.c
> >===================================================================
> >RCS file: /cvsroot/src/sys/arch/mips/mips/vm_machdep.c,v
> >retrieving revision 1.117
> >diff -u -r1.117 vm_machdep.c
> >--- arch/mips/mips/vm_machdep.c	17 May 2007 14:51:25 -0000	1.117
> >+++ arch/mips/mips/vm_machdep.c	25 May 2007 14:47:42 -0000
> >@@ -170,7 +170,9 @@
> >	pcb->pcb_context[MIPS_CURLWP_CARD - 16] = (intptr_t)l2;/* S? */
> >	pcb->pcb_context[8] = (intptr_t)f;		/* SP */
> >	pcb->pcb_context[10] = (intptr_t)lwp_trampoline;/* RA */
> >+#if 0
> >	pcb->pcb_context[11] |= PSL_LOWIPL;		/* SR */
> >+#endif

I'm not yet sure what we need to set into SR in this case, but post
yamt-idlelwp, LWPs should start up at IPL_SCHED as far as MD code is
concerned. cpu_switchto() is expected to maintain the IPL at IPL_SCHED or
above across the switch. If I read the code correctly, PSL_LOWIPL enables
all interrupts.

> Looking back in the mail archives, this seems to be an attempt
> to fix a locking against myself panic, so I suspect it's more likely
> a locking error somewhere. 

The call into lwp_startup() does an spl0(). Before it does that, it also
unlocks the previous LWP if any. If we enable interrupts before unlocking
the previous LWP, we can end up taking an interrupt and trying to acquire
a spinlock that is already held.

Andrew