Subject: Re: yamt-idlelwp fallout for mips/cobalt?
To: Izumi Tsutsui <tsutsui@ceres.dti.ne.jp>
From: Andrew Doran <ad@netbsd.org>
List: port-cobalt
Date: 05/23/2007 14:12:15

On Wed, May 23, 2007 at 12:00:05AM +0900, Izumi Tsutsui wrote:

> It seems a kernel with options DIAGNOSTIC still has
> some locking problem:
> ---
>  :
> Checking quotas: done.
> Setting securelevel: kern.securelevel: 0 -> 1
> Starting virecover.
> Starting local daemons:.
> Updating motd.
> Starting ntpd.
> Starting sshd.
> Mutex error: mutex_spin_retry: locking against myself
> 
> lock address : 0x00000000802dc920
> current cpu  :                  0
> current lwp  : 0x000000008f4881e0
> owner field  : 000000000000000000 wait/spin:                0/1
> 
> panic: lock error
> Stopped in pid 500.1 (sh) at    netbsd:cpu_Debugger+0x4:        jr      ra
>                 bdslot: nop
> db> tr
> 801e4168+89c (8fffe000,802b6b90,d,0) ra 8017c658 sz 0
> 8017c4d0+188 (8fffe000,802b6b90,d,0) ra 0 sz 0
> User-level: pid 500.1
> db> 
> ---

That could be mutex_spin_enter/exit in lock_stubs.S.

> With options LOCKDEBUG:
> ---
> Setting date via ntp.
> Starting rpcbind.
> Starting ypbind.
> Mounting all filesystems...
> Mutex error: lockdebug_wantlock: locking against myself
> 
> lock address : 0x00000000802f35c0 type     :               spin
> shared holds :                  0 exclusive:                  1
> shares wanted:                  0 exclusive:                  1
> current cpu  :                  0 last held:                  0
> current lwp  : 0x000000008fc95000 last held: 0x000000008fc95700
> last locked  : 0x0000000080171dbc unlocked : 0x000000008016b7d0
> owner field  : 000000000000000000 wait/spin:                0/1
> 
> panic: LOCKDEBUG
> Stopped in pid 249.1 (nfsio) at netbsd:cpu_Debugger+0x4:        jr      ra
>                 bdslot: nop
> db> 
> ---
> 0x80171dbc is in ltsleep() and 0x8016b7d0 is in sleepq_wake().

I wonder whether this is happening in an interrupt handler, and whether the
interrupt masks are being set up properly. E.g. IPL_STATCLOCK should also
block IPL_CLOCK. I'll take a look when I have a moment. Another possibility
is that cpu_switchto is returning the wrong value, but it does work on MIPS1
and most of the code is common.
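
To illustrate the kind of mask problem I mean, here is a minimal sketch in C.
It assumes a per-level table of blocked interrupt bits, where each level must
block everything the level below it blocks. The level names, the table layout
and the bit values are all made up for the example; they are not the real
cobalt definitions and this is not the code in the tree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical priority levels, lowest first (not the real cobalt values). */
enum { X_IPL_NONE, X_IPL_BIO, X_IPL_NET, X_IPL_CLOCK, X_IPL_STATCLOCK, X_NIPL };

/*
 * Return the index of the first level whose mask fails to cover the
 * mask of the level below it, or -1 if the whole table nests correctly.
 */
static int
first_bad_level(const uint32_t *mask, size_t n)
{
	assert(mask != NULL && n > 0);
	for (size_t i = 1; i < n; i++) {
		/* Any bit blocked at level i-1 must also be blocked at level i. */
		if ((mask[i - 1] & ~mask[i]) != 0)
			return (int)i;
	}
	return -1;
}

/* Made-up table in which every level blocks all lower levels. */
static const uint32_t good_masks[X_NIPL] = {
	0x00,	/* X_IPL_NONE */
	0x01,	/* X_IPL_BIO */
	0x03,	/* X_IPL_NET */
	0x07,	/* X_IPL_CLOCK */
	0x0f,	/* X_IPL_STATCLOCK */
};

/* Same table, but the statclock level forgets the clock bit (0x04). */
static const uint32_t bad_masks[X_NIPL] = {
	0x00, 0x01, 0x03, 0x07,
	0x0b,	/* X_IPL_STATCLOCK: clock interrupt left unblocked */
};
```

With a table like bad_masks, a clock interrupt could arrive while code at the
statclock level holds a spin mutex. If the handler then takes the same mutex,
that would look like "locking against myself". Whether that is what is
happening here is still only a guess.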

Andrew