Subject: netbsd-4/sparc MP kernel panic
To: None <tech-kern@netbsd.org>
From: John D. Baker <jdbaker@mylinuxisp.com>
List: tech-kern
Date: 07/01/2007 23:08:10
I originally posted the following on port-sparc@.  Now that I have output
from a LOCKDEBUG kernel, it was suggested that I post it here.  First,
my original message:

--------------------
While rebuilding the system, the console of my dual hypersparc-150
SS20 reported the following:

[...]
Jun 24 11:38:08 ss20a /netbsd: nfs server halloran:/r0/d2/NetBSD: is alive again
Jun 24 15:38:40 ss20a /netbsd: nfs server hxcall(cpu1,0xf00087e4): couldn't ping cpus:panic:  cpu0cpu0: stuck on lock@f0353344

syncing disks... alloran:/r0/d2/NetBSD: not responding


The machine is running 4.0BETA_2 from around 27 March 2007 with an MP
kernel customized from GENERIC/GENERIC.MP, built with -mcpu=hypersparc.
System sources are on the file server "halloran"; everything else goes to
local disk on the machine "ss20a" itself.

It was doing the following at the time (from the frozen SSH session):

[...]
#   install  /d1/nbsd/DEST/sparc/bin/cp
STRIP=/d1/nbsd/tools/sparc/bin/sparc--netbsdelf-strip 
/d1/nbsd/tools/sparc/bin/nbinstall -U -M /d1/nbsd/DEST/sparc/METALOG -D 
/d1/nbsd/DEST/sparc -h sha1 -N /amd/halloran/r0/d2/NetBSD/src/etc -c  
-r -o root -g wheel -m 555   cp /d1/nbsd/DEST/sparc/bin/cp
--- install-games ---
--- install-backgammon ---
--- install-usr.sbin ---
--- /d1/nbsd/DEST/sparc/usr/share/man/cat8/accton.0 ---
nfs server halloran:/r0/d2/NetBSD: not responding
--------------------

And the message I posted today with LOCKDEBUG output:

---------------------
This time around, I built and installed kernels built from the latest
netbsd-4 sources (updated late 28 June).  During the subsequent build
of the userland, I got the same panic again.

When I'd recovered from that, I built and installed kernels with
"options LOCKDEBUG".  During the restarts of the userland build, I got
the panic twice more, but with more information.

The output is below.  The first line is from the non-LOCKDEBUG kernel;
the subsequent two groups are from the LOCKDEBUG kernel.  The files
referenced live on my file server, via NFS.


[...]
xcall(cpu1,0xf00087e4): couldn't ping cpus:panic:  cpu0cpu0: stuck on lock@f0317274


[...]
xcall(cpu1,0xf00087e4): couldn't ping cpus:panic:  cpu0cpu0: stuck on lock@f0329604

syncing disks...
simple_lock: locking against myself
lock: 0xf0326d24, currently at: /amd/halloran/r0/d2/NetBSD/src/sys/kern/kern_synch.c:1237
on CPU 0
last locked: /amd/halloran/r0/d2/NetBSD/src/sys/kern/sys_generic.c:1129
last unlocked: /amd/halloran/r0/d2/NetBSD/src/sys/kern/kern_synch.c:744

switching with held simple_lock 0xf035a588 CPU 0 /amd/halloran/r0/d2/NetBSD/src/sys/kern/subr_pool.c:1292

simple_lock: uninitialized lock
lock: 0xf035a588, currently at: /amd/halloran/r0/d2/NetBSD/src/sys/kern/subr_pool.c:935
on CPU 1
last locked: /amd/halloran/r0/d2/NetBSD/src/sys/kern/subr_pool.c:1292
last unlocked: /amd/halloran/r0/d2/NetBSD/src/sys/kern/subr_pool.c:1294


[...]
xcall(cpu0,0xf00087e4): couldn't ping cpus:panic:  cpu1cpu1: stuck on lock@f0329604

syncing disks...
simple_lock: locking against myself
lock: 0xf0326d24, currently at: /amd/halloran/r0/d2/NetBSD/src/sys/kern/kern_synch.c:1237
on CPU 1
last locked: /amd/halloran/r0/d2/NetBSD/src/sys/kern/sys_generic.c:1129
last unlocked: /amd/halloran/r0/d2/NetBSD/src/sys/kern/kern_synch.c:744

pool_get(PR_WAITOK) with held simple_lock 0xf5702c68 CPU 1 /amd/halloran/r0/d2/NetBSD/src/sys/kern/tty.c:2487

[ last message repeated 107 times ]
[ system hung ]


Subsequent attempts to finish building userland have failed, but I
suspect local filesystem corruption from the prior panics.  The build is
being restarted from scratch.
------------------------

One thing I didn't mention in prior posts:  Several hours prior to the
original panic, there were a number of console messages indicating a
fault in one of the memory modules.  There was just one burst of them
and they've not reappeared since.  Maybe this is just a symptom of a
memory module going bad?

Thanks.

--
John D. Baker                            NetBSD     Darwin/MacOS X
http://mylinuxisp(dot)com/(tilde)jdbaker/     OpenBSD            FreeBSD
BSD.  It just sits there and _works_.
GPG fingerprint = D703 4A7E 479F 63F8 D3F4  BD99 9572 8F23 E4AD 1645