Subject: netbsd-4 MP hypersparc panic
To: None <port-sparc@NetBSD.org>
From: John D. Baker <jdbaker@mylinuxisp.com>
List: port-sparc
Date: 09/29/2007 11:46:32
I posted about this panic back in June and July:

http://mail-index.NetBSD.org/port-sparc/2007/06/24/0000.html
http://mail-index.NetBSD.org/port-sparc/2007/07/01/0001.html

I've finally been able to get back to this.  I built new kernels and
userland from netbsd-4 sources of about 15 September (I started then, had
to interrupt the build, and finished on 20 September).  That non-parallel
build, run under the old MP/LOCKDEBUG kernel, succeeded, and I updated the
machine to the new MP/LOCKDEBUG kernel and userland.

Then I updated the sources to netbsd-4 as of 21 September and started
an 8-job parallel build (-j 8).  The result was the following:

----- console panic message -----
xcall(cpu1,0xf00087e4): couldn't ping cpus:panic:  cpu0cpu0: stuck on 
lock@f034fe8c

syncing disks... simple_lock: locking against myself
lock: 0xf034d5ac, currently at: 
/amd/halloran/r0/d2/NetBSD/src/sys/kern/kern_synch.c:1237
on CPU 0 last locked: 
/amd/halloran/r0/d2/NetBSD/src/sys/kern/sys_generic.c:1133
last unlocked: /amd/halloran/r0/d2/NetBSD/src/sys/kern/kern_synch.c:744

pool_get(PR_WAITOK) with held simple_lock 0xf5742c68 CPU 0 
/amd/halloran/r0/d2/NetBSD/src/sys/kern/tty.c:2487

pool_get(PR_WAITOK) with held simple_lock 0xf5742c68 CPU 0 
/amd/halloran/r0/d2/NetBSD/src/sys/kern/tty.c:2487

switching with held simple_lock 0xf5742c68 CPU 0 
/amd/halloran/r0/d2/NetBSD/src/sys/kern/tty.c:2487
xcall(cpu1,0xf00087e4): couldn't ping cpus:
----- end panic message -----

The files referenced are (obviously) on my file server, which the failing
machine accesses via NFS (via 'amd').
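
To illustrate what the "locking against myself" message in the panic
output reports: the same CPU tried to take a simple lock it already held
(here CPU 0, at kern_synch.c:1237, with the lock last taken at
sys_generic.c:1133).  The following is only a minimal single-threaded
sketch of that kind of owner check, not the NetBSD LOCKDEBUG code; the
names dbg_lock, dbg_lock_try and dbg_lock_release are hypothetical.

```c
#include <stddef.h>

#define NO_OWNER (-1)

/* Hypothetical debug lock: records who holds it and where it was taken. */
struct dbg_lock {
    int locked;         /* 1 while held, 0 when free */
    int owner_cpu;      /* CPU holding the lock, or NO_OWNER */
    const char *file;   /* source file of the last successful lock */
    int line;           /* source line of the last successful lock */
};

enum lock_result { LOCK_OK, LOCK_AGAINST_MYSELF, LOCK_BUSY };

/*
 * Try to take the lock on behalf of `cpu`.  A real spin lock would use an
 * atomic test-and-set and spin when busy; this sketch only shows the
 * ownership check that separates "held by another CPU" from "held by me".
 */
enum lock_result
dbg_lock_try(struct dbg_lock *l, int cpu, const char *file, int line)
{
    if (l->locked) {
        if (l->owner_cpu == cpu)
            return LOCK_AGAINST_MYSELF;   /* recursion on the same CPU */
        return LOCK_BUSY;                 /* another CPU holds it */
    }
    l->locked = 1;
    l->owner_cpu = cpu;
    l->file = file;
    l->line = line;
    return LOCK_OK;
}

/* Release the lock and forget the owner. */
void
dbg_lock_release(struct dbg_lock *l)
{
    l->locked = 0;
    l->owner_cpu = NO_OWNER;
}
```

In this sketch a second dbg_lock_try() from the CPU that already holds
the lock returns LOCK_AGAINST_MYSELF instead of spinning forever.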

The build was doing the following at the time (from the frozen SSH session):

----- build output messages -----
[...]
--- dependall-cksum ---
--- dependall ---
--- dependall-file ---
dependall ===> tools/file
--- dependall-asn1_compile ---
--- dependall ---
--- dependall-lint1 ---
--- dependall-makefs ---
dependall ===> tools/makefs
--- dependall-menuc ---
--- dependall-mkcsmapper ---
--- dependall-mkesdb ---
--- dependall-mklocale ---
--- dependall-lint1 ---
dependall ===> tools/lint1
----- end build output messages -----

A curious thing is that the power LED is still blinking (I always enable
BLINK in my sparc kernels).  The duty cycle of the light indicates it
thinks it still has a load average of about 6.
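
As a rough sketch of how a load figure can be read off the LED: this
assumes the BLINK code toggles the LED after an interval proportional to
(load average + 1), so that a full on/off cycle takes about one second
when idle and about (load + 1) seconds otherwise.  The FSHIFT value of 11
and the tick rate of 100 Hz are assumptions, and the function names are
hypothetical; this is not copied from the kernel.

```c
#define FSHIFT 11               /* assumed fixed-point shift for load averages */
#define FSCALE (1 << FSHIFT)    /* fixed-point representation of 1.0 */

/* Clock ticks until the next LED toggle, given a fixed-point load average. */
int
blink_ticks(unsigned long ldavg_fixed, int hz)
{
    return (int)(((ldavg_fixed + FSCALE) * (unsigned long)hz) >> (FSHIFT + 1));
}

/* Seconds for one full on/off cycle (two toggles) at a given load average. */
double
blink_cycle_seconds(double load, int hz)
{
    unsigned long fixed = (unsigned long)(load * FSCALE);

    return 2.0 * blink_ticks(fixed, hz) / hz;
}
```

Under these assumptions a full cycle of about 7 seconds corresponds to a
load average of about 6.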

I restarted the build with a uni-processor kernel and it had no problems
running an 8-job parallel build.

-- 
John D. Baker, KN5UKS                    NetBSD     Darwin/MacOS X
jdbaker(at)mylinuxisp(dot)com                 OpenBSD            FreeBSD
BSD -- It just sits there and _works_!
GPG fingerprint:  D703 4A7E 479F 63F8 D3F4  BD99 9572 8F23 E4AD 1645