netbsd-bugs: Re: kern/32162: [netbsd-3.0] kernel dead-lock in MP system

Subject: Re: kern/32162: [netbsd-3.0] kernel dead-lock in MP system
To: None <gnats-bugs@NetBSD.org>
From: Andreas Wrede <andreas@planix.com>
List: netbsd-bugs
Date: 02/13/2006 12:12:02

On 11-Feb-06, at 10:56 AM, Andreas Wrede wrote:

> I replaced the two isp(4) with a dual-channel LSI FC929 mpt(4) card
> on Jan 26 on suggestions that the isp(4) driver could be the cause
> of the memory corruption. Since then, the kernel panicked 3 times
> but always with the same "kernel debugging assertion
> "ph->ph_nmissing != 0" failed". Of the previous 27(!) panics, 2 were
> also ph->ph_nmissing != 0 asserts.

Well, the fourth panic just happened and it's different (assertion
"(v == __SIMPLELOCK_LOCKED) || (v == __SIMPLELOCK_UNLOCKED)" failed),
so I guess I am still seeing random memory corruption. (Serves me
right for believing that three occurrences of anything is a pattern.)

I have now replaced the bge(4) Ethernet with a wm(4) in an attempt to  
isolate the problem.

Is it possible that the problem Simon Burge describes in
"Initialising the same pool multiple times"
(http://mail-index.netbsd.org/tech-kern/2006/02/13/0003.html) is
causing the random panics I am getting?


Just in case, the full traceback of this panic:

panic: kernel debugging assertion "(v == __SIMPLELOCK_LOCKED) || (v == __SIMPLELOCK_UNLOCKED)" failed: file "/u1/netbsd-3.0/src/sys/arch/x86/x86/lock_machdep.c", line 83
Begin traceback...
__main(c0676980,c06dc2c0,53,c06dc280,1) at netbsd:__main
__cpu_simple_lock(d1c686d0,a,1,286,c21ab000) at netbsd:__cpu_simple_lock+0xd5
_simple_lock(d1c686d0,c06ddc80,73b,c21ab000,d1c686d0) at netbsd:_simple_lock+0x7a
pmap_reference(d1c686d0,c0719d9c,52c,297,282) at netbsd:pmap_reference+0x1a
pmap_load(c034e56b,cd13d000,8062000,52c,d3a41558) at netbsd:pmap_load+0xc4
copyout(cd13d000,52c,cead7d14,282,1a000) at netbsd:copyout+0xf
ffs_read(cead7cb4,cc237d9c,10001,20001,c05a6d20) at netbsd:ffs_read+0x4a6
VOP_READ(cc237d9c,cead7d14,1,cc2208dc,0) at netbsd:VOP_READ+0x34
vn_rdwr(0,cc237d9c,8062000,52c,1a000) at netbsd:vn_rdwr+0xb4
vmcmd_readvn(d042b798,c2bc501c,bfc00000,0,0) at netbsd:vmcmd_readvn+0x2f
sys_execve(d3a41558,cead7f64,cead7f5c,c0718564,c07928cc) at netbsd:sys_execve+0x620
syscall_plain() at netbsd:syscall_plain+0x1a5
--- syscall (number 59) ---
0xbdb2b16f:
End traceback...
syncing disks... Stopped in pid 25494.1 (ps) at  netbsd:cpu_Debugger+0x4:        leave
db{0}> reboot 0x104
dumping to dev 18,1 offset 1051455
dump device bad
sd0(mpt0Stopped in pid 25494.1 (ps) at  netbsd:cpu_Debugger+0x4:        leave
db{0}> reboot
cpu0: spinout while in debugger


-- 
	aew