Subject: Re: kern/32162: [netbsd-3.0] kernel dead-lock in MP system
To: None <kern-bug-people@netbsd.org, gnats-admin@netbsd.org,>
From: Andreas Wrede <andreas@planix.com>
List: netbsd-bugs
Date: 02/13/2006 17:15:03
The following reply was made to PR kern/32162; it has been noted by GNATS.

From: Andreas Wrede <andreas@planix.com>
To: gnats-bugs@NetBSD.org
Cc: Jason Thorpe <thorpej@shagadelic.org>,
	Manuel Bouyer <bouyer@antioche.eu.org>, kern-bug-people@NetBSD.org,
	gnats-admin@NetBSD.org, netbsd-bugs@NetBSD.org,
	Chuck Silvers <chuq@chuq.com>, Simon Burge <simonb@wasabisystems.com>
Subject: Re: kern/32162: [netbsd-3.0] kernel dead-lock in MP system
Date: Mon, 13 Feb 2006 12:12:02 -0500

 On 11-Feb-06, at 10:56 AM, Andreas Wrede wrote:
 
 > I replaced the two isp(4) controllers with a dual-channel LSI FC929 mpt(4)
 > card on Jan 26, following suggestions that the isp(4) driver could be the
 > cause of the memory corruption. Since then, the kernel has panicked 3 times,
 > but always with the same "kernel debugging assertion "ph->ph_nmissing != 0"
 > failed". Of the previous 27(!) panics, 2 were also ph->ph_nmissing != 0
 > asserts.
 
 Well, the fourth panic just happened and it's different (assertion
 "(v == __SIMPLELOCK_LOCKED) || (v == __SIMPLELOCK_UNLOCKED)" failed),
 so I guess I am still seeing random memory corruption. (Serves me
 right for believing that three occurrences of anything make a pattern.)
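 For anyone reading along: the check that fired expects the simple-lock
 word to hold exactly one of two values, so any other value means
 something outside the lock code overwrote it, which is why I read this
 as more memory corruption rather than a locking bug. Below is a minimal
 user-space sketch of that check, assuming the x86 encoding of 0 for
 unlocked and 1 for locked; the sketch_* names are hypothetical and are
 not the kernel's own.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for __SIMPLELOCK_UNLOCKED/__SIMPLELOCK_LOCKED;
 * the 0/1 values are an assumption about the x86 encoding. */
#define SKETCH_SIMPLELOCK_UNLOCKED 0u
#define SKETCH_SIMPLELOCK_LOCKED   1u

/* A simple lock is just a word that should only ever hold one of the
 * two values above. */
typedef volatile uint32_t sketch_simple_lock_t;

/* Mirrors the sanity check in the panic message: any value other than
 * LOCKED or UNLOCKED means the lock word was overwritten by someone
 * other than the lock code itself. */
static bool
sketch_lock_word_valid(const sketch_simple_lock_t *lockp)
{
	uint32_t v = *lockp;

	return v == SKETCH_SIMPLELOCK_LOCKED ||
	    v == SKETCH_SIMPLELOCK_UNLOCKED;
}
```

 A stray write into the page holding the lock (for example, from a
 freed or doubly-initialised structure) would leave a value like
 0xdeadbeef in the word, and only this kind of check would notice.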
 
 I have now replaced the bge(4) Ethernet with a wm(4) in an attempt to  
 isolate the problem.
 
 Is it possible that the problem Simon Burge describes in
 "Initialising the same pool multiple times"
 (http://mail-index.netbsd.org/tech-kern/2006/02/13/0003.html) is
 causing the random panics I am getting?
 
 
 Just in case, here is the full traceback of this panic:
 
 panic: kernel debugging assertion "(v == __SIMPLELOCK_LOCKED) || (v == __SIMPLELOCK_UNLOCKED)" failed: file "/u1/netbsd-3.0/src/sys/arch/x86/x86/lock_machdep.c", line 83
 Begin traceback...
 __main(c0676980,c06dc2c0,53,c06dc280,1) at netbsd:__main
 __cpu_simple_lock(d1c686d0,a,1,286,c21ab000) at netbsd:__cpu_simple_lock+0xd5
 _simple_lock(d1c686d0,c06ddc80,73b,c21ab000,d1c686d0) at netbsd:_simple_lock+0x7a
 pmap_reference(d1c686d0,c0719d9c,52c,297,282) at netbsd:pmap_reference+0x1a
 pmap_load(c034e56b,cd13d000,8062000,52c,d3a41558) at netbsd:pmap_load+0xc4
 copyout(cd13d000,52c,cead7d14,282,1a000) at netbsd:copyout+0xf
 ffs_read(cead7cb4,cc237d9c,10001,20001,c05a6d20) at netbsd:ffs_read+0x4a6
 VOP_READ(cc237d9c,cead7d14,1,cc2208dc,0) at netbsd:VOP_READ+0x34
 vn_rdwr(0,cc237d9c,8062000,52c,1a000) at netbsd:vn_rdwr+0xb4
 vmcmd_readvn(d042b798,c2bc501c,bfc00000,0,0) at netbsd:vmcmd_readvn+0x2f
 sys_execve(d3a41558,cead7f64,cead7f5c,c0718564,c07928cc) at netbsd:sys_execve+0x620
 syscall_plain() at netbsd:syscall_plain+0x1a5
 --- syscall (number 59) ---
 0xbdb2b16f:
 End traceback...
 syncing disks... Stopped in pid 25494.1 (ps) at  netbsd:cpu_Debugger+0x4:        leave
 db{0}> reboot 0x104
 dumping to dev 18,1 offset 1051455
 dump device bad
 sd0(mpt0Stopped in pid 25494.1 (ps) at  netbsd:cpu_Debugger+0x4:        leave
 db{0}> reboot
 cpu0: spinout while in debugger
 
 
 -- 
 	aew