Subject: Re: kern/32162: [netbsd-3.0] kernel dead-lock in MP system
To: None <kern-bug-people@netbsd.org, gnats-admin@netbsd.org,>
From: Andreas Wrede <andreas@planix.com>
List: netbsd-bugs
Date: 12/07/2005 16:10:02
The following reply was made to PR kern/32162; it has been noted by GNATS.

From: Andreas Wrede <andreas@planix.com>
To: Manuel Bouyer <bouyer@antioche.eu.org>
Cc: gnats-bugs@NetBSD.org, kern-bug-people@NetBSD.org,
	gnats-admin@NetBSD.org, netbsd-bugs@NetBSD.org
Subject: Re: kern/32162: [netbsd-3.0] kernel dead-lock in MP system
Date: Wed, 7 Dec 2005 11:08:46 -0500

 
 Running with a kernel with DIAGNOSTIC, LOCKDEBUG and DEBUG turned on  
 produced two panics over the last week:
 
 Nov 30
 
 panic: kernel debugging assertion "(v == __SIMPLELOCK_LOCKED) || (v == __SIMPLELOCK_UNLOCKED)" failed: file "/u1/netbsd-3.0/src/sys/arch/x86/x86/lock_machdep.c",
 Begin traceback...
 __main(c07458f7,c07bbe60,53,c07bbe20,1) at netbsd:__main
 __cpu_simple_lock(d0734268,c22ac800,1,286,c22ac800) at netbsd:__cpu_simple_lock+0xd5
 _simple_lock(d0734268,c07bd480,73b,c22ac800,d0734268) at netbsd:_simple_lock+0x7a
 pmap_reference(d0734268,c080207c,52c,297,282) at netbsd:pmap_reference+0x1a
 pmap_load(c03aa14f,cd042000,8062000,52c,cea3f29c) at netbsd:pmap_load+0xc4
 copyout(cd042000,52c,ce48bd14,282,1a000) at netbsd:copyout+0xf
 ffs_read(ce48bcb4,cc317ae4,10001,20001,c063e660) at netbsd:ffs_read+0x4a6
 VOP_READ(cc317ae4,ce48bd14,1,cc300804,0) at netbsd:VOP_READ+0x34
 vn_rdwr(0,cc317ae4,8062000,52c,1a000) at netbsd:vn_rdwr+0xb4
 vmcmd_readvn(cf07caec,c2bc8a1c,bfc00000,0,0) at netbsd:vmcmd_readvn+0x2f
 sys_execve(cea3f29c,ce48bf64,ce48bf5c,c08008a4,282) at netbsd:sys_execve+0x620
 syscall_plain() at netbsd:syscall_plain+0x1a5
 --- syscall (number 59) ---
 0xbdb2b13f:
 End traceback...
 syncing disks... 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 giving up
 Printing vnodes for busy buffers
 tag VT_UFS(1) type VBLK(3), usecount 1873, writecount 0, refcount 62, flags (0<VLOCKSWORK>)
      tag VT_UFS, ino 249344, on dev 0, 0 flags 0x0, effnlink 1, nlink 1
      mode 060640, owner 0, group 5, size 0 not locked
 tag VT_UFS(1) type VBLK(3), usecount 1873, writecount 0, refcount 62, flags (0<VLOCKSWORK>)
      tag VT_UFS, ino 249344, on dev 0, 0 flags 0x0, effnlink 1, nlink 1
      mode 060640, owner 0, group 5, size 0 not locked
 tag VT_UFS(1) type VDIR(2), usecount 0, writecount 0, refcount 1, flags (0<VLOCKSWORK>)
      tag VT_UFS, ino 43052, on dev 0, 4 flags 0x0, effnlink 3, nlink 3
      mode 040755, owner 110, group 202, size 512 not locked
 tag VT_UFS(1) type VDIR(2), usecount 0, writecount 0, refcount 1, flags (0<VLOCKSWORK>)
      tag VT_UFS, ino 129044, on dev 0, 4 flags 0x0, effnlink 2, nlink 2
      mode 040700, owner 0, group 0, size 1536 not locked
 tag VT_UFS(1) type VBLK(3), usecount 1873, writecount 0, refcount 62, flags (0<VLOCKSWORK>)
      tag VT_UFS, ino 249344, on dev 0, 0 flags 0x0, effnlink 1, nlink 1
      mode 060640, owner 0, group 5, size 0 not locked
 tag VT_UFS(1) type VBLK(3), usecount 1873, writecount 0, refcount 62, flags (0<VLOCKSWORK>)
      tag VT_UFS, ino 249344, on dev 0, 0 flags 0x0, effnlink 1, nlink 1
      mode 060640, owner 0, group 5, size 0 not locked
 tag VT_UFS(1) type VBLK(3), usecount 1873, writecount 0, refcount 62, flags (0<VLOCKSWORK>)
      tag VT_UFS, ino 249344, on dev 0, 0 flags 0x0, effnlink 1, nlink 1
      mode 060640, owner 0, group 5, size 0 not locked
 giving up
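
As a reading aid for the first assertion: the debugging check in __cpu_simple_lock() requires the lock word to hold one of exactly two values before the CPU spins on it. The sketch below (Python, purely illustrative) shows the shape of that condition. The numeric values 0 and 1 and the helper name lock_word_is_sane are assumptions made for the example and are not taken from lock_machdep.c. A failure suggests that the address did not point at a valid, initialised lock, though the trace alone does not say why.

```python
# Assumed values for illustration only; the real constants live in the kernel headers.
SIMPLELOCK_UNLOCKED = 0
SIMPLELOCK_LOCKED = 1


def lock_word_is_sane(v):
    """Mirror the asserted condition: v is LOCKED or UNLOCKED, nothing else."""
    return v == SIMPLELOCK_LOCKED or v == SIMPLELOCK_UNLOCKED


# A valid lock passes in either state; any other bit pattern would trip the assertion.
for word in (0, 1, 0xdeadbeef):
    print(hex(word), lock_word_is_sane(word))
```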
 
 
 
 Dec 4: Note the second panic during the "syncing disks" phase of the
 reboot after (or during?) the first panic. Trying to "call
 simple_lock_dump" from the debugger prompt then locks up the machine.
 
 
 panic: kernel diagnostic assertion "vm_map_pmap(map) == pmap_kernel()" failed: file "/u1/netbsd-3.0/src/sys/uvm/uvm_map.c", line 4151
 Begin traceback...
 __main(c073fd1d,c07b8e60,1037,c07b89e0,cc317b6c) at netbsd:__main
 uvm_kmapent_alloc(d6b4c2a0,0,0,c0869ec0,0) at netbsd:uvm_kmapent_alloc+0x30b
 uvm_mapent_reserve(d6b4c2a0,cd3abd44,2,0,0) at netbsd:uvm_mapent_reserve+0x54
 uvm_unmap1(d6b4c2a0,0,bfc00000,0,c0869ec0) at netbsd:uvm_unmap1+0x1b
 uvm_deallocate(d6b4c2a0,0,bfc00000,0,0) at netbsd:uvm_deallocate+0x32
 sys_execve(d025ab7c,cd3abf64,cd3abf5c,c08008a4,c039ade7) at netbsd:sys_execve+0xbd9
 syscall_plain() at netbsd:syscall_plain+0x1a5
 --- syscall (number 59) ---
 0xbdb2b13f:
 End traceback...
 syncing disks... panic: kernel diagnostic assertion "pmap->pm_pdirpa == rcr3()" failed: file "/u1/netbsd-3.0/src/sys/arch/i386/i386/pm
 Begin traceback...
 __main(c073fd1d,c07bd480,867,c0758ea8,c22ac800) at netbsd:__main
 pmap_deactivate2(d025ab7c,ce7bd78c,0,0,c03accb9) at netbsd:pmap_deactivate2+0x63
 mpidle(d025ab7c,0,33e,c,0) at netbsd:mpidle+0x92
 preempt(1,c07b4220,4e8,cd3ab994,c1788b10) at netbsd:preempt+0x75
 genfs_putpages(cd3aba14,1312d00,0,0,c063f020) at netbsd:genfs_putpages+0x7ec
 VOP_PUTPAGES(cf4177b0,0,0,0,0) at netbsd:VOP_PUTPAGES+0x40u0:f  
 fsspi_fnoulutl_f
 Stopped in pid 25812.1 (ps) at  netbsd:cpu_Debugger+0x4:        leave
 db{0}> trace
 cpu_Debugger(c0757a97,0,8,283,c08056e0) at netbsd:cpu_Debugger+0x4
 __cpu_simple_lock(c0802694,989680,0,202,c086e1a8) at netbsd:__cpu_simple_lock+0x93
 _simple_lock(c0802694,c07afc40,2b7,c080e0a0,c2d4ca5c) at netbsd:_simple_lock+0x7a
 wakeup(c086e1a0,c07ba940,117,c2d4c9dc,c2d4c9e4) at netbsd:wakeup+0x55
 uvm_aio_biodone(c2d4c9dc,c07b2f60,57c,282,c2372dd8) at netbsd:uvm_aio_biodone+0x56
 biodone(c2d4c9dc,0,ce957a18,297,c0845968) at netbsd:biodone+0x134
 scsipi_complete(c310e038,c22c4000,ce957a58,246,c310e048) at netbsd:scsipi_complete+0x159
 scsipi_done(c310e038,2de,c074683a,8020,c0847a20) at netbsd:scsipi_done+0x19a
 isp_parse_async(c22c4000,8020,0,0,0) at netbsd:isp_parse_async+0x119
 isp_intr(c22c4000,8,1,8020,c086a00c) at netbsd:isp_intr+0x1169
 isp_pci_intr(c22c4000,10,10,c,0) at netbsd:isp_pci_intr+0x6b
 intr_biglock_wrapper(c22e3f80,5,10,30,c0450010) at netbsd:intr_biglock_wrapper+0x18
 Xintr_ioapic_level5() at netbsd:Xintr_ioapic_level5+0xa0
 --- interrupt ---
 Xspllower(5,c07ad7c0,585,246,0) at netbsd:Xspllower+0xe
 _kernel_lock(42,c060f000,cc77d440,c2d2da00,c22d8000) at netbsd:_kernel_lock+0xfd
 x86_softintlock(0,c0802694,4,ce957e68,c039ab79) at netbsd:x86_softintlock+0xd
 DDB lost frame for netbsd:Xsoftnet+0x18, trying 0xce957e2c
 Xsoftnet() at netbsd:Xsoftnet+0x18
 --- interrupt ---
 0xce957e98:
 db{0}> call simple_lock_dump
 cpu0: spinout while in debugger
 
 Here the machine locks up and needs a hard reset.
 
 
 On Nov 26, 2005, at 18:08 , Manuel Bouyer wrote:
 
 > On Sat, Nov 26, 2005 at 05:18:40PM -0500, Andreas Wrede wrote:
 >>
 >> On Nov 26, 2005, at 15:29 , Manuel Bouyer wrote:
 >>
 >>> On Fri, Nov 25, 2005 at 03:13:00AM +0000, Andreas Wrede wrote:
 >>>>> Environment:
 >>>> 	
 >>>> 	
 >>>> System: NetBSD whome.planix.com 3.0_RC3 NetBSD 3.0_RC3
 >>>> (PLANIX.MPACPI) #0: Thu Nov 24 20:57:09 EST 2005
 >>>> root@whome.planix.com:/u1/netbsd-3.0/src/sys/arch/i386/compile/
 >>>> obj.i386/PLANIX.MPACPI i386
 >>>> Architecture: i386
 >>>> Machine: i386
 >>>>> Description:
 >>>> 	Over the last week I have experienced 3 kernel dead-locks on a
 >>>> NetBSD 3.0_RC1/2/3 system.
 >>>> The motherboard is a Tyan K8S Pro S2882G3NR with 2 AMD Opteron
 >>>> 244 CPUs installed. The kernel
 >>>> differs from GENERIC.MPACPI in the values of some SYSVSEM
 >>>> variables, maxusers and some
 >>>> other variables.
 >>>
 >>> Can you try a kernel with DIAGNOSTIC, DEBUG and LOCKDEBUG ?
 >>
 >> Right now, I am running with LOCKDEBUG. I will add DIAGNOSTIC and  
 >> DEBUG.
 >
 > Yes, if you have the problem I'm thinking about, it will only be
 > detected if you have DIAGNOSTIC. But LOCKDEBUG and DEBUG can't hurt,
 > maybe these will catch something else.
 >
 >>
 >> Not knowing much about kernel debugging, and since creating a core
 >> dump is not possible,
 >
 > Why ? Have you tried reboot(0x104) ?
 >
 >> what commands should I run the next time the
 >> dead-lock occurs?
 >
 > I can't think of anything more than what you have provided for now ...
 >
 > -- 
 > Manuel Bouyer <bouyer@antioche.eu.org>
 >      NetBSD: 26 ans d'experience feront toujours la difference
 > --
 >
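
For reference, the argument in the reboot(0x104) suggestion quoted above is a bitmask of "howto" flags. The sketch below decodes it. The flag values are written from memory of NetBSD's <sys/reboot.h> and should be treated as assumptions to verify against the header, and decode_howto is a hypothetical helper name used only for this example.

```python
# Subset of reboot "howto" flags; values are assumptions, check <sys/reboot.h>.
RB_FLAGS = {
    0x001: "RB_ASKNAME",
    0x002: "RB_SINGLE",
    0x004: "RB_NOSYNC",
    0x008: "RB_HALT",
    0x100: "RB_DUMP",
}


def decode_howto(howto):
    """Return the names of the known flags set in howto, lowest bit first."""
    names = [name for bit, name in sorted(RB_FLAGS.items()) if howto & bit]
    known_mask = 0
    for bit in RB_FLAGS:
        known_mask |= bit
    leftover = howto & ~known_mask
    if leftover:
        # Report bits this table does not cover instead of dropping them.
        names.append(hex(leftover))
    return names


print(decode_howto(0x104))
```

Under those assumed values, 0x104 asks for a crash dump without first syncing the disks, which would sidestep the "syncing disks" phase where the second panic occurred.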
 
 -- 
      aew
 
 