Subject: Re: kern/32162: [netbsd-3.0] kernel dead-lock in MP system
To: None <kern-bug-people@netbsd.org, gnats-admin@netbsd.org,>
From: Andreas Wrede <andreas@planix.com>
List: netbsd-bugs
Date: 01/12/2006 03:10:03
The following reply was made to PR kern/32162; it has been noted by GNATS.

From: Andreas Wrede <andreas@planix.com>
To: Jason Thorpe <thorpej@shagadelic.org>
Cc: Manuel Bouyer <bouyer@antioche.eu.org>, gnats-bugs@NetBSD.org,
	kern-bug-people@NetBSD.org, gnats-admin@NetBSD.org,
	netbsd-bugs@NetBSD.org
Subject: Re: kern/32162: [netbsd-3.0] kernel dead-lock in MP system
Date: Wed, 11 Jan 2006 22:07:55 -0500

 --Apple-Mail-2--185714117
 Content-Transfer-Encoding: 7bit
 Content-Type: text/plain; charset=US-ASCII; delsp=yes; format=flowed
 
 
 On Dec 7, 2005, at 11:42 , Jason Thorpe wrote:
 
 >
 > On Dec 7, 2005, at 8:08 AM, Andreas Wrede wrote:
 >
 >> Running with a kernel with DIAGNOSTIC, LOCKDEBUG and DEBUG turned  
 >> on produced two panics over the last week:
 >>
 >> Nov 30
 >>
 >> panic: kernel debugging assertion "(v == __SIMPLELOCK_LOCKED) ||  
 >> (v == __SIMPLELOCK_UNLOCKED)" failed: file "/u1/netbsd-3.0/src/sys/ 
 >> arch/x86/x86/lock_machdep.c",
 >
 > This very likely means that something is smashing memory.
 
 I experienced a number of different panics/asserts, which probably  
 confirms your assessment of memory corruption somewhere.
 
 Now the system has been moved to new hardware, a TYAN S2892 K8SE
 motherboard with two Opteron 246 processors (see dmesg below). Two
 changes on the software side: the 1TB filesystem, which was UFS2 on
 the previous hardware, is now UFS1 again, and the other filesystems
 are now on a RAID1 set.  I am still running a kernel with DIAGNOSTIC,
 LOCKDEBUG and DEBUG, and after an uptime of almost 4 days I got
 'cpu0: spinout'.
 
 Any idea what this is and what to do next?
 
 cpu0: spinout
 Stopped in pid 18.1 (aiodoned) at       netbsd:cpu_Debugger+0x4:        leave
 db{0}> bt
 cpu_Debugger(c067db95,0,c2bf7cdc,cd262b7c,c0713354) at netbsd:cpu_Debugger+0x4
 __cpu_simple_lock(c0713a1c,0,c2993df4,c36b6100,cd262b7c) at netbsd:__cpu_simple_lock+0x93
 printf_nolog(c06769f5,cd262b7c,c069ace4,cd262c30,a) at netbsd:printf_nolog+0x32
 lock_printf(c069ace4,a080101,cd262c54,c033d966,c3874200) at netbsd:lock_printf+0x4c
 _simple_lock(c0713354,c06cb1c0,440,c,c2b360c0) at netbsd:_simple_lock+0x235
 selwakeup(c2b360c8,989680,0,66,cc2016c0) at netbsd:selwakeup+0x99
 ptsstart(cc2016c0,0,c06cb840,7,cc2016c0) at netbsd:ptsstart+0x85
 ttstart(cc2016c0,cc2016c0,9ac,1,0) at netbsd:ttstart+0x1e
 tputchar(66,7,cc2016c0,cc2016c0,cbfbb024) at netbsd:tputchar+0x79
 putchar(66,5,0,6,0) at netbsd:putchar+0x49
 kprintf(c067f495,5,0,0,cd262de0) at netbsd:kprintf+0x5c
 printf(c067f495,c067f385,cd262e60,1,1ebd4) at netbsd:printf+0x46
 trap() at netbsd:trap+0x106
 --- trap (number 6) ---
 pmap_activate(cc20b8c4,cc2101d0,4c,0,c034cbf9) at netbsd:pmap_activate+0x39
 mpidle(cc20b8c4,0,1d9,c0786ff0,c0789608) at netbsd:mpidle+0xcb
 ltsleep(c0789600,204,c067744e,0,c0789608) at netbsd:ltsleep+0x4d0
 uvm_aiodone_daemon(cc20b8c4,842000,84b000,0,c0100321) at netbsd:uvm_aiodone_daemon+0x15f
 db{0}> machine cpu 1
 using CPU 1
 db{0}> bt
 __cpu_simple_lock(c0713354,c035241b,c07620a8,297,10b) at netbsd:__cpu_simple_lock+0x6f
 _simple_lock(c0713354,c06c8d00,47b,c,c21a303c) at netbsd:_simple_lock+0x7a
 schedclock(cf1859e8,c2266a00,c355a788,c21c4838,ce453c78) at netbsd:schedclock+0x58
 statclock(ce453cbc,c01f7f52,c21c4800,80,c066ea25) at netbsd:statclock+0xeb
 hardclock(ce453cbc,3,c03902b8,ce453cb4,0) at netbsd:hardclock+0x5f3
 lapic_clockintr(0,0,c0330010,30,1310010) at netbsd:lapic_clockintr+0x48
 Xresume_lapic_ltimer() at netbsd:Xresume_lapic_ltimer+0x1b
 --- interrupt ---
 Xspllower(0,c06c6880,390,206,0) at netbsd:Xspllower+0xe
 _lockmgr(c0764180,400006,0,c06d69c0,da4) at netbsd:_lockmgr+0x250
 pmap_enter(cfcabc28,81fe000,37762000,3,22) at netbsd:pmap_enter+0x4d6
 uvm_fault(ce8ec2b4,81fe000,0,2,2) at netbsd:uvm_fault+0x976
 trap() at netbsd:trap+0x36f
 --- trap (number 6) ---
 0xbdb685fc:
 db{0}> sync
 cpu1: spinout while in debugger
 
 
 
 Dmesg output:
 
 NetBSD 3.0_STABLE (PLANIX.MP) #0: Sat Jan  7 10:19:22 EST 2006
          root@whome.planix.com:/u1/netbsd-3.0/obj.i386/sys/arch/i386/compile/PLANIX.MP
 total memory = 1022 MB
 avail memory = 982 MB
 BIOS32 rev. 0 found at 0xfd5c0
 mainbus0 (root)
 mainbus0: Intel MP Specification (Version 1.4) (AMD      HAMMER      )
 cpu0 at mainbus0: apid 0 (boot processor)
 cpu0: AMD Unknown K7 (Athlon) (686-class), 2009.32 MHz, id 0xf5a
 cpu0: features 78bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR>
 cpu0: features 78bfbff<PGE,MCA,CMOV,PAT,PSE36,MPC,MMX>
 cpu0: features 78bfbff<FXSR,SSE,SSE2>
 cpu0: "AMD Opteron(tm) Processor 246"
 cpu0: calibrating local timer
 cpu0: apic clock running at 200 MHz
 cpu1 at mainbus0: apid 1 (application processor)
 cpu1: starting
 cpu1: AMD Unknown K7 (Athlon) (686-class), 2009.26 MHz, id 0xf5a
 cpu1: features 78bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR>
 cpu1: features 78bfbff<PGE,MCA,CMOV,PAT,PSE36,MPC,MMX>
 cpu1: features 78bfbff<FXSR,SSE,SSE2>
 cpu1: "AMD Opteron(tm) Processor 246"
 mpbios: bus 0 is type PCI
 mpbios: bus 1 is type PCI
 mpbios: bus 2 is type PCI
 mpbios: bus 3 is type PCI
 mpbios: bus 8 is type PCI
 mpbios: bus 9 is type PCI
 mpbios: bus 10 is type PCI
 mpbios: bus 11 is type ISA
 ioapic0 at mainbus0 apid 2 (I/O APIC)
 ioapic0: pa 0xfec00000, version 11, 24 pins
 ioapic1 at mainbus0 apid 3 (I/O APIC)
 ioapic1: pa 0xdf200000, version 11, 4 pins
 ioapic2 at mainbus0 apid 4 (I/O APIC)
 ioapic2: pa 0xdf201000, version 11, 4 pins
 pci0 at mainbus0 bus 0: configuration mode 1
 pci0: i/o space, memory space enabled, rd/line, rd/mult, wr/inv ok
 Nvidia product 0x005e (miscellaneous memory, revision 0xa3) at pci0 dev 0 function 0 not configured
 pcib0 at pci0 dev 1 function 0
 pcib0: Nvidia product 0x0051 (rev. 0xa3)
 Nvidia nForce4 SMBus (SMBus serial bus, revision 0xa2) at pci0 dev 1 function 1 not configured
 ohci0 at pci0 dev 2 function 0: Nvidia product 0x005a (rev. 0xa2)
 ohci0: interrupting at ioapic0 pin 10 (irq 10)
 ohci0: OHCI version 1.0, legacy support
 usb0 at ohci0: USB revision 1.0
 uhub0 at usb0
 uhub0: Nvidia OHCI root hub, class 9/0, rev 1.00/1.00, addr 1
 uhub0: 10 ports with 10 removable, self powered
 ehci0 at pci0 dev 2 function 1: Nvidia product 0x005b (rev. 0xa3)
 ehci0: interrupting at ioapic0 pin 11 (irq 11)
 ehci0: BIOS has given up ownership
 ehci0: EHCI version 1.0
 ehci0: companion controller, 4 ports each: ohci0
 usb1 at ehci0: USB revision 2.0
 uhub1 at usb1
 uhub1: Nvidia EHCI root hub, class 9/0, rev 2.00/1.00, addr 1
 uhub1: single transaction translator
 uhub1: 10 ports with 10 removable, self powered
 viaide0 at pci0 dev 6 function 0
 viaide0: NVIDIA nForce4 IDE Controller (rev. 0xf2)
 viaide0: bus-master DMA support present
 viaide0: primary channel configured to compatibility mode
 viaide0: primary channel ignored (disabled)
 viaide0: secondary channel configured to compatibility mode
 viaide0: secondary channel interrupting at ioapic0 pin 15 (irq 15)
 atabus0 at viaide0 channel 1
 viaide1 at pci0 dev 7 function 0
 viaide1: NVIDIA nForce4 Serial ATA Controller (rev. 0xf3)
 viaide1: bus-master DMA support present
 viaide1: primary channel wired to native-PCI mode
 viaide1: using ioapic0 pin 10 (irq 10) for native-PCI interrupt
 atabus1 at viaide1 channel 0
 viaide1: secondary channel wired to native-PCI mode
 atabus2 at viaide1 channel 1
 viaide2 at pci0 dev 8 function 0
 viaide2: NVIDIA nForce4 Serial ATA Controller (rev. 0xf3)
 viaide2: bus-master DMA support present
 viaide2: primary channel wired to native-PCI mode
 viaide2: using ioapic0 pin 11 (irq 11) for native-PCI interrupt
 atabus3 at viaide2 channel 0
 viaide2: secondary channel wired to native-PCI mode
 atabus4 at viaide2 channel 1
 ppb0 at pci0 dev 9 function 0: Nvidia product 0x005c (rev. 0xa2)
 pci1 at ppb0 bus 1
 pci1: i/o space, memory space enabled
 isp0 at pci1 dev 4 function 0: QLogic FC-AL and Fabric HBA
 isp0: interrupting at ioapic0 pin 11 (irq 11)
 scsibus0 at isp0: 256 targets, 8 luns per target
 vga1 at pci1 dev 6 function 0: ATI Technologies Rage XL (rev. 0x27)
 wsdisplay0 at vga1 kbdmux 1
 wsmux1: connecting to wsdisplay0
 fxp0 at pci1 dev 8 function 0: i82550 Ethernet, rev 16
 fxp0: interrupting at ioapic0 pin 10 (irq 10)
 fxp0: Ethernet address 00:e0:81:30:d6:0a
 inphy0 at fxp0 phy 1: i82555 10/100 media interface, rev. 4
 inphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
 ppb1 at pci0 dev 13 function 0: Nvidia product 0x005d (rev. 0xa3)
 pci2 at ppb1 bus 2
 pci2: i/o space, memory space enabled, rd/line, wr/inv ok
 ppb2 at pci0 dev 14 function 0: Nvidia product 0x005d (rev. 0xa3)
 pci3 at ppb2 bus 3
 pci3: i/o space, memory space enabled, rd/line, wr/inv ok
 pchb0 at pci0 dev 24 function 0
 pchb0: Advanced Micro Devices AMD64 HyperTransport configuration (rev. 0x00)
 pchb1 at pci0 dev 24 function 1
 pchb1: Advanced Micro Devices AMD64 Address Map configuration (rev. 0x00)
 pchb2 at pci0 dev 24 function 2
 pchb2: Advanced Micro Devices AMD64 DRAM configuration (rev. 0x00)
 pchb3 at pci0 dev 24 function 3
 pchb3: Advanced Micro Devices AMD64 Miscellaneous configuration (rev. 0x00)
 pchb4 at pci0 dev 25 function 0
 pchb4: Advanced Micro Devices AMD64 HyperTransport configuration (rev. 0x00)
 pchb5 at pci0 dev 25 function 1
 pchb5: Advanced Micro Devices AMD64 Address Map configuration (rev. 0x00)
 pchb6 at pci0 dev 25 function 2
 pchb6: Advanced Micro Devices AMD64 DRAM configuration (rev. 0x00)
 pchb7 at pci0 dev 25 function 3
 pchb7: Advanced Micro Devices AMD64 Miscellaneous configuration (rev. 0x00)
 isa0 at pcib0
 lpt0 at isa0 port 0x378-0x37b irq 7
 com0 at isa0 port 0x3f8-0x3ff irq 4: ns16550a, working fifo
 com1 at isa0 port 0x2f8-0x2ff irq 3: ns16550a, working fifo
 com1: console
 pckbc0 at isa0 port 0x60-0x64
 pckbdprobe: reset error 5
 pmsprobe: reset error 5
 lm0 at isa0 port 0x290-0x297: W83627HF
 pcppi0 at isa0 port 0x61
 midi0 at pcppi0: PC speaker
 sysbeep0 at pcppi0
 isapnp0 at isa0 port 0x279: ISA Plug 'n Play device support
 npx0 at isa0 port 0xf0-0xff: using exception 16
 fdc0 at isa0 port 0x3f0-0x3f7 irq 6 drq 2
 isapnp0: no ISA Plug 'n Play devices found
 pci4 at mainbus0 bus 8
 pci4: i/o space, memory space enabled, rd/line, rd/mult, wr/inv ok
 ppb3 at pci4 dev 10 function 0: Advanced Micro Devices AMD8131 PCI-X Tunnel (rev. 0x12)
 pci5 at ppb3 bus 9
 pci5: i/o space, memory space enabled
 Advanced Micro Devices AMD8131 IO Apic (interrupt system, interface 0x10, revision 0x01) at pci4 dev 10 function 1 not configured
 ppb4 at pci4 dev 11 function 0: Advanced Micro Devices AMD8131 PCI-X Tunnel (rev. 0x12)
 pci6 at ppb4 bus 10
 pci6: i/o space, memory space enabled
 isp1 at pci6 dev 3 function 0: QLogic FC-AL and Fabric HBA
 isp1: interrupting at ioapic2 pin 2 (irq 10)
 scsibus1 at isp1: 256 targets, 8 luns per target
 bge0 at pci6 dev 9 function 0: Broadcom BCM5704C Dual Gigabit Ethernet
 bge0: interrupting at ioapic2 pin 0 (irq 11)
 bge0: ASIC BCM5704 A3 (0x2003), Ethernet address 00:e0:81:30:d6:7c
 brgphy0 at bge0 phy 1: BCM5704 1000BASE-T media interface, rev. 0
 brgphy0: using BCM5704 DSP patch
 brgphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto
 bge1 at pci6 dev 9 function 1: Broadcom BCM5704C Dual Gigabit Ethernet
 bge1: interrupting at ioapic2 pin 1 (irq 10)
 bge1: ASIC BCM5704 A3 (0x2003), Ethernet address 00:e0:81:30:d6:7d
 brgphy1 at bge1 phy 1: BCM5704 1000BASE-T media interface, rev. 0
 brgphy1: using BCM5704 DSP patch
 brgphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto
 Advanced Micro Devices AMD8131 IO Apic (interrupt system, interface 0x10, revision 0x01) at pci4 dev 11 function 1 not configured
 ioapic0: enabling
 ioapic1: enabling
 ioapic2: enabling
 fd0 at fdc0 drive 0: 1.44MB, 80 cyl, 2 head, 18 sec
 raidattach: Asked for 8 units
 Kernelized RAIDframe activated
 IPsec: Initialized Security Association Processing.
 scsibus0: waiting 2 seconds for devices to settle...
 scsibus1: waiting 2 seconds for devices to settle...
 atapibus0 at atabus0: 2 targets
 cd0 at atapibus0 drive 0: <HL-DT-STDVD-ROM GDR8164B, , 0L06> cdrom removable
 cd0: 32-bit data port
 cd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 2 (Ultra/33)
 cd0(viaide0:1:0): using PIO mode 4, Ultra-DMA mode 2 (Ultra/33) (using DMA)
 sd0 at scsibus0 target 0 lun 0: <APPLE, Xserve RAID, 1.26> disk fixed
 sd0: 1035 GB, 132522 cyl, 128 head, 128 sec, 512 bytes/sect x 2171240448 sectors
 sd1 at scsibus1 target 0 lun 0: <APPLE, Xserve RAID, 1.26> disk fixed
 sd1: 1035 GB, 132522 cyl, 128 head, 128 sec, 512 bytes/sect x 2171240448 sectors
 wd0 at atabus1 drive 0: <WDC WD1600JS-22MHB0>
 wd0: drive supports 16-sector PIO transfers, LBA48 addressing
 wd0: 149 GB, 310101 cyl, 16 head, 63 sec, 512 bytes/sect x 312581808 sectors
 wd0: 32-bit data port
 wd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 6 (Ultra/133)
 wd0(viaide1:0:0): using PIO mode 4, Ultra-DMA mode 6 (Ultra/133) (using DMA)
 wd1 at atabus3 drive 0: <WDC WD1600JS-00MHB0>
 wd1: drive supports 16-sector PIO transfers, LBA48 addressing
 wd1: 149 GB, 310101 cyl, 16 head, 63 sec, 512 bytes/sect x 312581808 sectors
 wd1: 32-bit data port
 wd1: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 6 (Ultra/133)
 wd1(viaide2:0:0): using PIO mode 4, Ultra-DMA mode 6 (Ultra/133) (using DMA)
 Searching for RAID components...
 Component on: wd0a: 312581745
     Row: 0 Column: 0 Num Rows: 1 Num Columns: 2
     Version: 2 Serial Number: 20051218 Mod Counter: 156
     Clean: No Status: 0
     sectPerSU: 128 SUsPerPU: 1 SUsPerRU: 1
     RAID Level: 1  blocksize: 512 numBlocks: 312581632
     Autoconfig: Yes
     Contains root partition: Yes
     Last configured as: raid0
 Component on: wd1a: 312581745
     Row: 0 Column: 1 Num Rows: 1 Num Columns: 2
     Version: 2 Serial Number: 20051218 Mod Counter: 156
     Clean: No Status: 0
     sectPerSU: 128 SUsPerPU: 1 SUsPerRU: 1
     RAID Level: 1  blocksize: 512 numBlocks: 312581632
     Autoconfig: Yes
     Contains root partition: Yes
     Last configured as: raid0
 Found: wd0a at 0
 Found: wd1a at 1
 RAID autoconfigure
 Configuring raid0:
 Starting autoconfiguration of RAID set...
 Looking for 0 in autoconfig
 Found: wd0a at 0
 Looking for 1 in autoconfig
 Found: wd1a at 1
 raid0: allocating 20 buffers of 65536 bytes.
 raid0: RAID Level 1
 raid0: Components: /dev/wd0a /dev/wd1a
 raid0: Total Sectors: 312581632 (152627 MB)
 boot device: raid0
 root on raid0a dumps on raid0b
 mountroot: trying smbfs...
 mountroot: trying msdos...
 mountroot: trying cd9660...
 mountroot: trying nfs...
 mountroot: trying lfs...
 mountroot: trying ext2fs...
 mountroot: trying ffs...
 root file system type: ffs
 cpu1: CPU 1 running
 init: copying out path `/sbin/init' 11
 mag 0 21:1
 mag 1 2e:2
 mag 2 72:3
 mag 3 65:4
 mag 4 73:5
 mag 5 65:6
 mag 6 74:7
 mag 7 2d:8
 mag 8 78:7f
 wsdisplay0: screen 1 added (80x25, vt100 emulation)
 wsdisplay0: screen 2 added (80x25, vt100 emulation)
 wsdisplay0: screen 3 added (80x25, vt100 emulation)
 wsdisplay0: screen 4 added (80x25, vt100 emulation)
 
 
 # df
 Filesystem  1K-blocks      Used     Avail Capacity  Mounted on
 /dev/raid0a   3810588   1466130   2153930    40%    /
 /dev/raid0f   4129638   1130268   2792890    28%    /var
 /dev/raid0g  65381338   7480986  54631286    12%    /u1
 /dev/raid0e   2064990      9928   1951814     0%    /lhome
 /dev/raid0h  76927856   3612282  69469182     4%    /u2
 /dev/sd0a   1057093094 619642356 384596084    61%    /u5
 
 
 -- 
      aew
 
 
 --Apple-Mail-2--185714117
 content-type: application/pgp-signature; x-mac-type=70674453;
 	name=PGP.sig
 content-description: This is a digitally signed message part
 content-disposition: inline; filename=PGP.sig
 content-transfer-encoding: 7bit
 
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.1 (Darwin)
 
 iD8DBQFDxcgOEh/h9J/TQyERAoYCAKDe4DM5HNzpAvDMx4sCD64YZT6WoQCePGF4
 HEOJWIQWSrag0QAm3rAWZrc=
 =Vjfh
 -----END PGP SIGNATURE-----
 
 --Apple-Mail-2--185714117--