Subject: Re: hard lockup on dual Opteron running either amd64 or i386
To: Chris Jones <chris@cjones.org>
From: George Georgalis <george@galis.org>
List: netbsd-users
Date: 08/03/2007 23:02:56
On Fri, Aug 03, 2007 at 06:07:07PM -0600, Chris Jones wrote:
>Hi folks--
>
>I recently bought a new server, which is a Dell PowerEdge SC 1435. It 
>has a dual core Opteron. Running this machine with an MP kernel, either 
>NetBSD/i386 or NetBSD/amd64, results in a hard lockup after anywhere 
>from a few seconds to a few minutes. The lockup seems to be correlated 
>with high disk activity. Once it locks up, it doesn't respond to the 
>keyboard, including the LEDs for Caps Lock, etc. As a result, I can't 
>tell you what ddb says about the system state. Please see the dmesg 
>output at the end of this message for hardware info.
>
>Any thoughts on what I should try with this computer? I'd really like to 
>be able to use both CPUs.
>
>Chris
>

Well, this sounds very much like my problem. I get more life out
of my system before it fails, though its days are numbered.

I filed a PR under security because I thought a particular
tar.bz2 archive was causing a DoS condition.

security/36712: tar extraction cause cannot create pipe, too many open files
http://www.netbsd.org/cgi-bin/query-pr-single.pl?number=36712

I'm finding that pipes can no longer be created after heavy disk
use. My system is also a multi-core Opteron, but I've only seen the
problem after lots of disk access through an LSI fibre HBA attached
to an external disk enclosure.
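
If it helps to compare notes, here is a rough sketch of how one might
check, once the system starts misbehaving, whether pipe creation fails
because of the per-process descriptor limit (EMFILE) or because the
system-wide file table is full (ENFILE). It assumes Python is installed
on the box; the function name probe_pipe is made up for illustration
and is not part of the PR or of any NetBSD tool.

```python
import errno
import os
import resource


def probe_pipe():
    """Try to create one pipe and classify the outcome.

    Returns "ok" on success, otherwise the symbolic errno name.
    (Hypothetical helper, for illustration only.)
    """
    try:
        r, w = os.pipe()
    except OSError as e:
        if e.errno == errno.EMFILE:
            return "EMFILE"  # this process hit its own descriptor limit
        if e.errno == errno.ENFILE:
            return "ENFILE"  # the system-wide file table is full
        return errno.errorcode.get(e.errno, str(e.errno))
    # Close both ends so the probe itself does not leak descriptors.
    os.close(r)
    os.close(w)
    return "ok"


if __name__ == "__main__":
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print("RLIMIT_NOFILE soft=%s hard=%s" % (soft, hard))
    print("pipe probe: %s" % probe_pipe())
```

Run it once on a freshly booted system and again after the heavy disk
access. An ENFILE result would point at kernel-wide exhaustion rather
than at a limit in the shell that ran tar.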

// George


>NetBSD 3.1 (GENERIC) #0: Mon Oct 30 21:47:28 UTC 2006
>	builds@b1.netbsd.org:/home/builds/ab/netbsd-3-1-RELEASE/amd64/200610302053Z-obj/home/builds/ab/netbsd-3-1-RELEASE/src/sys/arch/amd64/compile/GENERIC
>total memory = 1022 MB
>avail memory = 969 MB
>mainbus0 (root)
>mainbus0: Intel MP Specification (Version 1.4) (DELL     PE 01EB     )
>cpu0 at mainbus0: apid 0 (boot processor)
>cpu0: Dual-Core AMD Opteron(tm) Processor 2212, 2000.19 MHz
>cpu0: features: ffdbfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR>
>cpu0: features: ffdbfbff<PGE,MCA,CMOV,PAT,PSE36,MPC,NOX,MMXX,MMX>
>cpu0: features: ffdbfbff<FXSR,SSE,SSE2,B27,B28,LONG,3DNOW2,3DNOW>
>cpu0: I-cache 64 KB 64B/line 2-way, D-cache 64 KB 64B/line 2-way
>cpu0: L2 cache 1 MB 64B/line 16-way
>cpu0: ITLB 32 4 KB entries fully associative, 8 4 MB entries fully associative
>cpu0: DTLB 32 4 KB entries fully associative, 8 4 MB entries fully associative
>cpu0: calibrating local timer
>cpu0: apic clock running at 200 MHz
>cpu0: 16 page colors
>cpu1 at mainbus0: apid 1 (application processor)
>cpu1: not started
>mpbios: bus 0 is type PCI   
>mpbios: bus 1 is type PCI   
>mpbios: bus 2 is type PCI   
>mpbios: bus 3 is type PCI   
>mpbios: bus 4 is type PCI   
>mpbios: bus 5 is type PCI   
>mpbios: bus 6 is type PCI   
>mpbios: bus 7 is type PCI   
>mpbios: bus 8 is type ISA   
>ioapic0 at mainbus0 apid 2 (I/O APIC)
>ioapic0: pa 0xfec00000, version 11, 16 pins
>ioapic0: misconfigured as apic 0
>ioapic0: remapped to apic 2
>ioapic1 at mainbus0 apid 3 (I/O APIC)
>ioapic1: pa 0xfec01000, version 11, 16 pins
>ioapic1: misconfigured as apic 0
>ioapic1: remapped to apic 3
>pci0 at mainbus0 bus 0: configuration mode 1
>pci0: i/o space, memory space enabled, rd/line, rd/mult, wr/inv ok
>ppb0 at pci0 dev 1 function 0: ServerWorks product 0x0036 (rev. 0x00)
>pci1 at ppb0 bus 3
>pci1: i/o space, memory space enabled
>ppb1 at pci1 dev 13 function 0: ServerWorks product 0x0104 (rev. 0xc0)
>pci2 at ppb1 bus 4
>pci2: i/o space, memory space enabled
>pciide0 at pci1 dev 14 function 0
>pciide0: ServerWorks product 0x024b (rev. 0x00)
>pciide0: bus-master DMA support present, but unused (no driver support)
>pciide0: primary channel configured to native-PCI mode
>pciide0: using ioapic0 pin 6 (irq 6) for native-PCI interrupt
>atabus0 at pciide0 channel 0
>pciide0: secondary channel configured to native-PCI mode
>atabus1 at pciide0 channel 1
>pchb0 at pci0 dev 2 function 0
>pchb0: ServerWorks product 0x0205 (rev. 0x00)
>pcib0 at pci0 dev 2 function 2
>pcib0: ServerWorks product 0x0234 (rev. 0x00)
>ohci0 at pci0 dev 3 function 0: ServerWorks product 0x0223 (rev. 0x01)
>ohci0: interrupting at ioapic0 pin 11 (irq 11)
>ohci0: OHCI version 1.0, legacy support
>usb0 at ohci0: USB revision 1.0
>uhub0 at usb0
>uhub0: ServerWorks OHCI root hub, class 9/0, rev 1.00/1.00, addr 1
>uhub0: 2 ports with 2 removable, self powered
>ohci1 at pci0 dev 3 function 1: ServerWorks product 0x0223 (rev. 0x01)
>ohci1: interrupting at ioapic0 pin 11 (irq 11)
>ohci1: OHCI version 1.0, legacy support
>usb1 at ohci1: USB revision 1.0
>uhub1 at usb1
>uhub1: ServerWorks OHCI root hub, class 9/0, rev 1.00/1.00, addr 1
>uhub1: 2 ports with 2 removable, self powered
>ServerWorks product 0x0223 (USB serial bus, interface 0x20, revision 0x01) at pci0 dev 3 function 2 not configured
>vga0 at pci0 dev 4 function 0: ATI Technologies product 0x515e (rev. 0x02)
>wsdisplay0 at vga0 kbdmux 1: console (80x25, vt100 emulation)
>wsmux1: connecting to wsdisplay0
>ppb2 at pci0 dev 7 function 0: ServerWorks product 0x0140 (rev. 0xa2)
>pci3 at ppb2 bus 5
>pci3: i/o space, memory space enabled, rd/line, wr/inv ok
>ppb3 at pci0 dev 8 function 0: ServerWorks product 0x0142 (rev. 0xa2)
>pci4 at ppb3 bus 1
>pci4: i/o space, memory space enabled, rd/line, wr/inv ok
>bge0 at pci4 dev 0 function 0: Broadcom BCM5721 Gigabit Ethernet
>bge0: interrupting at ioapic1 pin 1 (irq 5)
>bge0: PCI-Express DMA setting 0x76180000, expected 0x76180000
>bge0: ASIC unknown BCM575x family (0x4201), Ethernet address 00:1a:a0:33:24:27
>bge0: setting short Tx thresholds
>brgphy0 at bge0 phy 1: BCM5750 1000BASE-T media interface, rev. 0
>brgphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto
>ppb4 at pci0 dev 9 function 0: ServerWorks product 0x0144 (rev. 0xa2)
>pci5 at ppb4 bus 2
>pci5: i/o space, memory space enabled, rd/line, wr/inv ok
>bge1 at pci5 dev 0 function 0: Broadcom BCM5721 Gigabit Ethernet
>bge1: interrupting at ioapic1 pin 5 (irq 10)
>bge1: PCI-Express DMA setting 0x76180000, expected 0x76180000
>bge1: ASIC unknown BCM575x family (0x4201), Ethernet address 00:1a:a0:33:24:28
>bge1: setting short Tx thresholds
>brgphy1 at bge1 phy 1: BCM5750 1000BASE-T media interface, rev. 0
>brgphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto
>ppb5 at pci0 dev 10 function 0: ServerWorks product 0x0142 (rev. 0xa2)
>pci6 at ppb5 bus 6
>pci6: i/o space, memory space enabled, rd/line, wr/inv ok
>ppb6 at pci0 dev 11 function 0: ServerWorks product 0x0144 (rev. 0xa2)
>pci7 at ppb6 bus 7
>pci7: i/o space, memory space enabled, rd/line, wr/inv ok
>pchb1 at pci0 dev 24 function 0
>pchb1: Advanced Micro Devices AMD64 HyperTransport configuration (rev. 0x00)
>pchb2 at pci0 dev 24 function 1
>pchb2: Advanced Micro Devices AMD64 Address Map configuration (rev. 0x00)
>pchb3 at pci0 dev 24 function 2
>pchb3: Advanced Micro Devices AMD64 DRAM configuration (rev. 0x00)
>pchb4 at pci0 dev 24 function 3
>pchb4: Advanced Micro Devices AMD64 Miscellaneous configuration (rev. 0x00)
>isa0 at pcib0
>com0 at isa0 port 0x3f8-0x3ff irq 4: ns16550a, working fifo
>com1 at isa0 port 0x2f8-0x2ff irq 3: ns16550a, working fifo
>pckbc0 at isa0 port 0x60-0x64
>kbc: cmd word write error
>pcppi0 at isa0 port 0x61
>midi0 at pcppi0: PC speaker
>sysbeep0 at pcppi0
>ioapic0: enabling
>ioapic1: enabling
>Kernelized RAIDframe activated
>uhub2 at uhub1 port 1
>uhub2: Dell product 0xa001, class 9/0, rev 2.00/0.00, addr 2
>uhub2: 2 ports with 2 removable, self powered
>uhidev0 at uhub2 port 1 configuration 1 interface 0
>uhidev0: vendor 0x10d5 PS2 to USB, rev 1.10/0.01, addr 3, iclass 3/1
>ukbd0 at uhidev0
>wskbd0 at ukbd0 mux 1
>wskbd0: connecting to wsdisplay0
>uhidev1 at uhub2 port 1 configuration 1 interface 1
>uhidev1: vendor 0x10d5 PS2 to USB, rev 1.10/0.01, addr 3, iclass 3/1
>uhidev1: 3 report ids
>ums0 at uhidev1 reportid 1: 5 buttons and Z dir.
>wsmouse0 at ums0 mux 0
>uhid0 at uhidev1 reportid 2: input=1, output=0, feature=0
>uhid1 at uhidev1 reportid 3: input=2, output=0, feature=0
>wd0 at atabus0 drive 0: <WDC WD2500YS-18SHB1>
>wd0: drive supports 16-sector PIO transfers, LBA48 addressing
>wd0: 232 GB, 484406 cyl, 16 head, 63 sec, 512 bytes/sect x 488281250 sectors
>wd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 6 (Ultra/133)
>wd1 at atabus1 drive 0: <WDC WD2500YS-18SHB1>
>wd1: drive supports 16-sector PIO transfers, LBA48 addressing
>wd1: 232 GB, 484406 cyl, 16 head, 63 sec, 512 bytes/sect x 488281250 sectors
>wd1: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 6 (Ultra/133)
>boot device: bge0
>root on bge0
>nfs_boot: trying DHCP/BOOTP
>bge0: PCI-Express DMA setting 0x76180000, expected 0x76180000
>dma read modebits: set 575x tso bit: 0x 80203fe
>dma read modebits: 0x 80203fe
>nfs_boot: DHCP next-server: 10.0.0.3
>nfs_boot: my_name=evilmax
>nfs_boot: my_domain=cjones.org
>nfs_boot: my_addr=10.0.4.4
>nfs_boot: my_mask=255.255.0.0
>nfs_boot: gateway=10.0.0.254
>root on 10.0.0.3:/usr/local/tarpit/export/evilmax
>root file system type: nfs


-- 
George Georgalis, information system scientist <IXOYE><