Current-Users archive


NetBSD-current on amd64 with Dell PERC 4e/Di hangs under load



I've been giving my main server more to do lately, and the problems with
it hanging have gotten worse.  It's not hardware, I think - I've swapped
out the whole computer once, and the RAID controller specifically once.

The system occasionally hangs when doing heavy disk I/O, but it is much
more prone to hang when it's handling a lot of network traffic (it's a
firewall/router).  It's running -current as of sometime this weekend.

The hangs last anywhere from ten or fifteen seconds up to several
minutes.  Since the latest upgrade (kernel only, from 7.99.2 to 7.99.4;
I've given up on building the whole system while it's behaving like
this), I've broken into the debugger a couple of times while it was
hung, to see what it was doing.  If someone would take a look at the
backtraces below and tell me whether anything looks suspicious, I'd
appreciate it!

Two sets of backtraces (simplified; hand copied) follow, and the full
dmesg from boot is at the end.

*** First hang:

db{1}> machine cpu 0
db{1}> bt

bus_space_read_2()
uhci_intr()
Xintr_ioapic_level4()
--- interrupt ---
wm_intr()
intr_biglock_wrapper()
Xintr_ioapic_level2()
--- interrupt ---
Xspllower()
DDB lost frame
Xsoftintr()

db{1}> machine cpu 1
db{1}> bt

_kernel_lock()
tcp_rcvd_wrapper()
soreceive()
dofileread()
sys_read()
syscall()
--- syscall (number 3) ---

db{1}> machine cpu 2
db{1}> bt

printf()
kdb_trap()
trap()
--- trap (number 9) ---
x86_pause()
bdev_strategy()
spec_strategy()
VOP_STRATEGY()
bio_doread.isra.4()
bread()
ffs_loadvnode()
vcache_get()
ufs_lookup()
VOP_LOOKUP()
lookup_once()
namei_tryemulroot()
namei()
fd_nameiat.isra.0()
do_sys_statat()
sys___stat50()
syscall()
--- syscall (number 439) ---

db{1}> machine cpu 3
db{1}> bt

printf()
kdb_trap()
trap()
--- trap (number 9) ---
x86_pause()
kevent1()
sys___kevent50()
syscall()
--- syscall (number 435) ---

*** Second hang:

db{3}> machine cpu 0
db{3}> bt

Xintr_ioapic_level6()
--- interrupt ---

db{3}> machine cpu 1
db{3}> bt

printf()
kdb_trap()
trap()
--- trap (number 9) ---
x86_pause()
frag6_fasttimo()
pffasttimo()
callout_softclock()
softint_dispatch()
DDB lost frame
Xsoftintr()

db{3}> machine cpu 2
db{3}> bt

printf()
kdb_trap()
trap()
--- trap (number 9) ---
x86_pause()
bdev_strategy()
spec_strategy()
VOP_STRATEGY()
genfs_getpages()
VOP_GETPAGES()
ra_startio()
uvm_ra_request()
uvn_get()
ubc_fault()
uvm_fault_internal()
trap()
--- trap (number 6) ---
copyout()
uiomove()
ubc_uiomove()
ffs_read()
VOP_READ()
vn_read()
dofileread()
sys_read()
syscall()
--- syscall (number 3) ---

db{3}> machine cpu 3
db{3}> bt

_kernel_lock()
sleepq_block()
cv_timedwait()
ipmi_thread()

*** dmesg output from boot:

Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005,
    2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015
    The NetBSD Foundation, Inc.  All rights reserved.
Copyright (c) 1982, 1986, 1989, 1991, 1993
    The Regents of the University of California.  All rights reserved.

NetBSD 7.99.4 (BARSOOM) #41: Sun Jan 25 21:11:03 CET 2015
	root%barsoom.hamartun.priv.no@localhost:/usr/obj/sys/arch/amd64/compile.amd64/BARSOOM
total memory = 8191 MB
avail memory = 7934 MB
timecounter: Timecounters tick every 10.000 msec
timecounter: Timecounter "i8254" frequency 1193182 Hz quality 100
Dell Computer Corporation PowerEdge 2850
mainbus0 (root)
ACPI: RSDP 0x00000000000FD5B0 000014 (v00 DELL  )
ACPI: RSDT 0x00000000000FD5C4 000038 (v01 DELL   PE BKC   00000001 MSFT 0100000A)
ACPI: FACP 0x00000000000FD620 000074 (v01 DELL   PE BKC   00000001 MSFT 0100000A)
ACPI: DSDT 0x00000000BFFC0000 003CCD (v01 DELL   PE BKC   00000001 MSFT 0100000E)
ACPI: FACS 0x00000000BFFCFC00 000040
ACPI: APIC 0x00000000000FD694 0000E0 (v01 DELL   PE BKC   00000001 MSFT 0100000A)
ACPI: SPCR 0x00000000000FD774 000050 (v01 DELL   PE BKC   00000001 MSFT 0100000A)
ACPI: HPET 0x00000000000FD7C4 000038 (v01 DELL   PE BKC   00000001 MSFT 0100000A)
ACPI: MCFG 0x00000000000FD7FC 00003C (v01 DELL   PE BKC   00000001 MSFT 0100000A)
ACPI: All ACPI Tables successfully acquired
cpu0 at mainbus0 apid 0
cpu0: Intel(R) Xeon(TM) CPU 3.00GHz, id 0xf43
cpu1 at mainbus0 apid 6
cpu1: Intel(R) Xeon(TM) CPU 3.00GHz, id 0xf43
cpu2 at mainbus0 apid 1
cpu2: Intel(R) Xeon(TM) CPU 3.00GHz, id 0xf43
cpu3 at mainbus0 apid 7
cpu3: Intel(R) Xeon(TM) CPU 3.00GHz, id 0xf43
ioapic0 at mainbus0 apid 8: pa 0xfec00000, version 0x20, 24 pins
ioapic1 at mainbus0 apid 9: pa 0xfec80000, version 0x20, 24 pins
ioapic2 at mainbus0 apid 10: pa 0xfec83000, version 0x20, 24 pins
ioapic3 at mainbus0 apid 11: pa 0xfec84000, version 0x20, 24 pins
acpi0 at mainbus0: Intel ACPICA 20140926
acpi0: X/RSDT: OemId <DELL  ,PE BKC  ,00000001>, AslId <MSFT,0100000a>
acpi0: SCI interrupting at int 9
timecounter: Timecounter "ACPI-Fast" frequency 3579545 Hz quality 1000
hpet0 at acpi0: high precision event timer (mem 0xfed00000-0xfed00400)
timecounter: Timecounter "hpet0" frequency 14318180 Hz quality 2000
pcppi1 at acpi0 (SPK, PNP0800): io 0x61
midi0 at pcppi1: PC speaker
sysbeep0 at pcppi1
attimer1 at acpi0 (TMR, PNP0100): io 0x40-0x5f irq 0
FDC (PNP0700) at acpi0 not configured
COMA (PNP0501) at acpi0 not configured
MBIO (PNP0C01) at acpi0 not configured
NIPM (IPI0001) at acpi0 not configured
acpivga0 at acpi0 (EVGA): ACPI Display Adapter
PEHB (PNP0C02) at acpi0 not configured
ACPI: Enabled 1 GPEs in block 00 to 1F
ACPI Exception: AE_NOT_FOUND, While evaluating Sleep State [\_S1_] (20140926/hwxface-646)
ACPI Exception: AE_NOT_FOUND, While evaluating Sleep State [\_S2_] (20140926/hwxface-646)
ACPI Exception: AE_NOT_FOUND, While evaluating Sleep State [\_S3_] (20140926/hwxface-646)
attimer1: attached to pcppi1
ipmi0 at mainbus0
pci0 at mainbus0 bus 0: configuration mode 1
pci0: i/o space, memory space enabled, rd/line, rd/mult, wr/inv ok
pchb0 at pci0 dev 0 function 0: vendor 8086 product 3590 (rev. 0x09)
ppb0 at pci0 dev 2 function 0: vendor 8086 product 3595 (rev. 0x09)
ppb0: PCI Express capability version 1 <Root Port of PCI-E Root Complex> x8 @ 2.5GT/s
pci1 at ppb0 bus 1
pci1: i/o space, memory space enabled, rd/line, wr/inv ok
ppb1 at pci1 dev 0 function 0: vendor 8086 product 0330 (rev. 0x06)
ppb1: PCI Express capability version 1 <PCI-E to PCI/PCI-X Bridge>
pci2 at ppb1 bus 2
pci2: i/o space, memory space enabled, rd/line, wr/inv ok
amr0 at pci2 dev 14 function 0: AMI RAID <PERC 4e/Di>
amr0: interrupting at ioapic1 pin 14
amr0: firmware 5B2D, BIOS H435, 256MB RAM
ld0 at amr0 unit 0: RAID 1, optimal
ld0: 69880 MB, 8908 cyl, 255 head, 63 sec, 512 bytes/sect x 143114240 sectors
ld1 at amr0 unit 1: RAID 1, optimal
ld1: 69880 MB, 8908 cyl, 255 head, 63 sec, 512 bytes/sect x 143114240 sectors
ld2 at amr0 unit 2: RAID 1, optimal
ld2: 136 GB, 17834 cyl, 255 head, 63 sec, 512 bytes/sect x 286515200 sectors
ppb2 at pci1 dev 0 function 2: vendor 8086 product 0332 (rev. 0x06)
ppb2: PCI Express capability version 1 <PCI-E to PCI/PCI-X Bridge>
pci3 at ppb2 bus 3
pci3: i/o space, memory space enabled, rd/line, wr/inv ok
ppb3 at pci0 dev 4 function 0: vendor 8086 product 3597 (rev. 0x09)
ppb3: PCI Express capability version 1 <Root Port of PCI-E Root Complex> x8 @ 2.5GT/s
pci4 at ppb3 bus 4
pci4: i/o space, memory space enabled, rd/line, wr/inv ok
ppb4 at pci0 dev 5 function 0: vendor 8086 product 3598 (rev. 0x09)
ppb4: PCI Express capability version 1 <Root Port of PCI-E Root Complex> x4 @ 2.5GT/s
pci5 at ppb4 bus 5
pci5: i/o space, memory space enabled, rd/line, wr/inv ok
ppb5 at pci5 dev 0 function 0: vendor 8086 product 0329 (rev. 0x09)
ppb5: PCI Express capability version 1 <PCI-E to PCI/PCI-X Bridge>
pci6 at ppb5 bus 6
pci6: i/o space, memory space enabled, rd/line, wr/inv ok
wm0 at pci6 dev 7 function 0: Intel i82541GI 1000BASE-T Ethernet (rev. 0x05)
wm0: interrupting at ioapic2 pin 0
wm0: 32-bit 66MHz PCI bus
wm0: EEPROM failed to become ready
wm0: 64 words (16 address bits) SPI EEPROM
wm0: Ethernet address 00:13:72:f7:00:06
igphy0 at wm0 phy 1: Intel IGP01E1000 Gigabit PHY, rev. 0
igphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto
ppb6 at pci5 dev 0 function 2: vendor 8086 product 032a (rev. 0x09)
ppb6: PCI Express capability version 1 <PCI-E to PCI/PCI-X Bridge>
pci7 at ppb6 bus 7
pci7: i/o space, memory space enabled, rd/line, wr/inv ok
wm1 at pci7 dev 8 function 0: Intel i82541GI 1000BASE-T Ethernet (rev. 0x05)
wm1: interrupting at ioapic2 pin 1
wm1: 32-bit 66MHz PCI bus
wm1: EEPROM failed to become ready
wm1: 64 words (16 address bits) SPI EEPROM
wm1: Ethernet address 00:13:72:f7:00:07
igphy1 at wm1 phy 1: Intel IGP01E1000 Gigabit PHY, rev. 0
igphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto
ppb7 at pci0 dev 6 function 0: vendor 8086 product 3599 (rev. 0x09)
ppb7: PCI Express capability version 1 <Root Port of PCI-E Root Complex> x8 @ 2.5GT/s
pci8 at ppb7 bus 8
pci8: i/o space, memory space enabled, rd/line, wr/inv ok
ppb8 at pci8 dev 0 function 0: vendor 8086 product 0329 (rev. 0x09)
ppb8: PCI Express capability version 1 <PCI-E to PCI/PCI-X Bridge>
pci9 at ppb8 bus 9
pci9: i/o space, memory space enabled, rd/line, wr/inv ok
ppb9 at pci8 dev 0 function 2: vendor 8086 product 032a (rev. 0x09)
ppb9: PCI Express capability version 1 <PCI-E to PCI/PCI-X Bridge>
pci10 at ppb9 bus 10
pci10: i/o space, memory space enabled, rd/line, wr/inv ok
uhci0 at pci0 dev 29 function 0: vendor 8086 product 24d2 (rev. 0x02)
uhci0: interrupting at ioapic0 pin 16
usb0 at uhci0: USB revision 1.0
uhci1 at pci0 dev 29 function 1: vendor 8086 product 24d4 (rev. 0x02)
uhci1: interrupting at ioapic0 pin 19
usb1 at uhci1: USB revision 1.0
uhci2 at pci0 dev 29 function 2: vendor 8086 product 24d7 (rev. 0x02)
uhci2: interrupting at ioapic0 pin 18
usb2 at uhci2: USB revision 1.0
ehci0 at pci0 dev 29 function 7: vendor 8086 product 24dd (rev. 0x02)
ehci0: interrupting at ioapic0 pin 23
ehci0: EHCI version 1.0
ehci0: companion controllers, 2 ports each: uhci0 uhci1 uhci2
usb3 at ehci0: USB revision 2.0
ppb10 at pci0 dev 30 function 0: vendor 8086 product 244e (rev. 0xc2)
pci11 at ppb10 bus 11
pci11: i/o space, memory space enabled
vendor 1028 product 0011 (undefined, subclass 0x00) at pci11 dev 5 function 0 not configured
vendor 1028 product 0012 (undefined, subclass 0x00) at pci11 dev 5 function 1 not configured
vendor 1028 product 0014 (undefined, subclass 0x00) at pci11 dev 5 function 2 not configured
cmdide0 at pci11 dev 6 function 0: Silicon Image 0680 (rev. 0x02)
cmdide0: bus-master DMA support present
cmdide0: primary channel wired to native-PCI mode
cmdide0: using ioapic0 pin 23 for native-PCI interrupt
atabus0 at cmdide0 channel 0
cmdide0: secondary channel wired to native-PCI mode
atabus1 at cmdide0 channel 1
radeon0 at pci11 dev 13 function 0: vendor 1002 product 5159 (rev. 0x00)
ichlpcib0 at pci0 dev 31 function 0: vendor 8086 product 24d0 (rev. 0x02)
timecounter: Timecounter "ichlpcib0" frequency 3579545 Hz quality 1000
ichlpcib0: 24-bit timer
ichlpcib0: TCO (watchdog) timer configured.
gpio0 at ichlpcib0: 64 pins
piixide0 at pci0 dev 31 function 1: Intel 82801EB IDE Controller (ICH5) (rev. 0x02)
piixide0: bus-master DMA support present
piixide0: primary channel configured to compatibility mode
piixide0: primary channel interrupting at ioapic0 pin 14
atabus2 at piixide0 channel 0
piixide0: secondary channel configured to compatibility mode
piixide0: secondary channel interrupting at ioapic0 pin 15
atabus3 at piixide0 channel 1
isa0 at ichlpcib0
com0 at isa0 port 0x3f8-0x3ff irq 4: ns16550a, working fifo
com0: console
pckbc0 at isa0 port 0x60-0x64
fdc0 at isa0 port 0x3f0-0x3f7 irq 6 drq 2
acpicpu0 at cpu0: ACPI CPU
acpicpu0: C1: HLT, lat   0 us, pow     0 mW
acpicpu1 at cpu1: ACPI CPU
acpicpu2 at cpu2: ACPI CPU
acpicpu3 at cpu3: ACPI CPU
timecounter: Timecounter "clockinterrupt" frequency 100 Hz quality 0
uhub0 at usb0: vendor 8086 UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 2 ports with 2 removable, self powered
uhub1 at usb1: vendor 8086 UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub1: 2 ports with 2 removable, self powered
uhub2 at usb3: vendor 8086 EHCI root hub, class 9/0, rev 2.00/1.00, addr 1
uhub2: 6 ports with 6 removable, self powered
atapibus0 at atabus0: 2 targets
uhub3 at usb2: vendor 8086 UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub3: 2 ports with 2 removable, self powered
sd0 at atapibus0 drive 0: <VIRTUALFLOPPY DRIVE               Flopp, , > disk removable
IPsec: Initialized Security Association Processing.
sd0: drive offline
sd0: 32-bit data port
sd0: drive supports PIO mode 3
cd0 at atapibus0 drive 1: <VIRTUALCDROM DRIVE, , > cdrom removable
cd0: 32-bit data port
cd0: drive supports PIO mode 3
sd0(cmdide0:0:0): using PIO mode 3
cd0(cmdide0:0:1): using PIO mode 3
atapibus1 at atabus2: 2 targets
cd1 at atapibus1 drive 0: <HL-DT-ST  GCR-8240N, , 1.10> cdrom removable
cd1: 32-bit data port
cd1: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 2 (Ultra/33)
cd1(piixide0:0:0): using PIO mode 4, Ultra-DMA mode 2 (Ultra/33) (using DMA)
ehci0: handing over full speed device on port 1 to uhci0
uhub4 at uhub2 port 3: vendor 413c product a001, class 9/0, rev 2.00/0.00, addr 2
uhub4: multiple transaction translators
uhub4: 2 ports with 2 removable, self powered
uhidev0 at uhub0 port 1 configuration 1 interface 0
uhidev0: Dell DRAC4, rev 1.10/0.00, addr 2, iclass 3/1
ukbd0 at uhidev0: 8 modifier keys, 6 key codes
ehci0: handing over full speed device on port 5 to uhci2
Kernelized RAIDframe activated
pad0: outputs: 44100Hz, 16-bit, stereo
audio0 at pad0wskbd0 at ukbd0: half duplex mux 1, playback
, captureuhidev1 at uhub0 port 1 configuration 1
 interface 1
boot device: ld0
root on ld0a dumps on ld0b
root file system type: ffs
uhidev1: Dell DRAC4, rev 1.10/0.00, addr 2, iclass 3/1
kern.module.path=/stand/amd64/7.99.4/modules
ums0 at uhidev1drm: initializing kernel modesetting (RV100 0x1002:0x5159 0x1028:0x016D).
: 3 buttons and Z dir
drm: register mmio base: 0xdf4e0000
wsmouse0 at ums0drm: register mmio size: 65536
 mux 0
radeon0: info: VRAM: 128M 0x00000000C8000000 - 0x00000000CFFFFFFF (16M used)
radeon0: info: GTT: 512M 0x00000000A8000000 - 0x00000000C7FFFFFF
drm: Detected VRAM RAM=80M, BAR=128M
drm: RAM width 32bits DDR
Zone  kernel: Available graphics memory: 2877786 kiB
Zone   dma32: Available graphics memory: 2097152 kiB
drm: radeon: 16M of VRAM memory ready
drm: radeon: 512M of GTT memory ready.
drm: GART: num cpu pages 131072, num gpu pages 131072
drm: PCI GART of 512M enabled (table at 0x0000000040A06000).
radeon0: info: WB disabled
radeon0: info: fence driver on ring 0 use gpu addr 0x00000000a8000000 and cpu addr 0x0xffff800090f89000
drm: Supports vblank timestamp caching Rev 2 (21.10.2013).
drm: Driver supports precise vblank timestamp query.
radeon0: interrupting at ioapic0 pin 18 (radeon)
drm: radeon: irq initialized.
drm: Loading R100 Microcode
drm kern error: radeon_cp: Failed to load firmware "radeon/R100_cp.bin"
DRM error in r100_cp_init: Failed to load firmware!
radeon0: error: failed initializing CP (-2).
radeon0: error: Disabling GPU acceleration
drm: radeon: cp finalized
drm: No TV DAC info found in BIOS
drm: Radeon Display Connectors
drm: Connector 0:
drm:   VGA-1
drm:   DDC: 0x60 0x60 0x60 0x60 0x60 0x60 0x60 0x60
drm:   Encoders:
drm:     CRT1: INTERNAL_DAC1
drm: Connector 1:
drm:   VGA-2
drm:   DDC: 0x6c 0x6c 0x6c 0x6c 0x6c 0x6c 0x6c 0x6c
drm:   Encoders:
drm:     CRT2: INTERNAL_DAC2
drm: Connector 2:
drm:   DVI-I-1
drm:   HPD1
drm:   DDC: 0x64 0x64 0x64 0x64 0x64 0x64 0x64 0x64
drm:   Encoders:
drm:     CRT2: INTERNAL_DAC2
drm:     DFP1: INTERNAL_TMDS1
radeondrmkmsfb0 at radeon0
radeon0: info: registered panic notifier
radeondrmkmsfb0: framebuffer at 0xffff800090e87000, size 1024x768, depth 8, stride 1024
wsdisplay0 at radeondrmkmsfb0 kbdmux 1
wsmux1: connecting to wsdisplay0
wskbd0: connecting to wsdisplay0
uplcom0 at uhub3 port 1
uplcom0: vendor 0557 product 2008, rev 1.10/0.01, addr 2
ucom0 at uplcom0
/var: replaying log to disk
ipmi0: version 1.5 interface KCS iobase 0xca8/8 spacing 4
/usr: replaying log to disk
wsdisplay0: screen 1 added (default, vt100 emulation)
wsdisplay0: screen 2 added (default, vt100 emulation)
wsdisplay0: screen 3 added (default, vt100 emulation)
wsdisplay0: screen 4 added (default, vt100 emulation)
/u: replaying log to disk
/var/pgsql/data: replaying log to disk
/usr/local: replaying log to disk

-tih
-- 
Popularity is the hallmark of mediocrity.  --Niles Crane, "Frasier"

