Subject: Diagnosing dying hardware -- any suggestions?
To: None <port-i386@netbsd.org>
From: Brian Buhrow <buhrow@lothlorien.nfbcal.org>
List: port-i386
Date: 10/20/2006 07:48:15
	Hello.  I have a relatively new P4 machine with 2GB of RAM which is
running in a production environment under NetBSD-3.0_stable with sources
from around mid-January 2006.  Lately, it's begun panicking with uvm_faults,
illegal page faults, and other spurious error messages.  I'm certain the
hardware is at fault, but now the question is: does the problem lie with
the memory sticks in the machine (four 512MB sticks), or does it lie on the
motherboard itself?  I've tried blowing the dust out of the board,
reseating the memory sticks, and rearranging their order, but the
misbehavior seems the same.
	So, what I'm wondering is if anyone can tell me, given a few samples
of the output below, if it's probable that the trouble is with the RAM or
with the board.  I'm assuming that if a given memory stick was bad, and
it's now in a different place in the physical lineup of RAM, that perhaps
the character of the faulting address would change, such that it is
possible to say that the error moved, and thus it is RAM.
Any ideas would be greatly appreciated, especially if someone can point to
something and say "this means RAM, this other thing means cache chips on
board the motherboard, etc."

-thanks

-Brian


uvm_fault(0xc07a0880, 0x88c55000, 0, 1) -> 0xe
fatal page fault in supervisor mode
trap type 6 code 0 eip c02cd0bf cs 8 eflags 10282 cr2 88c55eb0 ilevel 0
panic: trap
syncing disks... uvm_fault(0xc07a0880, 0x88c55000, 0, 1) -> 0xe
fatal page fault in supervisor mode
trap type 6 code 0 eip c02cd0bf cs 8 eflags 10282 cr2 88c55eb0 ilevel 0
panic: trap


[new instance of a panic]
uvm_fault(0xc0793360, 0xca8ee000, 0, 2) -> 0xe
fatal page fault in supervisor mode
trap type 6 code 2 eip c035365f cs 8 eflags 10286 cr2 ca8ee4fc ilevel 0
panic: trap
syncing disks... panic: lockmgr: locking against myself

[New instance]
uvm_fault(0xc0793360, 0xcc99c000, 0, 1) -> 0xe
fatal page fault in supervisor mode
trap type 6 code 0 eip c039ad38 cs 8 eflags 10286 cr2 cc99c130 ilevel 0
panic: trap
syncing disks... NetBSD 3.0_STABLE (SHIRE) #0: Thu Jan 19 18:16:50 PST 2006

[New instance]
uvm_fault(0xc07a0880, 0, 0, 1) -> 0xe
fatal page fault in supervisor mode
trap type 6 code 0 eip c03b760a cs 8 eflags 10286 cr2 7 ilevel 0
panic: trap
syncing disks... uvm_fault(0xc07a0880, 0xa63ee000, 0, 1) -> 0xe
fatal page fault in supervisor mode
trap type 6 code 0 eip c03b76d0 cs 8 eflags 10286 cr2 a63ee507 ilevel 0
panic: trap

[New instance]
uvm_fault(0xcecf3d28, 0x4daf4000, 0, 2) -> 0xe
fatal page fault in supervisor mode
trap type 6 code 2 eip c039af3c cs 8 eflags 10206 cr2 4daf4660 ilevel 0
panic: trap
syncing disks... NetBSD 3.0_STABLE (SHIRE) #0: Thu Jan 19 18:16:50 PST 2006

[New instance]
uvm_fault(0xc07a0880, 0x8da5e000, 0, 1) -> 0xe
fatal page fault in supervisor mode
trap type 6 code 0 eip c039ad38 cs 8 eflags 10282 cr2 8da5e410 ilevel 0
panic: trap
syncing disks... NetBSD 3.0_STABLE (SHIRE) #0: Thu Jan 19 18:16:50 PST 2006
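
One way to compare panics like the ones above is to tabulate the trap
lines and look at which values repeat.  The following is only a minimal
sketch, not a diagnostic tool: the names parse_traps, repeated_eips, and
SAMPLE are made up for illustration, and the sample holds four trap lines
copied from the output above.  It assumes the usual x86 page-fault error
code layout (bit 0 = page present, bit 1 = write access, bit 2 = user
mode).  Note also that cr2 holds the faulting virtual address, not a
physical one, so it cannot be mapped directly onto a particular memory
stick.

```python
import re
from collections import Counter

# Matches the "trap type ..." lines printed by the kernel in the panics above.
TRAP_RE = re.compile(
    r"trap type (?P<type>\d+) code (?P<code>[0-9a-f]+) eip (?P<eip>[0-9a-f]+)"
    r" cs (?P<cs>[0-9a-f]+) eflags (?P<eflags>[0-9a-f]+) cr2 (?P<cr2>[0-9a-f]+)"
)

# Four trap lines copied from the panic output in this message.
SAMPLE = """\
trap type 6 code 0 eip c02cd0bf cs 8 eflags 10282 cr2 88c55eb0 ilevel 0
trap type 6 code 2 eip c035365f cs 8 eflags 10286 cr2 ca8ee4fc ilevel 0
trap type 6 code 0 eip c039ad38 cs 8 eflags 10286 cr2 cc99c130 ilevel 0
trap type 6 code 0 eip c039ad38 cs 8 eflags 10282 cr2 8da5e410 ilevel 0
"""


def parse_traps(text):
    """Return one dict per trap line found in text."""
    traps = []
    for m in TRAP_RE.finditer(text):
        code = int(m.group("code"), 16)
        traps.append({
            "eip": int(m.group("eip"), 16),   # faulting instruction address
            "cr2": int(m.group("cr2"), 16),   # faulting virtual address
            "present": bool(code & 0x1),      # assumed x86 error-code bit 0
            "write": bool(code & 0x2),        # assumed x86 error-code bit 1
            "user": bool(code & 0x4),         # assumed x86 error-code bit 2
        })
    return traps


def repeated_eips(traps):
    """Return {eip: count} for instruction addresses seen more than once."""
    counts = Counter(t["eip"] for t in traps)
    return {eip: n for eip, n in counts.items() if n > 1}


if __name__ == "__main__":
    for t in parse_traps(SAMPLE):
        kind = "write" if t["write"] else "read"
        print(f"eip={t['eip']:08x} cr2={t['cr2']:08x} "
              f"page_offset={t['cr2'] & 0xfff:03x} {kind}")
    for eip, n in repeated_eips(parse_traps(SAMPLE)).items():
        print(f"eip {eip:08x} seen {n} times")
```

Feeding it the full console log instead of SAMPLE would show at a glance
whether the faults keep hitting the same instruction address or are
scattered across the kernel.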

[dmesg output, in case it's relevant]

NetBSD 3.0_STABLE (SHIRE) #0: Thu Jan 19 18:16:50 PST 2006
	buhrow@lothlorien.nfbcal.org:/usr/src/sys/arch/i386/compile/SHIRE
total memory = 2030 MB
avail memory = 1980 MB
BIOS32 rev. 0 found at 0xf0010
PCI BIOS rev. 2.1 found at 0xf0031
PCI IRQ Routing Table rev. 1.0 found at 0xf3d00, size 224 bytes (12 entries)
PCI Interrupt Router at 000:31:0 (Intel product 0x8086 compatible)
mainbus0 (root)
cpu0 at mainbus0: (uniprocessor)
cpu0: Intel (686-class), 2992.85 MHz, id 0xf33
cpu0: features bfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR>
cpu0: features bfebfbff<PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX>
cpu0: features bfebfbff<FXSR,SSE,SSE2,SS,HTT,TM,SBF>
cpu0: features2 41d<SSE3,MONITOR,DS-CPL,CID>
cpu0: "Intel(R) Pentium(R) 4 CPU 3.00GHz"
cpu0: I-cache 12K uOp cache 8-way
cpu0: L2 cache 1 MB 64B/line 8-way
cpu0: ITLB 4K/4M: 64 entries
cpu0: DTLB 4K/4M: 64 entries
cpu0: using thermal monitor 1
cpu0: 32 page colors
pnpbios0 at mainbus0: code f0000, data f0000, entry 447a, control 0 eventp 0
pnpbios0: nodes 15, max len 274
PNP0C01 (mem 0-9fbff 9fc00-9ffff e6000-fffff 100000-7ef2fbff 7ef2fc00-7ef2ffff 7ef30000-7ef3ffff 7ef40000-7efeffff 7eff0000-7effffff fecf0000-fecf0fff fed20000-fed9ffff) at pnpbios0 index 0 ignored
PNP0000 (io 20-21 a0-a1, irq 2) at pnpbios0 index 1 ignored
PNP0200 (io 0-f 80-90 94-9f c0-de, DMA 4) at pnpbios0 index 2 ignored
PNP0100 (io 40-43, irq 0) at pnpbios0 index 3 ignored
PNP0B00 (io 70-71, irq 8) at pnpbios0 index 4 ignored
PNP0303 (io 60 64, irq 1) at pnpbios0 index 5 ignored
PNP0A03 at pnpbios0 index 6 disabled
PNP0800 (io 61) at pnpbios0 index 7 ignored
PNP0C04 (io f0-ff, irq 13) at pnpbios0 index 8 ignored
PNP0501 (io 3f8-3ff, irq 4) at pnpbios0 index 9 ignored
PNP0700 (io 3f0-3f5 3f7, irq 6, DMA 2) at pnpbios0 index 10 ignored
PNP0400 (io 378-37f, irq 7) at pnpbios0 index 11 ignored
PNP0C02 (io 4d0-4d1 cf8-cff 10-1f 24-2d 30-3d 50-53 72-77 91-93 a4-bd df 400-47f 500-53f 2e 2f 680-6ff) at pnpbios0 index 12 ignored
INT0800 (mem ffb80000-ffffffff) at pnpbios0 index 13 ignored
PNP0C02 at pnpbios0 index 14 disabled
pci0 at mainbus0 bus 0: configuration mode 1
pci0: i/o space, memory space enabled, rd/line, rd/mult, wr/inv ok
pchb0 at pci0 dev 0 function 0
pchb0: Intel 82865 Host (rev. 0x02)
pchb0: random number generator enabled
agp0 at pchb0: detected 16252k stolen memory
agp0: aperture at 0xf0000000, size 0x8000000
vga1 at pci0 dev 2 function 0: Intel 82865G Integrated Graphics Device (rev. 0x02)
wsdisplay0 at vga1 kbdmux 1
wsmux1: connecting to wsdisplay0
uhci0 at pci0 dev 29 function 0: Intel 82801EB/ER USB UHCI Controller (rev. 0x02)
uhci0: interrupting at irq 11
usb0 at uhci0: USB revision 1.0
uhub0 at usb0
uhub0: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 2 ports with 2 removable, self powered
uhci1 at pci0 dev 29 function 1: Intel 82801EB/ER USB UHCI Controller (rev. 0x02)
uhci1: interrupting at irq 5
usb1 at uhci1: USB revision 1.0
uhub1 at usb1
uhub1: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub1: 2 ports with 2 removable, self powered
uhci2 at pci0 dev 29 function 2: Intel 82801EB/ER USB UHCI Controller (rev. 0x02)
uhci2: interrupting at irq 10
usb2 at uhci2: USB revision 1.0
uhub2 at usb2
uhub2: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub2: 2 ports with 2 removable, self powered
uhci3 at pci0 dev 29 function 3: Intel 82801EB/ER USB UHCI Controller (rev. 0x02)
uhci3: interrupting at irq 11
usb3 at uhci3: USB revision 1.0
uhub3 at usb3
uhub3: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub3: 2 ports with 2 removable, self powered
ehci0 at pci0 dev 29 function 7: Intel 82801EB/ER USB EHCI Controller (rev. 0x02)
ehci0: interrupting at irq 9
ehci0: BIOS has given up ownership
ehci0: EHCI version 1.0
ehci0: companion controllers, 2 ports each: uhci0 uhci1 uhci2 uhci3
usb4 at ehci0: USB revision 2.0
uhub4 at usb4
uhub4: Intel EHCI root hub, class 9/0, rev 2.00/1.00, addr 1
uhub4: single transaction translator
uhub4: 8 ports with 8 removable, self powered
ppb0 at pci0 dev 30 function 0: Intel 82801BA Hub-PCI Bridge (rev. 0xc2)
pci1 at ppb0 bus 1
pci1: i/o space, memory space enabled
fxp0 at pci1 dev 8 function 0: Intel PRO/100 VM Network Controller with 82562ET/EZ PHY, rev 1
fxp0: interrupting at irq 9
fxp0: Ethernet address 00:11:11:3f:46:56
inphy0 at fxp0 phy 1: i82562ET 10/100 media interface, rev. 0
inphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
ichlpcib0 at pci0 dev 31 function 0
ichlpcib0: Intel 82801EB LPC Interface Bridge (rev. 0x02)
ichlpcib0: TCO (watchdog) timer configured.
ichlpcib0: No SpeedStep
piixide0 at pci0 dev 31 function 1
piixide0: Intel 82801EB IDE Controller (ICH5) (rev. 0x02)
piixide0: bus-master DMA support present
piixide0: primary channel configured to compatibility mode
piixide0: primary channel interrupting at irq 14
atabus0 at piixide0 channel 0
piixide0: secondary channel configured to compatibility mode
piixide0: secondary channel interrupting at irq 15
atabus1 at piixide0 channel 1
piixide1 at pci0 dev 31 function 2
piixide1: Intel 82801EB Serial ATA Controller (rev. 0x02)
piixide1: bus-master DMA support present
piixide1: primary channel configured to native-PCI mode
piixide1: using irq 10 for native-PCI interrupt
atabus2 at piixide1 channel 0
piixide1: secondary channel configured to native-PCI mode
atabus3 at piixide1 channel 1
Intel 82801EB/ER SMBus Controller (SMBus serial bus, revision 0x02) at pci0 dev 31 function 3 not configured
auich0 at pci0 dev 31 function 5: i82801EB (ICH5) AC-97 Audio
auich0: interrupting at irq 3
auich0: ac97: Analog Devices AD1985 codec; headphone, 20 bit DAC, no 3D stereo
auich0: ac97: ext id 3c7<AMAP,LDAC,SDAC,CDAC,SPDIF,DRA,VRA>
isa0 at ichlpcib0
lpt0 at isa0 port 0x378-0x37b irq 7
com0 at isa0 port 0x3f8-0x3ff irq 4: ns16550a, working fifo
com0: console
pckbc0 at isa0 port 0x60-0x64
pcppi0 at isa0 port 0x61
midi0 at pcppi0: PC speaker
spkr0 at pcppi0
sysbeep0 at pcppi0
isapnp0 at isa0 port 0x279: ISA Plug 'n Play device support
npx0 at isa0 port 0xf0-0xff: using exception 16
fdc0 at isa0 port 0x3f0-0x3f7 irq 6 drq 2
isapnp0: no ISA Plug 'n Play devices found
apm0 at mainbus0: Power Management spec V1.2
auich0: measured ac97 link rate at 48003 Hz, will use 48000 Hz
audio0 at auich0: full duplex, mmap, independent
fd0 at fdc0 drive 0: 1.44MB, 80 cyl, 2 head, 18 sec
Kernelized RAIDframe activated
wd0 at atabus0 drive 0: <WDC WD2500JB-00GVA0>
wd0: drive supports 16-sector PIO transfers, LBA48 addressing
wd0: 232 GB, 484521 cyl, 16 head, 63 sec, 512 bytes/sect x 488397168 sectors
wd0: 32-bit data port
wd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 5 (Ultra/100)
wd0(piixide0:0:0): using PIO mode 4, Ultra-DMA mode 5 (Ultra/100) (using DMA)
wd1 at atabus1 drive 0: <WDC WD2500JB-00FUA0>
wd1: drive supports 16-sector PIO transfers, LBA48 addressing
wd1: 232 GB, 484521 cyl, 16 head, 63 sec, 512 bytes/sect x 488397168 sectors
wd1: 32-bit data port
wd1: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 5 (Ultra/100)
wd1(piixide0:1:0): using PIO mode 4, Ultra-DMA mode 5 (Ultra/100) (using DMA)
raid0: RAID Level 1
raid0: Components: /dev/wd0e /dev/wd1e
raid0: Total Sectors: 482348916 (235521 MB)
boot device: raid0
root on raid0a dumps on raid0b
root file system type: ffs
raid0: Device already configured!
wsdisplay0: screen 1 added (80x25, vt100 emulation)
wsdisplay0: screen 2 added (80x25, vt100 emulation)
wsdisplay0: screen 3 added (80x25, vt100 emulation)
wsdisplay0: screen 4 added (80x25, vt100 emulation)
Accounting started