Subject: what P-Pro or P-II motherboards support full ECC?
To: NetBSD/i386 Discussion List <port-i386@NetBSD.ORG>
From: Greg A. Woods <woods@weird.com>
List: port-i386
Date: 06/09/2002 11:49:56
OK, either my P-Pro box has bad memory, a bad/broken chipset, or there's
something bad happening in the kernel.

The motherboard is an Elite Group Computer Systems P6FX1-A.  It's the
one I've spoken of before.

Very occasionally I get an unexplained NMI, usually very shortly after a
reboot:

[Sun Jun  9 01:19:11 2002]NMI ... going to debugger
[Sun Jun  9 01:19:11 2002]Stopped in pid 9186 (cc1plus) at        memcpy+0x1a:    repe movsl      (%esi),%es:(%edi)
[Sun Jun  9 01:19:11 2002]db> cont

Continuing keeps things running, though there may now be an error in
whatever was being compiled at the time.

When running big jobs, such as building Mozilla, lots of things dump
core, but never predictably.  Usually it's the compiler, but sometimes
it's been perl.

This machine has proper 36-bit SIMMs.  The BIOS has ECC support enabled.
No memory errors are detected by the BIOS on start-up, nor by m

Heat may be a factor (perl, running from netsaint, was dumping core
every time on the day before I installed the A.C. in the computer room).

However, the machine is well cooled.  It's in a 4U rack chassis with an
extra fan, and the cabinet has an exhaust fan pulling air out at about
500 CFM.  Exhaust air from the cabinet is definitely under 90°F.

I need to get rid of this problem.  It's making me worry that software I
build using it may be corrupted (though the last version of Mozilla I
built on it is apparently running fine).

I need to replace the motherboard with something guaranteed to support
full ECC detection and reporting, and to do so as cheaply as possible.
Recommendations are welcome.  I'd prefer to save the CPU and any of the
memory that is working, and of course it needs to support the same disk
drives.  Unfortunately it needs to remain an Intel box, and the
replacement must be in the ATX form factor and compatible with the
existing ATX power supply (generic ATX PC supply).


Here, FYI, is the boot output from that P-Pro box....

NetBSD 1.5W (STARTING-OUT) #0: Mon Apr 15 17:11:15 EDT 2002
    woods@proven:/work/woods/NetBSD-src/sys/arch/i386/compile/STARTING-OUT
cpu0: Intel Pentium Pro (686-class), 199.35 MHz
cpu0: I-cache 8 KB 32b/line 4-way, D-cache 8 KB 32b/line 2-way
cpu0: L2 cache 256 KB 32b/line 4-way
cpu0: features f9ff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,SEP,MTRR>
cpu0: features f9ff<PGE,MCA,CMOV>
total memory = 81532 KB
avail memory = 73172 KB
using 1000 buffers containing 4176 KB of memory
BIOS32 rev. 0 found at 0xfb3a0
mainbus0 (root)
pci0 at mainbus0 bus 0: configuration mode 1
pci0: i/o space, memory space enabled
pchb0 at pci0 dev 0 function 0
pchb0: Intel 82441FX PCI and Memory Controller (PMC) (rev. 0x02)
pcib0 at pci0 dev 7 function 0
pcib0: Intel 82371SB PCI-to-ISA Bridge (PIIX3) (rev. 0x01)
pciide0 at pci0 dev 7 function 1: Intel 82371SB IDE Interface (PIIX3) (rev. 0x00)
pciide0: bus-master DMA support present
pciide0: primary channel wired to compatibility mode
wd0 at pciide0 channel 0 drive 0: <QUANTUM SIROCCO1700A>
wd0: drive supports 8-sector PIO transfers, LBA addressing
wd0: 1628 MB, 3309 cyl, 16 head, 63 sec, 512 bytes/sect x 3335472 sectors
wd0: 32-bit data port
wd0: drive supports PIO mode 4, DMA mode 2
pciide0: primary channel interrupting at irq 14
wd0(pciide0:0:0): using PIO mode 4, DMA mode 2 (using DMA data transfers)
pciide0: secondary channel wired to compatibility mode
atapibus0 at pciide0 channel 1: 2 targets
cd0 at atapibus0 drive 1: <NEC                 CD-ROM DRIVE:284, , 3.03> type 5 cdrom removable
cd0: 32-bit data port
cd0: drive supports PIO mode 3, DMA mode 1
wd1 at pciide0 channel 1 drive 0: <SAMSUNG WA32163A>
wd1: drive supports 16-sector PIO transfers, LBA addressing
wd1: 2062 MB, 4190 cyl, 16 head, 63 sec, 512 bytes/sect x 4223520 sectors
wd1: 32-bit data port
wd1: drive supports PIO mode 4, DMA mode 2
pciide0: secondary channel interrupting at irq 15
wd1(pciide0:1:0): using PIO mode 4, DMA mode 2 (using DMA data transfers)
cd0(pciide0:1:1): using PIO mode 0, DMA mode 1 (using DMA data transfers)
vga0 at pci0 dev 9 function 0: S3 Trio32/64 (rev. 0x00)
wsdisplay0 at vga0
de0 at pci0 dev 11 function 0
de0: interrupting at irq 10
de0: SMC 21041 [10Mb/s] pass 1.1
de0: address 00:00:c0:83:c3:e9
isa0 at pcib0
com0 at isa0 port 0x3f8-0x3ff irq 4: ns16550a, working fifo
com0: console
com1 at isa0 port 0x2f8-0x2ff irq 3: ns16550a, working fifo
pckbc0 at isa0 port 0x60-0x64
pckbdprobe: reset error 5
pmsprobe: reset error 5
pmsiprobe: reset error 5
lpt0 at isa0 port 0x378-0x37b irq 7
pcppi0 at isa0 port 0x61
midi0 at pcppi0: PC speaker
sysbeep0 at pcppi0
isapnp0 at isa0 port 0x279: ISA Plug 'n Play device support
npx0 at isa0 port 0xf0-0xff: using exception 16
fdc0 at isa0 port 0x3f0-0x3f7 irq 6 drq 2
fd0 at fdc0 drive 0: 1.44MB, 80 cyl, 2 head, 18 sec
isapnp0: no ISA Plug 'n Play devices found
apm0 at mainbus0: Power Management spec V1.2 (slowidle)
apm: 0 batteries, global standby, global suspend, rtimer standby, rtimer suspend, internal standby, internal suspend
APM power mgmt engage (device 1): power management disabled (0x10f)
apm0: A/C state: on
apm0: battery charge state: no battery
biomask fb67 netmask ff67 ttymask ffe7
de0: enabling 10baseT port
pciide0:1:0: lost interrupt
        type: ata tc_bcount: 512 tc_skip: 0
pciide0:1:0: bus-master DMA error: missing interrupt, status=0x60
wd1: transfer error, downgrading to PIO mode 4
wd1(pciide0:1:0): using PIO mode 4
cd0(pciide0:1:1): using PIO mode 0, DMA mode 1 (using DMA data transfers)
wd1d: DMA error reading fsbn 0 (wd1 bn 0; cn 0 tn 0 sn 0), retrying
wd1: soft error (corrected)
boot device: wd0
root on wd0a dumps on wd0b
init: copying out path `/sbin/init' 11
de0: setting full duplex.
de0: enabling Full Duplex 10baseT port
wsdisplay0: screen 0 added (80x25, vt100 emulation)
wsdisplay0: screen 1 added (80x50, vt100 emulation)
wsdisplay0: screen 2 added (80x50, vt100 emulation)
wsdisplay0: screen 3 added (80x50, vt100 emulation)
wsdisplay0: screen 4 added (80x50, vt100 emulation)
wsdisplay0: screen 5 added (80x50, vt100 emulation)
wsdisplay0: screen 6 added (80x50, vt100 emulation)
wsdisplay0: screen 7 added (80x50, vt100 emulation)
wsmux1: connecting to wsdisplay0

-- 
								Greg A. Woods

+1 416 218-0098;  <gwoods@acm.org>;  <g.a.woods@ieee.org>;  <woods@robohack.ca>
Planix, Inc. <woods@planix.com>; VE3TCP; Secrets of the Weird <woods@weird.com>