Subject: Supermicro Motherboard IDE Errors
To: None <tech-kern@netbsd.org>
From: Curt Sampson <cjs@cynic.net>
List: tech-kern
Date: 07/18/2003 18:53:49
So we've got some Supermicro 1U boxen that don't seem to get along with
NetBSD 1.6.1, disk-wise. Under a fair amount of disk I/O, we start to
get read errors from the disks. This has happened on three different
machines, with four different pairs of drives (each pair mirrored using
RAIDframe) and different cables, configured both as one master drive on
each IDE channel and as a master/slave pair on one channel. Does anybody
have any thoughts on what might be going wrong, and how to fix it?

Here are the details:

Supermicro 1U servers - SuperServer 5012B-6 (SYS-5012-B6) -
http://www.supermicro.com/PRODUCT/SUPERServer/SuperServer5012B-6.htm

based on the SUPER P4SBR (MBD-P4SBR-O) motherboard -
http://www.supermicro.com/PRODUCT/MotherBoards/845/P4SBR.htm

They have the following specs:
     - Intel 845 chipset
     - 400MHz FSB
     - P4 2GHz CPU
     - 1 GB memory (2 x 512 MB)
     - 2 x 80 GB IBM IC35L080AVVA07-0
     - Onboard Intel 82801BA IDE Controller

pcib0 at pci0 dev 31 function 0
pcib0: Intel 82801BA LPC Interface Bridge (rev. 0x05)
pciide0 at pci0 dev 31 function 1: Intel 82801BA IDE Controller (ICH2) (rev. 0x05)
pciide0: bus-master DMA support present
pciide0: primary channel wired to compatibility mode
wd0 at pciide0 channel 0 drive 0: <IC35L080AVVA07-0>
wd0: drive supports 16-sector PIO transfers, LBA addressing
wd0: 78533 MB, 16383 cyl, 16 head, 63 sec, 512 bytes/sect x 160836480 sectors
wd0: 32-bit data port
wd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 5 (Ultra/100)
wd1 at pciide0 channel 0 drive 1: <IC35L090AVV207-0>
wd1: drive supports 16-sector PIO transfers, LBA48 addressing
wd1: 78533 MB, 16383 cyl, 16 head, 63 sec, 512 bytes/sect x 160836480 sectors
wd1: 32-bit data port
wd1: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 5 (Ultra/100)
pciide0: primary channel interrupting at irq 14
wd0(pciide0:0:0): using PIO mode 4, Ultra-DMA mode 5 (Ultra/100) (using DMA data transfers)
wd1(pciide0:0:1): using PIO mode 4, Ultra-DMA mode 5 (Ultra/100) (using DMA data transfers)
pciide0: secondary channel wired to compatibility mode
atapibus0 at pciide0 channel 1: 2 targets
cd0 at atapibus0 drive 0: <MATSHITA CR-177, , 7T0D> type 5 cdrom removable
cd0: 32-bit data port
cd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 2 (Ultra/33)
pciide0: secondary channel interrupting at irq 15
cd0(pciide0:1:0): using PIO mode 4, Ultra-DMA mode 2 (Ultra/33) (using DMA data transfers)
.
.
.
wd0e: error reading fsbn 13150064 of 13150064-13150079 (wd0 bn 13282112; cn 13176 tn 11 sn 11), retrying
wd0: (uncorrectable data error)
wd0e: error reading fsbn 13150064 of 13150064-13150079 (wd0 bn 13282112; cn 13176 tn 11 sn 11), retrying
wd0: (uncorrectable data error)
wd0e: error reading fsbn 13150064 of 13150064-13150079 (wd0 bn 13282112; cn 13176 tn 11 sn 11), retrying
wd0: (uncorrectable data error)
wd0: transfer error, downgrading to Ultra-DMA mode 2
wd0(pciide0:0:0): using PIO mode 4, Ultra-DMA mode 2 (Ultra/33) (using DMA data transfers)
wd0e: error reading fsbn 13150064 of 13150064-13150079 (wd0 bn 13282112; cn 13176 tn 11 sn 11), retrying
wd0: (uncorrectable data error)
wd0: soft error (corrected)
wd0: transfer error, downgrading to Ultra-DMA mode 1
wd0(pciide0:0:0): using PIO mode 4, Ultra-DMA mode 1 (using DMA data transfers)
wd0e: error reading fsbn 13149796 of 13149796-13149797 (wd0 bn 13281844; cn 13176 tn 6 sn 58), retrying
wd0: (uncorrectable data error)
wd0: transfer error, downgrading to DMA mode 2
wd0(pciide0:0:0): using PIO mode 4, DMA mode 2 (using DMA data transfers)
wd0e: error reading fsbn 13149796 of 13149796-13149797 (wd0 bn 13281844; cn 13176 tn 6 sn 58), retrying
wd0: (uncorrectable data error)
wd0: transfer error, downgrading to PIO mode 4
wd0(pciide0:0:0): using PIO mode 4
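
As a sanity check on the error messages above, the reported block
numbers can be cross-checked against the geometry from the probe
output (16 heads, 63 sectors per track). This is only a rough sketch:
the regular expression and the helper names are made up for
illustration, and it assumes the "sn" field is zero-based, which is
what makes the numbers line up.

```python
import re

# Geometry as reported by the wd0 probe line (16 head, 63 sec).
HEADS = 16
SECTORS_PER_TRACK = 63

# Hypothetical pattern for the "error reading fsbn ..." lines; it only
# covers the message shape seen in this log.
ERR_RE = re.compile(
    r"fsbn (\d+) of \d+-\d+ \(\w+ bn (\d+); cn (\d+) tn (\d+) sn (\d+)\)"
)


def chs_to_lba(cn, tn, sn, heads=HEADS, spt=SECTORS_PER_TRACK):
    """Convert cylinder/track/sector to a linear block number.

    Assumes sn is zero-based (an assumption, not stated in the log).
    """
    return (cn * heads + tn) * spt + sn


def check(line):
    """Return the reported bn, the bn recomputed from CHS, and bn - fsbn."""
    m = ERR_RE.search(line)
    if m is None:
        return None
    fsbn, bn, cn, tn, sn = map(int, m.groups())
    return {
        "bn": bn,
        "lba_from_chs": chs_to_lba(cn, tn, sn),
        "partition_offset": bn - fsbn,
    }


samples = [
    "wd0e: error reading fsbn 13150064 of 13150064-13150079 "
    "(wd0 bn 13282112; cn 13176 tn 11 sn 11), retrying",
    "wd0e: error reading fsbn 13149796 of 13149796-13149797 "
    "(wd0 bn 13281844; cn 13176 tn 6 sn 58), retrying",
]
results = [check(s) for s in samples]
for r in results:
    print(r)
```

Both messages give the same difference between "bn" and "fsbn", and
both land on cylinder 13176, so the failing reads are all in one small
area of wd0 rather than scattered across the disk.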


After a while, the RAID fails a disk:

raid0: IO Error.  Marking /dev/wd0e as failed.
raid0: node (Rmir) returned fail, rolling backward
raid0: DAG failure: r addr 0xc8a624 (13149732) nblk 0x2 (2) buf 0xdc51b000
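
The address in the DAG failure line can be related back to the wd0e
read errors with a little arithmetic. The sketch below only uses
numbers that appear in the logs. Reading the resulting gap as space
RAIDframe reserves at the front of each component is an assumption,
not something the log states.

```python
# Address from the "DAG failure" line, as logged in hex.
raid_addr = int("0xc8a624", 16)

# fsbn from the last wd0e read errors before the component was failed.
failing_fsbn = 13149796

# Difference between the component block and the RAID address.
gap = failing_fsbn - raid_addr

print(raid_addr, gap)
```

The hex value matches the decimal the kernel prints in parentheses,
and the RAID address sits a small fixed distance below the failing
fsbn, so the RAID failure appears to be the same bad region on wd0
rather than a separate problem.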

"raidctl -s" then gives:

/sbin/raidctl -s raid0
Components:
            /dev/wd1e: optimal
            /dev/wd0e: failed
No spares.
Component label for /dev/wd1e:
    Row: 0, Column: 0, Num Rows: 1, Num Columns: 2
    Version: 2, Serial Number: 209111338, Mod Counter: 215
    Clean: No, Status: 0
    sectPerSU: 128, SUsPerPU: 1, SUsPerRU: 1
    Queue size: 100, blocksize: 512, numBlocks: 160704256
    RAID Level: 1
    Autoconfig: Yes
    Root partition: Yes
    Last configured as: raid0
/dev/wd0e status is: failed.  Skipping label.
Parity status: clean
Reconstruction is 100% complete.
Parity Re-write is 100% complete.
Copyback is 100% complete.


cjs
-- 
Curt Sampson  <cjs@cynic.net>   +81 90 7737 2974   http://www.netbsd.org
    Don't you know, in this new Dark Age, we're all light.  --XTC