Subject: Re: raidframe problems (revisited)
To: NetBSD Users <netbsd-users@NetBSD.org>
From: Matthias Scheler <tron@zhadum.org.uk>
List: netbsd-users
Date: 05/28/2007 22:45:20
On Mon, May 28, 2007 at 02:03:46AM -0400, Louis Guillaume wrote:
> I've been using my SATA drives for a month or so now with one side of
> the RAID1 failed like this...
> 
> Components:
>            /dev/wd0a: optimal
>            /dev/wd1a: failed
> No spares.
> 
> ... and it is flawless.
> 
> I am extremely confident and I promise you that if I reconstruct this
> array, I'll see the corruption.

That sounds familar. I've got a similar problem with a NetBSD-i386 3.1 system.
The hardware is very different:

NetBSD 3.1_STABLE (BEAVER) #0: Sun Apr 29 16:01:19 CEST 2007
	tron@beaver.core.de:/usr/src/sys/arch/i386/compile/BEAVER
total memory = 1023 MB
avail memory = 997 MB
BIOS32 rev. 0 found at 0xf0e90
mainbus0 (root)
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Intel Pentium 4 (686-class), 2018.09 MHz, id 0xf24
cpu0: features 3febfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR>
cpu0: features 3febfbff<PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX>
cpu0: features 3febfbff<FXSR,SSE,SSE2,SS,HTT,TM>
cpu0: "Intel(R) Pentium(R) 4 CPU 2.00GHz"
cpu0: I-cache 12K uOp cache 8-way, D-cache 8 KB 64B/line 4-way
cpu0: L2 cache 512 KB 64B/line 8-way
cpu0: ITLB 4K/4M: 64 entries
cpu0: DTLB 4K/4M: 64 entries
cpu0: running without thermal monitor!
cpu0: calibrating local timer
cpu0: apic clock running at 100 MHz
cpu0: 16 page colors
ioapic0 at mainbus0 apid 2 (I/O APIC)
ioapic0: pa 0xfec00000, version 20, 24 pins
acpi0 at mainbus0
acpi0: using Intel ACPI CA subsystem version 20040211
acpi0: X/RSDT: OemId <ASUS  ,P4B266  ,42302e31>, AslId <MSFT,31313031>
acpi0: SCI interrupting at int 9
acpi0: fixed-feature power button present
mpacpi: could not get bus number, assuming bus 0
ACPI Object Type 'Processor' (0x0c) at acpi0 not configured
acpibut0 at acpi0 (PNP0C0C): ACPI Power Button
PNP0C01 [System Board] at acpi0 not configured
PNP0C0F [PCI interrupt link device] at acpi0 not configured
PNP0C0F [PCI interrupt link device] at acpi0 not configured
PNP0C0F [PCI interrupt link device] at acpi0 not configured
PNP0C0F [PCI interrupt link device] at acpi0 not configured
PNP0A03 [PCI Bus] at acpi0 not configured
PNP0C02 [Plug and Play motherboard register resources] at acpi0 not configured
PNP0C02 [Plug and Play motherboard register resources] at acpi0 not configured
PNP0000 [AT Interrupt Controller] at acpi0 not configured
PNP0200 [AT DMA Controller] at acpi0 not configured
PNP0100 [AT Timer] at acpi0 not configured
PNP0B00 [AT Real-Time Clock] at acpi0 not configured
PNP0800 [AT-style speaker sound] at acpi0 not configured
npx0 at acpi0 (PNP0C04)
npx0: io 0xf0-0xff irq 13
npx0: using exception 16
fdc0 at acpi0 (PNP0700)
fdc0: io 0x3f2-0x3f5,0x3f7 irq 6 drq 2
lpt0 at acpi0 (PNP0401)
lpt0: io 0x378-0x37f,0x778-0x77b irq 7 drq 3
com0 at acpi0 (PNP0501-1)
com0: io 0x3f8-0x3ff irq 4
com0: ns16550a, working fifo
com1 at acpi0 (PNP0501-2)
com1: io 0x2f8-0x2ff irq 3
com1: ns16550a, working fifo
pckbc0 at acpi0 (PNP0303): kbd port
pckbc0: io 0x60,0x64 irq 1
PNP0C02 [Plug and Play motherboard register resources] at acpi0 not configured
pckbd0 at pckbc0 (kbd slot)
pckbc0: using irq 1 for kbd slot
wskbd0 at pckbd0: console keyboard
pci0 at mainbus0 bus 0: configuration mode 1
pci0: i/o space, memory space enabled, rd/line, rd/mult, wr/inv ok
pchb0 at pci0 dev 0 function 0
pchb0: Intel 82845 Host (rev. 0x04)
agp0 at pchb0: aperture at 0xf8000000, size 0x4000000
ppb0 at pci0 dev 1 function 0: Intel 82845 AGP (rev. 0x04)
pci1 at ppb0 bus 1
pci1: i/o space, memory space enabled
ppb1 at pci0 dev 30 function 0: Intel 82801BA Hub-PCI Bridge (rev. 0x05)
pci2 at ppb1 bus 2
pci2: i/o space, memory space enabled
cmpci0 at pci2 dev 3 function 0: C-Media Electronics CMI8738/C3DX PCI Audio Device (rev. 0x10)
cmpci0: interrupting at ioapic0 pin 21 (irq 12)
audio0 at cmpci0: full duplex, mmap, independent
opl at cmpci0 not configured
mpu at cmpci0 not configured
vga0 at pci2 dev 9 function 0: S3 Trio32/64 (rev. 0x54)
wsdisplay0 at vga0 kbdmux 1: console (80x25, vt100 emulation), using wskbd0
wsmux1: connecting to wsdisplay0
ex0 at pci2 dev 11 function 0: 3Com 3c905C-TX 10/100 Ethernet with mngmt (rev. 0x74)
ex0: interrupting at ioapic0 pin 23 (irq 5)
ex0: MAC address 00:04:76:1a:33:b8
bmtphy0 at ex0 phy 24: Broadcom 3c905C internal PHY, rev. 6
bmtphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
pdcide0 at pci2 dev 12 function 0
pdcide0: Promise Ultra100TX2/ATA Bus Master IDE Accelerator (rev. 0x01)
pdcide0: bus-master DMA support present
pdcide0: primary channel configured to native-PCI mode
pdcide0: using ioapic0 pin 20 (irq 10) for native-PCI interrupt
atabus0 at pdcide0 channel 0
pdcide0: secondary channel configured to native-PCI mode
atabus1 at pdcide0 channel 1
pcib0 at pci0 dev 31 function 0
pcib0: Intel 82801BA LPC Interface Bridge (rev. 0x05)
piixide0 at pci0 dev 31 function 1
piixide0: Intel 82801BA IDE Controller (ICH2) (rev. 0x05)
piixide0: bus-master DMA support present
piixide0: primary channel wired to compatibility mode
piixide0: primary channel interrupting at ioapic0 pin 14 (irq 14)
atabus2 at piixide0 channel 0
piixide0: secondary channel wired to compatibility mode
piixide0: secondary channel interrupting at ioapic0 pin 15 (irq 15)
atabus3 at piixide0 channel 1
uhci0 at pci0 dev 31 function 2: Intel 82801BA USB Controller (rev. 0x05)
uhci0: interrupting at ioapic0 pin 19 (irq 11)
usb0 at uhci0: USB revision 1.0
uhub0 at usb0
uhub0: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 2 ports with 2 removable, self powered
uhci1 at pci0 dev 31 function 4: Intel 82801BA USB Controller (rev. 0x05)
uhci1: interrupting at ioapic0 pin 23 (irq 5)
usb1 at uhci1: USB revision 1.0
uhub1 at usb1
uhub1: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub1: 2 ports with 2 removable, self powered
isa0 at pcib0
pcppi0 at isa0 port 0x61
sysbeep0 at pcppi0
ioapic0: enabling
Kernelized RAIDframe activated
IPsec: Initialized Security Association Processing.
wd0 at atabus0 drive 0: <SAMSUNG SP1203N>
wd0: drive supports 16-sector PIO transfers, LBA48 addressing
wd0: 111 GB, 232632 cyl, 16 head, 63 sec, 512 bytes/sect x 234493056 sectors
wd0: 32-bit data port
wd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 5 (Ultra/100)
wd0(pdcide0:0:0): using PIO mode 4, Ultra-DMA mode 5 (Ultra/100) (using DMA)
wd1 at atabus2 drive 0: <WDC WD2500SB-01RFA0>
wd1: drive supports 16-sector PIO transfers, LBA48 addressing
wd1: 233 GB, 486344 cyl, 16 head, 63 sec, 512 bytes/sect x 490234752 sectors
wd1: 32-bit data port
wd1: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 5 (Ultra/100)
wd1(piixide0:0:0): using PIO mode 4, Ultra-DMA mode 5 (Ultra/100) (using DMA)
wd2 at atabus3 drive 0: <WDC WD2500SB-01RFA0>
wd2: drive supports 16-sector PIO transfers, LBA48 addressing
wd2: 233 GB, 486344 cyl, 16 head, 63 sec, 512 bytes/sect x 490234752 sectors
wd2: 32-bit data port
wd2: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 5 (Ultra/100)
wd2(piixide0:1:0): using PIO mode 4, Ultra-DMA mode 5 (Ultra/100) (using DMA)
raid0: RAID Level 1
raid0: Components: component0[**FAILED**] /dev/wd1a
raid0: Total Sectors: 490234624 (239372 MB)
boot device: raid0
root on raid0a dumps on raid0b
root file system type: ffs
wsdisplay0: screen 1 added (80x25, vt100 emulation)
wsdisplay0: screen 2 added (80x25, vt100 emulation)
wsdisplay0: screen 3 added (80x25, vt100 emulation)
wsdisplay0: screen 4 added (80x25, vt100 emulation)

The problem is however similar. The system works perfectly stable (weeks
of uptime) if I leave the RAIDframe mirror in its current state (one
component failed). As soon as I try to rewrite the parity the system
freezes in less than an hour and needs to be resetted.

We've already replaced all of the memory modules and the disks in the
machine but it didn't helped.

	Kind regards

-- 
Matthias Scheler                                  http://zhadum.org.uk/