Subject: raidframe problems
To: NetBSD Users <netbsd-users@NetBSD.org>
From: Louis Guillaume <lguillaume@berklee.edu>
List: netbsd-users
Date: 02/11/2007 16:06:33
Hi!

First let me say that I have had excellent experiences with raidframe.
Every one of these good experiences has involved using SCSI disks.

Every time I've used raidframe with some form of ATA disk there has been
trouble. And now I've had another such experience to report.

Here's the hardware:

$ dmesg |egrep "(hpt|atabus[01])"
hptide0 at pci0 dev 11 function 0
hptide0: Triones/Highpoint HPT372A IDE Controller
hptide0: bus-master DMA support present
hptide0: primary channel wired to native-PCI mode
hptide0: using irq 12 for native-PCI interrupt
atabus0 at hptide0 channel 0
hptide0: secondary channel wired to native-PCI mode
atabus1 at hptide0 channel 1
wd0 at atabus0 drive 0: <ST3120026AS>
wd0(hptide0:0:0): using PIO mode 4, Ultra-DMA mode 6 (Ultra/133) (using DMA)
wd1 at atabus1 drive 0: <ST3120026AS>
wd1(hptide0:1:0): using PIO mode 4, Ultra-DMA mode 6 (Ultra/133) (using DMA)


These are SATA drives. Here is the raid config...

$ cat /etc/raid0.conf
START array
1 2 0

START disks
/dev/wd0a
/dev/wd1a

START layout
128 1 1 1

START queue
fifo 100
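
For anyone reading along, here is a small sketch that spells out how I read the two numeric lines of that config. The field names (rows, columns, spares, sectors per stripe unit, RAID level) are my understanding of the raidctl(8) config format, and the 512-byte sector size is an assumption, so treat the labels as such.

```shell
#!/bin/sh
# Values copied from the "START array" and "START layout" sections above.
array="1 2 0"
layout="128 1 1 1"

# Assumed field order for "START array": rows, columns, spares.
set -- $array
echo "rows=$1 columns=$2 spares=$3"

# Assumed field order for "START layout": sectors per stripe unit,
# stripe units per parity unit, stripe units per recon unit, RAID level.
set -- $layout
sect_per_su=$1 raid_level=$4

# Assumes 512-byte sectors.
su_kib=$(( sect_per_su * 512 / 1024 ))
echo "stripe unit=$sect_per_su sectors ($su_kib KiB), RAID level=$raid_level"
```

In other words, this should be a plain two-disk RAID 1 mirror with no spares and a 64 KiB stripe unit.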


... disklabels on real disks...

# disklabel wd0| grep "^ [a-z]:"
 a: 234441585  63       RAID                    # (Cyl.      0*- 232580)
 c: 234441585  63     unused      0     0       # (Cyl.      0*- 232580)
 d: 234441648   0     unused      0     0       # (Cyl.      0 - 232580)

# disklabel wd1| grep "^ [a-z]:"
 a: 234441585  63       RAID                    # (Cyl.      0*- 232580)
 c: 234441585  63     unused      0     0       # (Cyl.      0*- 232580)
 d: 234441648   0     unused      0     0       # (Cyl.      0 - 232580)

... disklabel on raid0 ...

# disklabel raid0| grep "^ [a-z]:"
disklabel: Invalid signature in mbr record 0
 a:  41943040        0  4.2BSD 2048 16384 28088 #(Cyl.     0 - 40959 )
 b: 192498432 41943040  4.2BSD 2048 16384 28832 #(Cyl. 40960 - 28946*)
 d: 234441472        0  unused    0     0       #(Cyl.     0 - 228946*)
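
As a sanity check on the sizes in these labels, the sketch below recomputes the raid0 size from the component size. It assumes RAIDframe reserves 64 sectors at the start of each component and rounds the remainder down to a whole number of stripe units (128 sectors, from the layout section). Those two figures are my assumptions, not something reported by the tools.

```shell
#!/bin/sh
# Sizes in sectors, copied from the disklabel output above.
component=234441585   # wd0a and wd1a
part_a=41943040       # raid0a
part_b=192498432      # raid0b
raid_d=234441472      # raid0d, the whole raid device

# Assumptions: 64 reserved sectors per component, 128-sector stripe unit.
reserved=64
stripe_unit=128

# Integer division rounds down to a whole number of stripe units.
usable=$(( (component - reserved) / stripe_unit * stripe_unit ))
sum_ab=$(( part_a + part_b ))

echo "expected raid0 size: $usable"
echo "raid0a + raid0b:     $sum_ab"
echo "raid0d:              $raid_d"
```

All three lines come out to 234441472, so under those assumptions the partitions fill the raid device exactly and the labels at least agree with each other.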



FFS filesystems were created on raid0a and raid0b. Data were copied onto
them using pax and rsync.

fsck reports at least one DIRECTORY CORRUPTED message in Phase 2 and tons
of UNREF FILE messages in Phase 4.

fsck claims to fix all problems, but a second run of fsck finds more
issues, as does a third, and so on.

These problems do not show up if filesystems are created on the
non-raid, plain old, wd[x] disks.

I've tried wiping out the disklabel and boot sectors on these disks
with "dd if=/dev/zero of=/dev/rwd0d bs=1k count=1k" and starting over
from scratch. Same errors with raidframe.
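
To be explicit about how much that dd invocation zeroes, here is the arithmetic as a short sketch. It assumes dd treats "1k" as 1024 for both bs and count, and that sectors are 512 bytes. The offset of 63 is the start of the "a" partition from the disklabel output above.

```shell
#!/bin/sh
# Assumes "1k" means 1024 for both bs and count, and 512-byte sectors.
bs=1024
count=1024
bytes=$(( bs * count ))
sectors=$(( bytes / 512 ))
echo "zeroed $bytes bytes = $sectors sectors (0 through $(( sectors - 1 )))"

# Start of the "a" partition, from the disklabel output.
part_a_offset=63
if [ "$part_a_offset" -lt "$sectors" ]; then
    echo "the wipe extends past the start of the a partition"
fi
```

So the wipe covers the first 1 MiB of the disk, which includes the label area and the beginning of the RAID partition.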


Any help/advice would be great.

Thanks!

Louis