Subject: RAIDFrame Problems...
To: None <netbsd-users@netbsd.org>
From: Alex Dumitriu <alex.dumitriu@gmail.com>
List: netbsd-users
Date: 03/03/2005 14:16:34
Greetings folks, 

I'm going a bit crazy trying to create a RAID 1+0 volume on a
2.0_STABLE i386 system. I'm seeing a lot of system instability. What's
making me extra crazy is that I'm not getting dropped to the debugger,
and no messages of any kind are being logged. The system just locks up
hard: no keyboard response, no messages going to the console, no
network, no nothing.

It usually craps out either when I try to newfs the volume, or else
when I try to benchmark it using bonnie++. Here's what I'm working
with:

raid0.conf:
START array
# numRow numCol numSpare
1 2 0

START disks
/dev/wd0e
/dev/wd1e

START layout
# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_1
64 1 1 1

START queue
fifo 100

------------
raid1.conf:

# more raid1.conf 
START array
# numRow numCol numSpare
1 2 0

START disks
/dev/wd2e
/dev/wd3e

START layout
# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_1
64 1 1 1

START queue
fifo 100

--------------------

raid2.conf:

START array
# numRow numCol numSpare
1 2 0

START disks
/dev/raid0e
/dev/raid1e

START layout
# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_0
64 1 1 0

START queue
fifo 100

-----------
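
To put numbers on the layout lines above, here is a quick sketch of the
stripe arithmetic. It assumes the 512-byte sectors that dmesg reports
for all four drives; the function names are made up for illustration.

```shell
#!/bin/sh
# Stripe geometry implied by the "START layout" lines in the configs.
# Assumption: 512-byte sectors on every component.

# su_bytes SECT_PER_SU SECTOR_SIZE -> bytes in one stripe unit
su_bytes() {
    echo $(( $1 * $2 ))
}

# full_stripe_bytes SECT_PER_SU SECTOR_SIZE DATA_COLUMNS -> bytes in one full stripe
full_stripe_bytes() {
    echo $(( $1 * $2 * $3 ))
}

echo "stripe unit: $(su_bytes 64 512) bytes"
echo "raid2 full stripe (2 columns): $(full_stripe_bytes 64 512 2) bytes"
```

So each stripe unit is 32 KiB, and a full stripe across the two columns
of raid2 is 64 KiB.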

I can initialize both raid0 and raid1 with no problems, and as an
experiment, I've also tried putting filesystems on those RAIDs; that
seems to work fine. Once I stripe them into a RAID 1+0, however,
things start getting ugly. Newfs will bring down the system perhaps
60% of the time. Occasionally, it has reported back something to the
effect of "sector 16, read-only filesystem." I'm not at all sure what
it means by that, especially since it waits until it has finished
making all the super-block backups before complaining.
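
For reference, the bring-up order for a layered set like this would
look roughly like the sketch below. It is a dry run that only prints
raidctl(8) commands, not a transcript of what was actually typed, and
the serial numbers (100, 101, 102) are placeholders.

```shell
#!/bin/sh
# Dry run only: prints the raidctl(8) commands instead of running them.
# The serial numbers are placeholders, not values from the real system.

# mirror_steps DEV CONF SERIAL: configure, write component labels, init parity
mirror_steps() {
    echo "raidctl -C $2 $1"
    echo "raidctl -I $3 $1"
    echo "raidctl -iv $1"
}

# stripe_steps DEV CONF SERIAL: configure and label only,
# since a RAID 0 set has no parity to initialize
stripe_steps() {
    echo "raidctl -C $2 $1"
    echo "raidctl -I $3 $1"
}

mirror_steps raid0 raid0.conf 100
mirror_steps raid1 raid1.conf 101
# raid0 and raid1 need disklabels at this point so raid0e/raid1e exist
stripe_steps raid2 raid2.conf 102
echo "raidctl -s raid2"    # status of the finished stripe
```

The point of the ordering is that both mirrors are fully configured and
labeled before raid2 is built on top of them.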

Here are some things that I think may be tripping me up. First, my disks
are not identical. I've labeled them so that the partitions I'm using
are identical, but:

dmesg | grep wd

wd0 at atabus0 drive 0: <Maxtor 4G120J6>
wd0: drive supports 16-sector PIO transfers, LBA48 addressing
wd0: 114 GB, 238216 cyl, 16 head, 63 sec, 512 bytes/sect x 240121728 sectors
wd0: 32-bit data port
wd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 6 (Ultra/133)
wd0(pdcide0:0:0): using PIO mode 4, Ultra-DMA mode 5 (Ultra/100)
(using DMA data transfers)
wd1 at atabus2 drive 0: <Maxtor 6Y120L0>
wd1: drive supports 16-sector PIO transfers, LBA addressing
wd1: 114 GB, 238216 cyl, 16 head, 63 sec, 512 bytes/sect x 240121728 sectors
wd1: 32-bit data port
wd1: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 6 (Ultra/133)
wd1(pdcide1:0:0): using PIO mode 4, Ultra-DMA mode 6 (Ultra/133)
(using DMA data transfers)
wd2 at atabus3 drive 0: <Maxtor 6Y120L0>
wd2: drive supports 16-sector PIO transfers, LBA addressing
wd2: 114 GB, 238216 cyl, 16 head, 63 sec, 512 bytes/sect x 240121728 sectors
wd2: 32-bit data port
wd2: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 6 (Ultra/133)
wd2(pdcide1:1:0): using PIO mode 4, Ultra-DMA mode 6 (Ultra/133)
(using DMA data transfers)
wd3 at atabus6 drive 0: <WDC WD2000BB-22DWA0>
wd3: drive supports 16-sector PIO transfers, LBA48 addressing
wd3: 186 GB, 387618 cyl, 16 head, 63 sec, 512 bytes/sect x 390719855 sectors
wd3: 32-bit data port
wd3: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 5 (Ultra/100)
wd3(piixide0:0:0): using PIO mode 4, Ultra-DMA mode 5 (Ultra/100)
(using DMA data transfers)

(I'll post my disklabels if anyone wants, but I'm pretty sure that
they, at least, are OK. I'm using 63-sector offsets everywhere
(wd[0123], raid[012]).)

As you can see, each drive is on its own channel.

Now, my understanding is that using mismatched disks can result in a
performance hit, plus a waste of space, but it shouldn't cause the
system to hang, right?
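
To quantify the "waste of space" part, here is a rough sketch using the
sector counts from the dmesg output above. It assumes a mirror half can
use at most what the smaller drive offers and ignores the exact
partition sizes in the disklabels, which aren't shown here.

```shell
#!/bin/sh
# Sector counts are taken from the dmesg output for wd2 and wd3.
# Assumption: each mirror half is limited by the smaller of the two drives.

# min_of A B -> the smaller of two integers
min_of() {
    if [ "$1" -lt "$2" ]; then echo "$1"; else echo "$2"; fi
}

# sectors_to_mib SECTORS -> whole MiB for 512-byte sectors (2048 sectors per MiB)
sectors_to_mib() {
    echo $(( $1 / 2048 ))
}

wd2=240121728
wd3=390719855
usable=$(min_of "$wd2" "$wd3")
unused=$(( wd3 - usable ))

echo "usable per mirror half: $usable sectors ($(sectors_to_mib "$usable") MiB)"
echo "left over on wd3: $unused sectors ($(sectors_to_mib "$unused") MiB)"
```

That works out to roughly 72 GB of wd3 sitting outside the mirror,
which matches the difference between the 186 GB and 114 GB figures in
dmesg.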

The other thing that I'm unsure about is this: Should one run fdisk -u
on the raid volumes before labeling them? It seems to work either way,
though disklabel complains about the MBR being wrong if fdisk hasn't
been run.

What the heck am I missing? Are there any non-obvious kernel options I
need to enable for RAID 1+0? Is there anything I can do to make
RAIDFrame more verbose?

Any and all suggestions welcome.....

Thanks in advance,
-alex.