Subject: Re: RAIDFrame Problems...
To: None <alex@bitblot.com>
From: Anders Dinsen <anders@dinsen.net>
List: netbsd-users
Date: 03/03/2005 22:10:03
I have seen similar lockups; the last one crashed a file system 
completely, and I haven't had time to recover it yet. I do, 
however, attribute the problems to my 3Com network cards and their 
driver, which seems to be somewhat unstable. Otherwise it's been rock 
solid, even when newfs'ing and running bonnie.

I followed the configuration in the NetBSD Guide.

Knowing this probably doesn't help ;-)
Cheers,
Anders


Alex Dumitriu wrote:
> Greetings folks, 
> 
> I'm going a bit crazy trying to create a RAID 1+0 volume on a
> 2.0_STABLE i386 system. I'm seeing a lot of system instability. What's
> making me extra crazy is that I'm not getting dropped to the debugger,
> and there are no messages of any kind being logged, the system just
> locks up hard; no keyboard response, no messages going to the console,
> no network, no nothing.
> 
> It usually craps out either when I try to newfs the volume, or else
> when I try to benchmark it using bonnie++. Here's what I'm working
> with:
> 
> raid0.conf:
> START array
> # numRow numCol numSpare
> 1 2 0
> 
> START disks
> /dev/wd0e
> /dev/wd1e
> 
> START layout
> # sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_1
> 64 1 1 1
> 
> START queue
> fifo 100
> 
> ------------
> raid1.conf:
> 
> # more raid1.conf 
> START array
> # numRow numCol numSpare
> 1 2 0
> 
> START disks
> /dev/wd2e
> /dev/wd3e
> 
> START layout
> # sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_1
> 64 1 1 1
> 
> START queue
> fifo 100
> 
> --------------------
> 
> raid2.conf:
> 
> START array
> # numRow numCol numSpare
> 1 2 0
> 
> START disks
> /dev/raid0e
> /dev/raid1e
> 
> START layout
> # sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_0
> 64 1 1 0
> 
> START queue
> fifo 100
> 
> -----------
> 
> I can initialize both raid0 and raid1 with no problems, and as an
> experiment, I've also tried putting filesystems on those RAIDs.... it
> seems to work fine. Once I stripe them into a RAID 1+0, however,
> things start getting ugly. Newfs will bring down the system perhaps
> 60% of the time. Occasionally, it has reported back something to the
> effect of "sector 16, read-only filesystem." I'm not at all sure what
> it means by that, especially since it waits until it has finished
> making all the super-block backups before complaining.
> 
> Here are some things that I think may be tripping me up: First, my disks
> are not identical. I've labeled them so that the partitions I'm using
> are identical, but:
> 
> dmesg |grep wd 
> 
> wd0 at atabus0 drive 0: <Maxtor 4G120J6>
> wd0: drive supports 16-sector PIO transfers, LBA48 addressing
> wd0: 114 GB, 238216 cyl, 16 head, 63 sec, 512 bytes/sect x 240121728 sectors
> wd0: 32-bit data port
> wd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 6 (Ultra/133)
> wd0(pdcide0:0:0): using PIO mode 4, Ultra-DMA mode 5 (Ultra/100)
> (using DMA data transfers)
> wd1 at atabus2 drive 0: <Maxtor 6Y120L0>
> wd1: drive supports 16-sector PIO transfers, LBA addressing
> wd1: 114 GB, 238216 cyl, 16 head, 63 sec, 512 bytes/sect x 240121728 sectors
> wd1: 32-bit data port
> wd1: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 6 (Ultra/133)
> wd1(pdcide1:0:0): using PIO mode 4, Ultra-DMA mode 6 (Ultra/133)
> (using DMA data transfers)
> wd2 at atabus3 drive 0: <Maxtor 6Y120L0>
> wd2: drive supports 16-sector PIO transfers, LBA addressing
> wd2: 114 GB, 238216 cyl, 16 head, 63 sec, 512 bytes/sect x 240121728 sectors
> wd2: 32-bit data port
> wd2: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 6 (Ultra/133)
> wd2(pdcide1:1:0): using PIO mode 4, Ultra-DMA mode 6 (Ultra/133)
> (using DMA data transfers)
> wd3 at atabus6 drive 0: <WDC WD2000BB-22DWA0>
> wd3: drive supports 16-sector PIO transfers, LBA48 addressing
> wd3: 186 GB, 387618 cyl, 16 head, 63 sec, 512 bytes/sect x 390719855 sectors
> wd3: 32-bit data port
> wd3: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 5 (Ultra/100)
> wd3(piixide0:0:0): using PIO mode 4, Ultra-DMA mode 5 (Ultra/100)
> (using DMA data transfers)
> 
> (I'll post my disklabels if anyone wants, but I'm pretty sure that
> they, at least, are OK. I'm using 63-sector offsets everywhere
> (wd[0123], raid[012]).)
> 
> As you can see, each drive is on its own channel.
> 
> Now, my understanding is that using mismatched disks can result in a
> performance hit, plus a waste of space, but it shouldn't cause the
> system to hang, right?
> 
> The other thing that I'm unsure about is this: Should one run fdisk -u
> on the raid volumes before labeling them? It seems to work either way,
> though disklabel complains about the MBR being wrong if fdisk hasn't
> been run.
> 
> What the heck am I missing? Are there any non-obvious kernel options I
> need to enable for RAID 1+0? Is there anything I can do to make
> RAIDFrame more verbose?
> 
> Any and all suggestions welcome.....
> 
> Thanks in advance,
> -alex.
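
For reference, the layout lines in the quoted configs can be turned into byte sizes with a little arithmetic. The sketch below is not from the thread: the helper name stripe_geometry is hypothetical, it assumes the 512-byte sectors reported in the dmesg output, and it only handles the two RAID levels used above (0 and 1).

```python
SECTOR_BYTES = 512  # assumption taken from the dmesg lines: "512 bytes/sect"


def stripe_geometry(array_line, layout_line, sector_bytes=SECTOR_BYTES):
    """Compute stripe sizes from the 'START array' and 'START layout' values.

    array_line  -- "numRow numCol numSpare", e.g. "1 2 0"
    layout_line -- "sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level"
    """
    _num_row, num_col, _num_spare = (int(x) for x in array_line.split())
    sect_per_su, _su_per_pu, _su_per_ru, level = (
        int(x) for x in layout_line.split()
    )
    if level not in (0, 1):
        raise ValueError("this sketch only covers RAID levels 0 and 1")

    su_bytes = sect_per_su * sector_bytes
    # RAID 0 spreads distinct data over every column; a RAID 1 mirror
    # stores the same stripe unit on each column, so only one counts.
    data_cols = num_col if level == 0 else 1
    return {
        "level": level,
        "stripe_unit_bytes": su_bytes,
        "data_bytes_per_stripe": su_bytes * data_cols,
    }


# Values copied from raid0.conf/raid1.conf (mirrors) and raid2.conf (stripe).
for name, array, layout in [
    ("raid0", "1 2 0", "64 1 1 1"),
    ("raid1", "1 2 0", "64 1 1 1"),
    ("raid2", "1 2 0", "64 1 1 0"),
]:
    print(name, stripe_geometry(array, layout))
```

With the values above, each stripe unit is 64 sectors (32 KiB), and a full stripe on the raid2 volume carries two such units, one per mirror.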