current-users: Re: RAIDFrame problems

Subject: Re: RAIDFrame problems
To: Paul Newhouse <newhouse@rockhead.com>
From: Greg Oster <oster@cs.usask.ca>
List: current-users
Date: 01/23/1999 11:12:08
Paul Newhouse writes:
> Greg Oster <oster@cs.usask.ca> writes:
> 
> >Paul Newhouse writes:
> >> Platform i386
> >
> >What vintage of -current? (i.e. based on sources of what date?)
> 
> The kernel is current as of 22 Jan 99.  RAIDFrame as of 13 Nov 98.

Umm...  RAIDframe is in the Jan 22 -current...  Hopefully that's the one you're
using...

> >Hmm... How big are the components?  
> 
> 3 6.4GB Maxtor UDMA IDE

Ok.
 
> >Is this error repeatable?  (just searching for more clues...)
> 
> YUP!

:-(

> >> Am I doing something obviously wrong?
> >
> >Nope.  (I just tried the above command on my test box, and it completed just
> >fine... so this could be a bit entertaining to track down :-/ )
> 
> We retuned the config to:
> 
>    START array
>    1 3 0
>    START disks
>    /dev/wd1b
>    /dev/wd2b
>    /dev/wd3b
>    START layout
>    # sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level
>    8 1 1 5

You should be able to leave the 8 at 32 (or something even higher...)

>    START queue
>    fifo 16

I'm suspecting that lower numbers here might work better for IDE disks... 
(fewer simultaneous requests per disk)

> and it works rather well.

Does it die with the "dd ..."?

>  I think the disklabels are wrong, we're getting:
> 
>    WARNING: raid0: total sector size in disklabel (25314660) != the size of raid (25314528)
>    WARNING: raid0: end of partition `c' exceeds the size of raid (25314528)
>    WARNING: raid0: end of partition `d' exceeds the size of raid (25314528)
>    WARNING: raid0: total sector size in disklabel (25314660) != the size of raid (25314528)
>    WARNING: raid0: end of partition `c' exceeds the size of raid (25314528)
>    WARNING: raid0: end of partition `d' exceeds the size of raid (25314528)
> 
> with the bad config and no errors with the good config.  I don't understand
> this behaviour.

According to your disklabels (below), the "raw" component size is: 12657330
Multiply this by 2, and you get 25314660.  However: 25314660 is not correct
for the total size of the RAID "disk" in the disklabel. 

The reason 25314660 is not correct is that RAIDframe reserves a few blocks at 
the beginning of each component for its own use (ok, so it's not actually 
in use in the -current code, but "hopefully soon" :-) )
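
To make the mismatch concrete, here is a small sketch of the arithmetic.
The function name and the "one component's worth goes to parity" rule for
RAID 5 are illustrative assumptions, not RAIDframe code; the two sector
counts are taken from the disklabels and the kernel warnings in this thread.

```python
# Sketch only: names are hypothetical and the capacity rule is the usual
# textbook one for RAID 0 / RAID 5, not something read out of RAIDframe.

def data_components(num_disks: int, raid_level: int) -> int:
    """How many components' worth of space holds data rather than parity."""
    if raid_level == 5:
        return num_disks - 1   # one component's worth is used for parity
    if raid_level == 0:
        return num_disks       # plain striping, no parity
    raise ValueError("RAID level not covered by this sketch")

RAW_COMPONENT_SECTORS = 12657330  # size of partition b on each disk
RAID_SIZE_REPORTED = 25314528     # "size of raid" from the warnings

naive_total = data_components(3, 5) * RAW_COMPONENT_SECTORS
overshoot = naive_total - RAID_SIZE_REPORTED

print(naive_total)  # 25314660, the figure in the RAID disklabel
print(overshoot)    # 132 sectors more than the RAID actually provides
```

So the label is 132 sectors too large, which is the space the naive
"2 x component size" calculation fails to subtract.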

Assuming you don't have anything valuable on the RAID yet, try the following:

  dd if=/dev/zero of=/dev/rraid0d bs=1024 count=32

(this will nuke the disklabel (and nothing else, currently))  
Then do:
  disklabel raid0 > /tmp/label
  vi /tmp/label # and edit to taste
  disklabel -R -r raid0 /tmp/label

and get things in the disklabel fixed up.
(I'm curious to know how you built the label..  I'm suspecting that maybe
you changed sizes of the components a few times, and the other numbers got
misaligned on you...)

As for why RAIDframe doesn't complain about disk sizes with one config 
and does with the other, that is likely due to the change in the amount of 
disk usable with the smaller stripe size...  

e.g. if the disk is 1000 sectors (after the 64 "reserved sectors" are 
accounted for), then 32 sectors per stripe unit yields 1000/32 = 31.25 (i.e. 
31) usable stripes.  Thus the # of sectors used will be 992.  On the other 
hand, 8 sectors per stripe unit yields 1000/8 = 125 usable stripes, and the 
entire 1000 sectors will be used.  If you change the stripe size from 32 to 8 
(as you've done), then the amount of actual disk you can use will change too, 
and you'll want/need to update the disklabel.
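
The rounding described above can be sketched in a few lines. This assumes
the usable space per component is simply the post-reserve size rounded down
to a whole number of stripe units, that 64 sectors are reserved per
component, and that a 3-disk RAID 5 exposes two components' worth of data;
the function and constant names are hypothetical, and the real driver may
round differently.

```python
# Sketch only: a simplified model of the rounding, not RAIDframe's code.

RESERVED_SECTORS = 64  # assumption: the "64 reserved sectors" per component

def usable_sectors(sectors_after_reserve: int, sect_per_su: int) -> int:
    """Round a component down to a whole number of stripe units."""
    return (sectors_after_reserve // sect_per_su) * sect_per_su

# The 1000-sector example from the text.
print(usable_sectors(1000, 32))  # 992
print(usable_sectors(1000, 8))   # 1000

# The same rounding applied to the 12657330-sector components,
# times 2 data components for a 3-disk RAID 5.
component = 12657330 - RESERVED_SECTORS
for sect_per_su in (8, 32):
    print(sect_per_su, 2 * usable_sectors(component, sect_per_su))
# 8 25314528
# 32 25314496
```

Under these assumptions the 8-sector figure coincides with the 25314528
reported in the warnings, but that agreement is only as good as the
assumptions; the number to put in the disklabel is whatever size the
kernel reports for the RAID.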

> The individual disklabels look like:
> 
>    4 partitions:
>    #        size   offset     fstype   [fsize bsize   cpg]
>    a:      945        0     unused        0     0         # (Cyl.    0 -0)
>    b: 12657330      945     4.2BSD      512  4096     0   # (Cyl.    1 -13394)
>    c: 12658275        0     unused        0     0         # (Cyl.    0 -13394)
>    d: 12658275        0     unused        0     0         # (Cyl.    0 -13394)
> 
> The raid label looks like:
> 
>    4 partitions:
>    #        size   offset     fstype   [fsize bsize   cpg]
>    a:     1890        0     unused        0     0         # (Cyl.    0 - 0)
>    b: 25310880     1890     4.2BSD      512  4096     8   # (Cyl.    1 -13392)
>    c: 25314660        0     unused        0     0         # (Cyl.    0 -13392)
>    d: 25314660        0     unused        0     0         # (Cyl.    0 -13302)
> 
> There must be some geometry thing we're misunderstanding?

Ya, I'm thinking it's (mostly) a geometry problem...  If, once the geometry is 
fixed up, it's still giving you grief, then we can dig further... 
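
As a quick sanity check on a label before writing it back, something like
the following sketch flags partitions that run past the end of the RAID.
The function name is hypothetical; the sizes and offsets are copied from
the raid label quoted above, and 25314528 is the RAID size from the
warnings.

```python
# Sketch only: a stand-alone check, not part of disklabel or RAIDframe.

def partitions_past_end(partitions: dict, raid_size: int) -> list:
    """Return names of partitions whose last sector lies beyond raid_size.

    `partitions` maps a partition letter to a (size, offset) pair in sectors.
    """
    return [name for name, (size, offset) in partitions.items()
            if offset + size > raid_size]

raid_label = {
    "a": (1890, 0),
    "b": (25310880, 1890),
    "c": (25314660, 0),
    "d": (25314660, 0),
}

print(partitions_past_end(raid_label, 25314528))  # ['c', 'd']
```

Only `c` and `d` are flagged, matching the warnings; `b` ends at sector
25312770 and fits within the RAID.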

Later...

Greg Oster