tech-kern archive


Re: patch for raidframe and non 512 byte sector devices



On Sat, 6 Nov 2010 15:24:31 -0700
buhrow%lothlorien.nfbcal.org@localhost (Brian Buhrow) wrote:

>       Hello.  Thanks for this patch.  I've been thinking about some
> other changes to raidframe, and thought I'd share them with the
> list.  These patches reminded me that I wanted to write those ideas
> down, and since you're in the code already, Matt, I thought you
> might have some immediate thoughts on some of them.
> 
> 1. Raidframe autoconfiguration on raw disks.
>       From what I can tell, raidframe can't autoconfigure a
> component unless the disk carries either a BSD disklabel with a
> RAID partition or a GPT wedge, and the raidframe component lives
> inside that partition or wedge.  However, it is possible to
> configure raidframe on a raw disk that has no disklabel or GPT at
> all.  My thought was to teach raidframe a third way of
> autoconfiguring its components, using the same trick the boot
> blocks use to boot off of RAID 1 partitions: if no disklabel or
> wedge contains a RAID partition, seek to the offset where the raid
> label would sit on a raw disk and check whether a valid label
> exists there.  If enough labels containing the right data exist,
> and the raid set is marked for autoconfiguration, configure the
> set.  Is there a reason this hasn't been done already?  Are there
> compelling reasons not to do it that I haven't thought of?  It
> seems like a simple change, but I've done no more than glance at
> the code so far, so I can't be sure it's as trivial as it sounds.

I think the only reason it hasn't been done is that a) I never thought
of it and b) no-one else has written the code :)
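
For concreteness, the probe described above might look something like
this in userland.  The offset, version value, and label fields are
placeholders assumed for illustration; they are not the real
RAIDframe component label layout, so treat this as a sketch of the
idea rather than working autoconfiguration code.

/*
 * Sketch: probe a raw device for a plausible RAIDframe-style
 * component label at a fixed offset.  RF_LABEL_OFFSET,
 * RF_LABEL_VERSION, and struct rf_label_probe are assumptions made
 * for this example, not RAIDframe's actual definitions.
 */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

#define RF_LABEL_OFFSET  16384  /* bytes from start of device (assumed) */
#define RF_LABEL_VERSION 2      /* assumed label format version */

struct rf_label_probe {         /* illustrative subset of a label */
        uint32_t version;       /* label format version */
        uint32_t serial;        /* raid set serial number */
        uint32_t autoconfig;    /* nonzero if set wants autoconfig */
};

static int
probe_raw_component(const char *dev)
{
        struct rf_label_probe lbl;
        int fd;

        if ((fd = open(dev, O_RDONLY)) == -1)
                return -1;
        if (pread(fd, &lbl, sizeof(lbl), RF_LABEL_OFFSET) !=
            (ssize_t)sizeof(lbl)) {
                close(fd);
                return -1;
        }
        close(fd);
        /* "Enough labels containing the right data": sanity checks. */
        return (lbl.version == RF_LABEL_VERSION && lbl.autoconfig) ? 0 : -1;
}

int
main(int argc, char **argv)
{
        if (argc == 2 && probe_raw_component(argv[1]) == 0)
                printf("%s: plausible raid component\n", argv[1]);
        return 0;
}

In the kernel the same check would run during autoconfiguration, only
after the disklabel and wedge scans have both come up empty.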

> 2.  Lazy calculation of parity.
>       One of the downsides of raidframe at the moment is the time
> it takes to compute the initial parity on a large raid set.  The
> recent patches that fix the parity quickly after a crash are great,
> but I don't believe they solve the initial-calculation problem.  If
> I'm wrong, please let me know and I'll be quiet.  If not, I'm
> wondering if there is a way to have the parity slices be in one of
> three states:
> 
> A.  Clean - parity is known to be good.
> 
> B.  Dirty - parity is known to be stale and needs to be rechecked.
> 
> C.  Unused - this slice has never been written to, so its parity
> can be fixed when the first write request to it is done.

The larger the 'slice' size, the longer the 'first write' will take --
and what happens with filesystems where things like newfs sprinkle
data all over the disk?  
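
A rough sketch of what such a three-state map might look like, with
two bits of state per slice; all names and sizes here are made up for
illustration and have nothing to do with RAIDframe's existing parity
map structures:

/*
 * Illustrative three-state parity map: 2 bits per slice packed into
 * a byte array.  SLICES and the packing are arbitrary choices for
 * this sketch.
 */
#include <stdint.h>
#include <stdio.h>

enum slice_state { SLICE_UNUSED = 0, SLICE_DIRTY = 1, SLICE_CLEAN = 2 };

#define SLICES 4096                     /* e.g. 2 TB set, 512 MB slices */

static uint8_t state_map[SLICES / 4];   /* 2 bits per slice */

static enum slice_state
slice_get(unsigned n)
{
        return (enum slice_state)((state_map[n / 4] >> ((n % 4) * 2)) & 0x3);
}

static void
slice_set(unsigned n, enum slice_state s)
{
        unsigned shift = (n % 4) * 2;

        state_map[n / 4] = (uint8_t)((state_map[n / 4] & ~(0x3u << shift)) |
            ((unsigned)s << shift));
}

/*
 * On the first write to an UNUSED slice, parity for the whole slice
 * must be established (hence the objection above: the bigger the
 * slice, the costlier that first write).  This hook only records the
 * state transition.
 */
static void
slice_write_hook(unsigned n)
{
        if (slice_get(n) == SLICE_UNUSED)
                slice_set(n, SLICE_DIRTY);      /* CLEAN once parity lands */
}

int
main(void)
{
        slice_write_hook(5);
        printf("slice 5: %d\n", slice_get(5));  /* 1 == SLICE_DIRTY */
        return 0;
}

The scattering problem is visible here: if newfs touches metadata
across the whole device, nearly every slice takes that expensive
first-write hit almost immediately, so the slice size has to balance
cheap first writes against the size of the map itself.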

>       It might be that writing whatever data the parity maps need
> in order to record that third state is just as expensive in terms
> of time as writing the parity itself; I don't actually know.  But
> if not, it might be a way of configuring and using large raid sets
> quickly, without having to wait for that first parity check.
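
To put rough, made-up numbers on that question, a 2-bit-per-slice
state map is tiny compared to a full parity pass:

/*
 * Back-of-envelope comparison (illustrative figures only): size of a
 * 2-bit-per-slice state map versus rewriting parity for a whole set.
 */
#include <stdio.h>

int
main(void)
{
        const double set_bytes   = 2e12;        /* 2 TB raid set */
        const double slice_bytes = 64e6;        /* 64 MB slices */
        const double slices      = set_bytes / slice_bytes;
        const double map_bytes   = slices * 2.0 / 8.0; /* 2 bits/slice */

        /* ~31250 slices -> ~7.6 KB of map: one small write, versus
         * reading and rewriting the entire set for an initial parity
         * pass. */
        printf("%.0f slices, %.1f KB of map\n", slices, map_bytes / 1024);
        return 0;
}

On those made-up numbers, recording the third state looks far cheaper
than computing the parity itself, though the real cost depends on how
often the map has to be synced back to disk.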

With RAID 1 it's easier to avoid checking everything, but since
there's no way to say "I know both these disks only have zeros on
them" we can't take advantage of those tricks.  (That, and you'd
need to actually zero the disks first, to make sure the manufacturer
didn't leave some sort of funky driver data or something in some
far-off corner...)

Later...

Greg Oster

