Subject: Re: RAIDFrame and NetBSD/sparc booting issues
To: Brian Buhrow <buhrow@lothlorien.nfbcal.org>
From: Greg Oster <oster@cs.usask.ca>
List: port-sparc
Date: 08/12/2003 08:55:03
Brian Buhrow writes:
> 	Hello Greg.  I think what you're describing is similar to how Sun's
> Online: DiskSuite works.  They keep multiple copies of the raid component
> information in a separate partition, and compare all of the copies of the
> databases.  The majority of copies which agree win the configuration war.

That makes it hard to move a RAID set from one machine to another.  
If machine A has disks 1, 2, and 3, and you move disks 4, 5, and 6 
from machine B to machine A, which "majority of configuration 
databases" wins?  (more below :) ).
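
To make the ambiguity concrete, here is a minimal sketch of a strict
majority vote.  It is not how DiskSuite actually implements this; the
function name and the idea of tagging each database copy with a simple
id are assumptions made purely for illustration.

```python
from collections import Counter

def majority_config(copies):
    """Return the database id held by a strict majority of copies.

    `copies` is a list with one (hypothetical) database id per disk.
    Returns None when no id is held by more than half of the copies.
    """
    if not copies:
        return None
    top, count = Counter(copies).most_common(1)[0]
    return top if count * 2 > len(copies) else None

# Machine A's disks 1-3 agree on "A"; the disks moved over from
# machine B (4-6) agree on "B".  Neither side has a strict majority.
after_move = ["A", "A", "A", "B", "B", "B"]
winner = majority_config(after_move)
```

With three copies on each side, `winner` is None: the vote alone cannot
say which configuration should be believed.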

> 	They make no provision for checking if underlying components have been
> written -- leaving that up to the user to verify.  As a result, if it
> becomes necessary to boot from one of the components, i.e. because you need
> to boot from cdrom, and the cdrom doesn't support DiskSuite, then you can
> do that.  Then, after you make your change to the single component, you
> must make sure to boot from that component in order to make it the "master"
> component.  One way to ensure this is to disable the other components of
> the raid1 set until things are up and going, and you're ready to resync all
> the drives.
> 	Of course, this only works for raid1 sets, and the documentation gives
> dire warnings about futzing with raid5 components outside of the raid
> environment.  

Right.  And if folks are willing to live with the "post plenty of 
warnings, and let people shoot themselves in the foot", then it 
wouldn't take very long to finish designing a metadata partition, 
and to implement it.  (assuming one can find time to do this...)
 
> 	One variation on this theme, that I could envision working fairly
> well, would be to create a new partition type, raid_metadata, which
> contains the raid configuration information for all of the raids on a given
> system.  This would hold things like serial number, component count, and
> mod counter for each raid on the system.  

How does it identify "components on other disks"?  Simply by serial 
number?  That still requires putting a label in each component, which 
is something I'd like to move away from.

> Then, as Greg W suggests, reserve
> some space, I prefer a fixed number like 32 or 64 sectors, rather than a
> track size for the geometry reasons you allude to, at the end of the
> traditional "raid" partition.  

This still makes it hard to "merge" an existing FFS (or whatever) 
partition into a RAID 1 set.  What I'd like to see is something like:
 a) run disklabel to create the metadata partition (hopefully 
stealing a chunk from an existing swap partition).
 b) run 'raidctl -n 1 raid0' to create an empty RAID-1 set.
 c) run 'raidctl -m /dev/sd0a raid0' to create a degraded RAID-1 with 
sd0a as the first (and master) component.
Nothing additional would have to be added to /dev/sd0a.  All 
component label information for sd0a would be stored in the metadata
partition.

If one has to put a component label at the beginning or end of a 
partition, it's much harder to do the above.  (esp. with live disks 
that contain existing data...)
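
A rough sketch of the arithmetic behind that, with everything in
sectors.  The function names are hypothetical, and the 64-sector label
size is just the figure floated earlier in this thread, not a claim
about what any implementation reserves.

```python
def usable_range(part_start, part_size, label_sectors, label_at="none"):
    """Return (data_start, data_size) left for data inside a partition.

    label_at is "start", "end", or "none" (label kept out-of-band,
    e.g. in a separate metadata partition).
    """
    if label_at == "none":
        return (part_start, part_size)
    if label_at == "start":
        return (part_start + label_sectors, part_size - label_sectors)
    if label_at == "end":
        return (part_start, part_size - label_sectors)
    raise ValueError("label_at must be 'start', 'end', or 'none'")

def can_adopt_in_place(part_start, part_size, label_sectors, label_at):
    """True if an existing filesystem filling the whole partition
    still fits, unmoved, in the range left over for data."""
    existing = (part_start, part_size)
    return usable_range(part_start, part_size, label_sectors, label_at) == existing

# An existing filesystem that fills a 1000-sector partition:
in_band_start = can_adopt_in_place(0, 1000, 64, "start")  # data would have to shift
in_band_end = can_adopt_in_place(0, 1000, 64, "end")      # filesystem would have to shrink
out_of_band = can_adopt_in_place(0, 1000, 64, "none")     # nothing on the partition changes
```

Only the out-of-band case leaves the existing partition untouched.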

> In that space, store the traditional
> component label, including serial number, component count, and mod counter
> again.  Now, if you get a raid partition which has a component label which
> has a serial number which doesn't appear in your raid_metadata databases,
> you can be pretty sure that that raid partition isn't part of your list of
> known raid sets.

Except that if you're moving disks from one machine to another, it'd 
be nice to have them "autoconfig" on the new machine too.

> 	The down side of this arrangement is that you'll have to search for
> "raid_metadata" partitions on all drives at boot time, and you'll have to
> note how many databases you have, and assign a database serial number to
> match them all together.  That might not be bad, though, compared to the
> need to search for "raid" partitions on all drives now.

I'm not sure much is gained by having both a "RAID Metadata" partition
*and* having component labels on individual partitions.  The latter 
is only required if the RAID Metadata is expected to describe 
locations of all RAID components across all disks.  If one restricts 
the RAID Metadata to just describing RAID components on the given 
disk, then the disks can be shuffled around at will to other 
machines, and all RAID sets will be detected properly with minimal 
fuss.
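
As a minimal sketch of that detection step, assume each disk's metadata
partition lists only the components on that disk, and that each record
carries the serial number, component count, and mod counter mentioned
above.  The record layout, the `column` field, and all names here are
assumptions for illustration, not an existing RAIDframe format.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class ComponentRecord:
    serial: int        # identifies the RAID set
    partition: str     # partition on the disk holding this record
    column: int        # position of the component in the set (assumed field)
    num_columns: int   # component count for the set
    mod_counter: int   # bumped each time the set is modified

def assemble_sets(per_disk_metadata):
    """Group records from every disk's metadata partition by serial number.

    per_disk_metadata maps a disk name to the records found on that disk.
    Returns {serial: {column: (disk, record)}}.
    """
    sets = defaultdict(dict)
    for disk, records in per_disk_metadata.items():
        for rec in records:
            sets[rec.serial][rec.column] = (disk, rec)
    return dict(sets)

def missing_columns(members):
    """Return the columns of one set for which no component was found."""
    any_record = next(iter(members.values()))[1]
    return [c for c in range(any_record.num_columns) if c not in members]

# Two disks, each describing only its own component of set 1234.
found = assemble_sets({
    "sd0": [ComponentRecord(1234, "sd0a", 0, 2, 7)],
    "sd1": [ComponentRecord(1234, "sd1a", 1, 2, 7)],
})
```

Because no record refers to another disk by name, the same scan gives
the same grouping no matter which machine the disks are plugged into;
a set with entries in `missing_columns()` would simply come up degraded.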

> On Aug 11,  3:34pm, Greg Oster wrote:
> } Subject: Re: RAIDFrame and NetBSD/sparc booting issues
> } It'd be fairly easy to do, but there is no real gain to making an 
> } existing RAID set part of a RAID 1 set (in general).  What I'd like 
> } to see (and I've probably mentioned this before) is a "RAID MetaData" 
> } partition.  This metadata partition would contain the various 
> } component labels.  You could then directly boot from the regular 
> } partitions and what-not, yet RAIDframe would be able to detect which 
> } components belonged to which RAID sets. 
> } 
> } Now while this makes it much easier to boot directly from a RAID 1 
> } set (which is nice, and all that), it does have a huge downside. 
> } That downside is that while it would be really easy to read from the 
> } RAID 1 set, it also would be really easy to now mount that partition
> } as a regular filesystem -- and that makes it really easy to write to 
> } said component.  And that's just going to cause major problems if you 
> } ever try to use that partition/component as part of the old RAID set!!
> } If you (or anyone) can come up with a foolproof (and fast!) way to 
> } detect if any data on a component has changed since the last time 
> } RAIDframe used it, I'm all ears :)  (I can think of a few non-practical 
> } ways of doing this, but none that make any sense for a real system...)
> } 
> } (Minor historical note:  One of the main reasons for putting the 
> } component label at the beginning of the component was to also
> }  'leave space' for things like disklabels and what-not.  If a component 
> } label now lives at the end of a component, anything other than RAID-1 
> } would have the same problems that ccd has with either overlapping 
> } disklabels, or disklabels that want to interfere with partitions 
> } beginning at block 0)
> } 
> >-- End of excerpt from Greg Oster
> 

Later...

Greg Oster