Subject: Re: RAIDFrame and NetBSD/sparc booting issues
To: NetBSD/sparc Discussion List <port-sparc@NetBSD.ORG>
From: Greg Oster <oster@cs.usask.ca>
List: port-sparc
Date: 08/11/2003 15:34:08
"Greg A. Woods" writes:
[snip]
> Also, is there currently any way for RAIDframe to notice, say from the
> fs_time in the superblock vs. some timestamp in its own component

RAIDframe doesn't care *at all* what filesystem it might have on it (if any).

> labels, that one of the filesystem copies had been used directly more
> recently than the combined RAID-1 filesystem and thus be able to figure
> out which component is out of date and needs reconstructing?  If this
> would work is there any code which would implement it? 

Right now if you change any data on a component without going through 
RAIDframe, "all bets are off".

> It seems to me
> this would require RAIDframe to know something of the internal structure
of the filesystems contained in its logical volume partitions, but in
> general I don't see why this would be too difficult to achieve.

Are you willing to teach RAIDframe about every type of filesystem 
that NetBSD supports both now and in the future? :)  (alternatively, can 
you find someone willing to do this?  And for all archs? :) )

> > This hack works, but you have to really understand what you're doing with the
> > offsets.
> 
> Yes, I'd say that's the only part that's hackish in some sense at the
> moment.
> 
> It would be nice if RAIDframe would place its component label just
> inside the end of the partition instead of at the beginning, always
> reserving exactly one track of the physical partition to use for this
> purpose (at least I think it makes more sense to always reserve a whole
> track instead of some arbitrary number like 64 sectors). 

Resulting in different positions for the label if components happen 
to be on drives of different geometries.  RAIDframe currently 
doesn't care about the underlying geometry... (and this sounds like a 
fine way to lose a component label if someone changes their IDE drive 
from LBA to something else :-} )
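A minimal sketch of the arithmetic behind this objection, assuming the label would sit at the start of the reserved last track. The function name and the sectors_per_track parameter are hypothetical and are not part of RAIDframe. The same partition gets a different label location once the reported sectors-per-track changes:

```c
#include <stdint.h>

/*
 * Hypothetical helper: sector holding a component label placed at the
 * start of the last track of a partition, as proposed above.
 * sectors_per_track is whatever geometry the disk driver reports; this
 * sketch assumes it, RAIDframe itself does not use it.
 */
static uint64_t
label_sector_at_end(uint64_t part_offset, uint64_t part_size,
    uint32_t sectors_per_track)
{
	/* The reserved track begins one track before the end of the partition. */
	return part_offset + part_size - sectors_per_track;
}
```

For a 1000000-sector partition at offset 0, a reported geometry of 63 sectors/track puts the label at sector 999937, while 16 sectors/track puts it at sector 999984. A label written under one geometry would not be found where the other geometry expects it.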

> This way one
> could simply subtract a track's worth of sectors from the size of the
> last matching "plain" partition on each component (and optionally create
> mini one-track partition entries to document the reserved space for
> the RAIDframe component labels).  This way the offset and size of each
> actual filesystem partition, both those inside the RAID-1 volume and
> those for each "plain" partition on each physical component volume,
> would be identical regardless of whether the mirror was in use or not,
> or even if it had yet been created.  This way it would be nearly trivial
> to create a mirror of any existing boot drive for any kind of platform
> (having only to adjust the size of the "last" partition if there wasn't
> already a track's worth of spare sectors following it).
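The "adjust the size of the last partition" step in the proposal above boils down to a small calculation. The following is a sketch only: the helper name and parameters are hypothetical, and it assumes part_end (the first sector after the last partition) does not exceed the size of the component.

```c
#include <stdint.h>

/*
 * Hypothetical helper: how many sectors must be cut from the end of the
 * "last" partition so that one full track stays free after it for the
 * component label.  Assumes part_end <= disk_sectors.
 */
static uint64_t
sectors_to_trim(uint64_t disk_sectors, uint64_t part_end,
    uint32_t sectors_per_track)
{
	uint64_t spare = disk_sectors - part_end;  /* unused sectors after the partition */

	if (spare >= sectors_per_track)
		return 0;                          /* a whole track is already free */
	return sectors_per_track - spare;          /* shrink just enough to free one track */
}
```

With 63 sectors/track on a 1000000-sector disk, a last partition ending at sector 999900 needs no change, one ending exactly at the end of the disk must lose 63 sectors, and one ending at 999980 must lose 43. Any nonzero result also means the filesystem in that partition has to shrink, which is the catch raised in the reply below.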

The big advantage is that one would have far fewer headaches related 
to what partitions different archs are willing to boot from, and 
where those partitions are allowed to be on the disk...  The "trivial 
mirror creation" doesn't really exist in this scenario.  Unless you 
can shrink the filesystem, only the filesystem "above" swap space
(or a filesystem above "empty space") can be easily RAID-1'ed.

> Greg do you have any comment on how hard it would be to modify RAIDframe
> to do this, at least for RAID-1 volumes, and whether or not you think it
> makes sense?

It'd be fairly easy to do, but there is no real gain to making an 
existing RAID set part of a RAID 1 set (in general).  What I'd like 
to see (and I've probably mentioned this before) is a "RAID MetaData" 
partition.  This metadata partition would contain the various 
component labels.  You could then directly boot from the regular 
partitions and what-not, yet RAIDframe would be able to detect which 
components belonged to which RAID sets. 
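To make the "RAID MetaData" partition idea above a little more concrete, here is a sketch under stated assumptions. The record layout, field names, and lookup function are all hypothetical: they are not RAIDframe's actual component label format. They only illustrate labels kept in one separate area that records which disk partition belongs to which RAID set.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical record in a metadata partition.  The fields are
 * illustrative only and are not RAIDframe's component label format.
 */
struct meta_label {
	uint32_t set_id;     /* RAID set the component belongs to */
	uint32_t column;     /* position of the component within the set */
	uint32_t disk_unit;  /* e.g. 0 for sd0, 1 for sd1 */
	uint32_t partition;  /* e.g. 0 for 'a', 4 for 'e' */
};

/*
 * Return the record describing (disk_unit, partition), or NULL if that
 * partition is not recorded as a component of any RAID set.
 */
static const struct meta_label *
find_component(const struct meta_label *labels, size_t n,
    uint32_t disk_unit, uint32_t partition)
{
	for (size_t i = 0; i < n; i++)
		if (labels[i].disk_unit == disk_unit &&
		    labels[i].partition == partition)
			return &labels[i];
	return NULL;
}
```

Because the data partitions themselves carry no label in this scheme, they can be booted from (or mounted) directly, which is exactly the downside described next.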

Now while this makes it much easier to boot directly from a RAID 1 
set (which is nice, and all that), it does have a huge downside. 
That downside is that while it would be really easy to read from the 
RAID 1 set, it also would be really easy to now mount that partition
as a regular filesystem -- and that makes it really easy to write to 
said component.  And that's just going to cause major problems if you 
ever try to use that partition/component as part of the old RAID set!!
If you (or anyone) can come up with a foolproof (and fast!) way to 
detect if any data on a component has changed since the last time 
RAIDframe used it, I'm all ears :)  (I can think of a few non-practical 
ways of doing this, but none that make any sense for a real system...)

(Minor historical note:  One of the main reasons for putting the 
component label at the beginning of the component was to also
'leave space' for things like disklabels and what-not.  If a component 
label now lives at the end of a component, anything other than RAID-1 
would have the same problems that ccd has with either overlapping 
disklabels, or disklabels that want to interfere with partitions 
beginning at block 0)

> Could this be done in a backwards compatible way as well?

Likely.  (Look at the end for a label, if none found, look at the 
beginning.  etc.)  I'd really like to re-do the component label stuff
to use a meta-data partition, though.  That makes it much easier to convert 
existing disks to RAID-1 sets.  (e.g. steal space from the swap 
partition to create the metadata area, and away you go...)
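The backwards-compatible lookup mentioned above ("look at the end for a label, if none found, look at the beginning") could look roughly like this. It is a sketch that works on an in-memory image of a component. The magic number, the 512-byte sector size, the one-track-from-the-end location, and the old_label_sector parameter are all assumptions for illustration, not RAIDframe's real on-disk values.

```c
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE 512          /* assumed sector size */
#define LABEL_MAGIC 0x52414944u  /* hypothetical magic value, not RAIDframe's */

/* Does the given sector of the component image start with the magic? */
static int
has_label(const uint8_t *image, uint64_t sector)
{
	uint32_t magic;

	memcpy(&magic, image + sector * SECTOR_SIZE, sizeof magic);
	return magic == LABEL_MAGIC;
}

/*
 * Try the new location (start of the last track) first, then fall back
 * to the old fixed location near the beginning of the component.
 * Returns the sector holding the label, or -1 if neither has one.
 */
static int64_t
find_label(const uint8_t *image, uint64_t nsectors,
    uint32_t sectors_per_track, uint64_t old_label_sector)
{
	if (sectors_per_track > 0 && nsectors >= sectors_per_track &&
	    has_label(image, nsectors - sectors_per_track))
		return (int64_t)(nsectors - sectors_per_track);
	if (old_label_sector < nsectors && has_label(image, old_label_sector))
		return (int64_t)old_label_sector;
	return -1;
}
```

Checking the new location first means a component that has been converted is never misread through a stale old-style label. The catch is that the "end" location still depends on the reported geometry.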

Later...

Greg Oster