port-sparc: Re: RAIDFrame and NetBSD/sparc booting issues

Subject: Re: RAIDFrame and NetBSD/sparc booting issues
To: Brian Buhrow <buhrow@lothlorien.nfbcal.org>
From: Greg Oster <oster@cs.usask.ca>
List: port-sparc
Date: 08/13/2003 18:28:37
Brian Buhrow writes:
> 	Hello.  I've followed this entire discussion with much interest and
> thought more about the way raidframe works now and the question of how to
> make things work more transparently on raid-1 systems with existing boot
> roms.  I realized I have a couple of questions and would like to make sure
> my understanding of the current layout of raidframe partitions is correct.
> 
> 1.  It's my understanding that the area protected by RF_PROTECTEDSECTORS is
> designed to include such things as the physical disklabel, any boot
> strapping code that might reside on the physical disk, and the raid
> component label itself. 

Not quite.  RF_PROTECTEDSECTORS is there to tell the data-writing 
bits of RAIDFrame "don't touch this space at the beginning of a 
component".  It turns out that the space that we skip over contains 
all of the things you mention above :)

> This would imply that disklabel -r sd0 or wd0
> should read the label out of this protected region, assuming the raid
> partition includes the entire disk.  Is this right?

No.  If the disklabel usually lives at block 0 for the arch, then the
disklabel for raid0 will be found at block 0 of the data area of the RAID 
set.  (i.e. typically at block RF_PROTECTEDSECTORS of the first component 
of any RAID set).  Doing a 'disklabel sd0' or 'disklabel wd0' will 
get the label from the underlying drive, but that will never contain 
the disklabel for raid0 (at least as things currently stand).
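
The block arithmetic described above can be sketched in C.  This is only 
an illustration, not code from RAIDframe or any boot loader: the helper 
name raid1_to_disk_block() and the part_start parameter are hypothetical, 
and the value 64 for RF_PROTECTEDSECTORS is an assumption that should be 
checked against the RAIDframe headers on the system in question.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMPTION: number of sectors skipped at the start of each component.
 * Verify against the RAIDframe headers; 64 is used here for illustration. */
#define RF_PROTECTEDSECTORS 64

/*
 * Hypothetical helper: translate a block number inside a RAID 1 set into
 * a block number on the underlying disk.
 *
 *   part_start  first sector of the RAID partition on the physical disk
 *   raid_block  block number as seen through raid0
 *
 * For RAID 1 every component holds a full copy of the data, so the data
 * area is simply shifted by the protected region.
 */
uint64_t
raid1_to_disk_block(uint64_t part_start, uint64_t raid_block)
{
	/* Guard against wrap-around in the addition below. */
	assert(raid_block <= UINT64_MAX - RF_PROTECTEDSECTORS - part_start);
	return part_start + RF_PROTECTEDSECTORS + raid_block;
}
```

Under these assumptions, block 0 of raid0 (where the raid0 disklabel would 
live on an arch that keeps its label at block 0) maps to block 
RF_PROTECTEDSECTORS of the component, which is why reading the label of 
sd0 or wd0 never returns it.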
 
> 2.  It looks to me like most of the boot loaders work in such a way that
> the first stage loader has the block numbers of the second stage loader
> hard coded into them, meaning that the second stage loader could be loaded
> from any portion of the disk, including the first portion of an ffs
> filesystem inside a raid-1 partition.  Once the second stage loader is
> loaded, I believe space and code constraints are sufficiently removed, that
> the second stage loader could properly locate a kernel inside a raid-1 set
> or a physical disk directly -- no?  If this is so, then it strikes me as
> easier to teach the second stage boot loader how to locate a kernel either
> in an FFS filesystem in a raid-1 set or in an FFS filesystem in a physical
> partition.  Of course, once the kernel is loaded, it can already find the
> FFS filesystems inside any raid sets, so that problem is solved.

Some of this is already done and working, with much thanks to others.

> (I should note, that it would also be necessary for the installboot program
> to know about FFS filesystems inside raid-1 partitions as well, just so it
> can plug in the right numbers for the second stage loader, even if that
> loader is inside an FFS filesystem in a raid-1 set.  (Presumably, it could
> also locate the loader in a raid-5 set, but of course that wouldn't
> actually boot unless the kernel happened to fit inside one of the stripes
> of the raid, but that's an entirely different problem :).))

I'd completely ignore RAID 0 and RAID 5 for this... while getting a 
kernel from a RAID 0 is just a matter of a little math, getting a kernel 
loaded off of a failed RAID 5 set is a *lot* of math.
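
As a rough idea of the "little math" for RAID 0, here is a sketch in C 
that assumes a plain round-robin striping layout.  The struct, the function 
name raid0_map(), and its parameters are hypothetical, and RAIDframe's 
actual layout code has not been consulted, so treat it as an illustration 
only.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: where a RAID 0 block lives. */
struct raid0_loc {
	uint32_t component;	/* index of the component holding the block */
	uint64_t block;		/* block within that component's data area */
};

/*
 * Map a block of a RAID 0 set to a component and a block within it,
 * ASSUMING stripe units are laid out round-robin across components.
 *
 *   raid_block   block number as seen through the RAID device
 *   stripe_unit  sectors per stripe unit
 *   ncomp        number of components in the set
 */
struct raid0_loc
raid0_map(uint64_t raid_block, uint64_t stripe_unit, uint32_t ncomp)
{
	struct raid0_loc loc;
	uint64_t su, off;

	assert(stripe_unit > 0 && ncomp > 0);

	su = raid_block / stripe_unit;	/* stripe unit number, set-wide */
	off = raid_block % stripe_unit;	/* offset inside that stripe unit */

	loc.component = (uint32_t)(su % ncomp);
	loc.block = (su / ncomp) * stripe_unit + off;
	return loc;
}
```

The protected region at the start of the component would still have to be 
added to loc.block to get a component-relative block number.  A failed 
RAID 5 set additionally needs the missing data rebuilt from the surviving 
components and parity, which is where the extra math comes from.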

> 	If that problem is solved, I fail to see the need for moving a
> component label around and thus having to special case the raid-1 instance
> inside the raidframe code.  This would, I believe, free up Greg O's time
> to look into issues like:
> 
> 1.  Determining the feasibility of changing the autoconfiguration code to
> account for hot standby partitions, and being able to auto-reconstruct into
> them in the event of a component failure without user intervention.
> 
> 2.  Examining why paging to a raid-5 set causes hangs.

I've actually gained a fair bit of insight into this over the last 
month (with much thanks to the help and 400MB kernel core files provided 
by an unnamed source).  I won't go into the details here, but basically 
if there is any paging (and not just to swap space!) to any device 
that doesn't have a malloc-free code-path, then the potential is 
there for a deadlock.  RAIDframe doesn't need to be involved here, 
though I haven't had time to come up with a repeatable test case yet. 
(Softdeps seems to speed up the hang, since it seems to have a lot 
more pages that are not PG_CLEAN).
 
> 	My point here is that I believe this discussion started because there
> is some concern that teaching a system to boot from a raid-1 set is not as
> straightforward as it should be.  Unless I'm gravely mistaken, and please
> tell me if I am, this deficiency can be met more easily by modifying the
> orders-of-magnitude less complicated boot loader programs for the various
> architectures than by modifying the raidframe system itself. 

Considering the amount of time I have for RAIDframe hacking at the 
present, small changes to boot loaders look very good to me :-}

Later...

Greg Oster