Subject: Re: GPT guids
To: Jeff Rizzo <riz@tastylime.net>
From: Greg Oster <oster@cs.usask.ca>
List: tech-userlevel
Date: 12/18/2007 09:31:16
Jeff Rizzo writes:
> Greg Oster wrote:
> > jakllsch@kollasch.net writes:
> >   
> >
> >> Also, do we want the RAIDframe header at the beginning of the
> >> partition still? 
> >>     
> >
> > Personally, I'd still like to see the component label bits live in a 
> > separate partition (e.g. a "RAID metadata partition" that contains 
> > component labels for all RAID entities on that particular device..)
> >   
> 
> Have you given any thought to what you'd like this to look like?

Yes....

>  I'm in
> the midst of playing around with various GPT-related things, and while
> my initial instinct was just to make a partition type akin to the RAID
> disklabel type, if you've got ideas how things might be made better or
> more flexible, I'd like to hear them.  There are also enough types
> available :)  that changing this later probably wouldn't be that hard...

So the plan I'd written up eons ago called for an FS_METADATA
partition type (à la FS_RAID, etc.) to be added to disklabels.  Here
are a few of my notes from 5 years ago... a lot has changed since 
then, so some of what is written here may no longer be relevant :-}

1) Create a new disklabel type called FS_METADATA.

2) A partition of type FS_METADATA would contain meta-data related to
the filesystems/RAID/partitioning of the given disk, as specified by
the disklabel.  Only one partition of type FS_METADATA is allowed per
disklabel.  Only the first FS_METADATA partition will be used if there
is more than one.
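A minimal C sketch of the selection rule in 2).  It assumes a hypothetical FS_METADATA value (no such fstype exists yet) and uses a cut-down label structure loosely modelled on NetBSD's struct disklabel / struct partition, not the real <sys/disklabel.h> definitions:

```c
#include <stdint.h>

/* Hypothetical fstype value: FS_METADATA is only a proposal, so this
 * number is a placeholder. */
#define FS_METADATA      99
#define MD_MAXPARTITIONS 16

/* Cut-down stand-ins, loosely modelled on NetBSD's struct partition
 * and struct disklabel; only the fields this check needs. */
struct md_partition {
    uint32_t p_size;     /* size in blocks */
    uint32_t p_offset;   /* first block on the disk */
    uint8_t  p_fstype;   /* FS_RAID, FS_BSDFFS, FS_METADATA, ... */
};

struct md_disklabel {
    uint16_t            d_npartitions;
    struct md_partition d_partitions[MD_MAXPARTITIONS];
};

/*
 * Return the index of the first FS_METADATA partition in the label,
 * or -1 if there is none.  Any later FS_METADATA partitions are
 * ignored, per rule 2).
 */
int
find_metadata_partition(const struct md_disklabel *lp)
{
    int n = lp->d_npartitions;

    if (n > MD_MAXPARTITIONS)   /* never trust an on-disk count */
        n = MD_MAXPARTITIONS;
    for (int i = 0; i < n; i++) {
        if (lp->d_partitions[i].p_fstype == FS_METADATA)
            return i;
    }
    return -1;
}
```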

3) The FS_METADATA partition would begin with at least 64 blocks of
'empty space' (à la the component labels).  The idea here is that the
meta-data partition could be placed at the beginning of a disk, and
would not interfere with any disklabel or (in the case of hp300, for
example) with the first cylinder of the disk.

4) The first block following the 'empty space' in the FS_METADATA
partition contains an FS_METADATA header, which is used to hold the
meta-meta data.

   a) The first integer of the first entry of the header would contain
a version number, indicating the version of the spec used to describe
the remainder of the FS_METADATA header, and of the rest of the
FS_METADATA partition.  For the current implementation, the version
will be '1'.

   b) A second integer (and perhaps more?) contains a 'magic number',
which is used to determine that the header is valid.  The value of
this 'magic number' is TBA.
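One way the version-1 header from 4) might be laid out, as a sketch only: the field names, the use of host byte order, and the MD_MAGIC value are all assumptions, since the magic number is still TBA.  The version is checked along with the magic so that a version-1 reader rejects headers written to a later spec:

```c
#include <stdint.h>
#include <string.h>

#define MD_BLOCKSIZE 512
#define MD_VERSION_1 1
/* Placeholder: the real magic number is still TBA. */
#define MD_MAGIC     0x4d444831u

/*
 * Hypothetical on-disk FS_METADATA header, one block long.  The
 * version comes first, as the notes require; the magic number follows.
 * Fields are in host byte order here for simplicity.
 */
struct md_header {
    uint32_t mh_version;   /* version of the spec describing the rest */
    uint32_t mh_magic;     /* MD_MAGIC if the header is valid */
    uint8_t  mh_pad[MD_BLOCKSIZE - 2 * sizeof(uint32_t)];
};

_Static_assert(sizeof(struct md_header) == MD_BLOCKSIZE,
    "header must fill exactly one block");

/* Return 1 if the block holds a valid version-1 header, else 0. */
int
md_header_valid(const unsigned char *block)
{
    struct md_header h;

    memcpy(&h, block, sizeof(h));   /* avoid unaligned access */
    if (h.mh_magic != MD_MAGIC)
        return 0;                   /* not an FS_METADATA header */
    return h.mh_version == MD_VERSION_1;  /* unknown versions rejected */
}
```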

5) Version 1 of the FS_METADATA scheme deals only with fixed slots.
Each slot is N blocks (negotiable), with the idea being that the
metadata for a given disk would fit within (say) 10MB.  Given that
RAIDframe component labels are currently less than 1 block in size
(512 bytes), each slot could be 1 block, and thus for a 16-partition
disk, the entire size of the FS_METADATA partition could be a mere
40.5K. (32K for the 'empty space', 0.5K for the 'header', and 8K for
the 16 slots.)
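The arithmetic in 5) can be checked with a few lines of C.  This sketch assumes 512-byte blocks, one-block slots, and that the header sits immediately after the 64 reserved blocks with the slots following it; the MD_* names are hypothetical:

```c
#include <stdint.h>

#define MD_BLOCKSIZE   512  /* bytes per block */
#define MD_RESERVED    64   /* leading 'empty space', in blocks */
#define MD_HDRBLOCKS   1    /* the FS_METADATA header */
#define MD_SLOTBLOCKS  1    /* blocks per slot: the notes' "N" */

/* Block offset of the header, relative to the start of the partition. */
#define MD_HDR_OFFSET  MD_RESERVED

/* Block offset of slot i (0-based); slots follow the header. */
uint32_t
md_slot_offset(uint32_t i)
{
    return MD_HDR_OFFSET + MD_HDRBLOCKS + i * MD_SLOTBLOCKS;
}

/* Partition size in blocks needed for nslots slots. */
uint32_t
md_partition_blocks(uint32_t nslots)
{
    return md_slot_offset(nslots);   /* one past the last slot */
}
```

For 16 slots this gives 64 + 1 + 16 = 81 blocks, or 41472 bytes (40.5K), matching the figure in 5).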

6) For now, slots can contain exactly the same information as stored
in the current RAIDframe component labels.

7) raidctl(8) will be used to initialize the FS_METADATA partition.
When the partition is initialized, the slots are zeroed, and the
header is written (in that order).  Care will need to be taken to
ensure that important data is not nuked, as necessary.
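A sketch of the initialization order in 7), using an in-memory buffer in place of the raw partition (a real raidctl(8) would write to the device instead).  It assumes a header placed right after the 64 reserved blocks, one-block slots, and a placeholder magic value, and it leaves the reserved 'empty space' alone.  Writing the header last means a new header only appears once its slots have been cleared; that rationale is an interpretation of "in that order", not something the notes spell out:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MD_BLOCKSIZE 512
#define MD_RESERVED  64           /* 'empty space' before the header */
#define MD_VERSION_1 1
#define MD_MAGIC     0x4d444831u  /* placeholder; the real value is TBA */

struct md_header {
    uint32_t mh_version;
    uint32_t mh_magic;
};

/*
 * Initialize an FS_METADATA partition image holding nslots one-block
 * slots.  The image is a stand-in for the raw partition.  The reserved
 * area at the front is never touched.
 */
void
md_init(unsigned char *image, uint32_t nslots)
{
    unsigned char *hdr = image + (size_t)MD_RESERVED * MD_BLOCKSIZE;
    unsigned char *slots = hdr + MD_BLOCKSIZE;
    unsigned char block[MD_BLOCKSIZE] = { 0 };
    struct md_header h = { MD_VERSION_1, MD_MAGIC };

    /* Step 1: zero every slot. */
    memset(slots, 0, (size_t)nslots * MD_BLOCKSIZE);

    /* Step 2: only then write the header, as one full block. */
    memcpy(block, &h, sizeof(h));
    memcpy(hdr, block, sizeof(block));
}
```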

8) At boot, the kernel will search all disks looking for partitions of
type FS_METADATA (much like it does when it looks for FS_RAID).
It will use the meta-data from the FS_METADATA partition to build the
auto-config structures in the same way that FS_RAID component
labels are used to build the auto-config structures.

9) raidread_component_label() and raidwrite_component_label() would be
modified to deal with reading and writing the information contained in
each of the slots of FS_METADATA.  Extra information may need to be
stored in the RAIDframe driver to keep track of which slot a RAID
component is currently in (different components may be in different
slots on different drives).
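For 9), the extra per-component state might be as small as a slot index.  The sketch below is hypothetical throughout: md_component, md_read_label() and md_write_label() are illustrative stand-ins, not the real raidread_component_label()/raidwrite_component_label() interfaces, and an in-memory buffer stands in for each disk's FS_METADATA partition (assumed to have the header right after 64 reserved blocks, then one-block slots).  It shows two components on different disks living in different slots:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MD_BLOCKSIZE 512
#define MD_RESERVED  64
#define MD_FIRSTSLOT (MD_RESERVED + 1)  /* slots follow the header block */

/*
 * Hypothetical per-component state: which slot on this component's own
 * disk holds its label.  Different components may use different slots
 * on different drives.
 */
struct md_component {
    unsigned char *mc_image;   /* stand-in for that disk's FS_METADATA partition */
    uint32_t       mc_slot;    /* slot index on that disk */
};

static size_t
md_slot_byte_offset(const struct md_component *c)
{
    return ((size_t)MD_FIRSTSLOT + c->mc_slot) * MD_BLOCKSIZE;
}

/* Read a label of len bytes from the component's slot; -1 if too big. */
int
md_read_label(const struct md_component *c, void *label, size_t len)
{
    if (len > MD_BLOCKSIZE)
        return -1;          /* a label must fit in one slot */
    memcpy(label, c->mc_image + md_slot_byte_offset(c), len);
    return 0;
}

/* Write a label of len bytes into the component's slot; -1 if too big. */
int
md_write_label(const struct md_component *c, const void *label, size_t len)
{
    if (len > MD_BLOCKSIZE)
        return -1;
    memcpy(c->mc_image + md_slot_byte_offset(c), label, len);
    return 0;
}
```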

10) Both this new method and the old method of maintaining component
data will have to be supported.

11) FS_METADATA will allow a regular FFS or LFS partition to be
incorporated into a RAID 1 set *without change*.  raidctl(8) will need
to support "merging" and "unmerging" of an FFS into or out of a RAID
set.  This merging/unmerging may require a reboot.  RAIDframe may need
to support the construction of empty RAID 1 sets in order to do this.

12) raidctl(8) will need to be able to support specifying whether the
new FS_METADATA should be used, or whether the old component labels
are to be used.

13) RAID types other than RAID 1 will be supported as well, but
merging existing FFS partitions into these sets will not be allowed.

14) Special care will have to be taken to ensure that components of
any non-RAID 1 sets do *NOT* begin at block 0 of the disks.  The
current component labels are designed to protect the disklabels, but
this new scheme will not protect disklabels unless FS_METADATA is the
first partition (e.g. for a RAID 0 set which uses FS_METADATA).

15) Migration to RAID 1 sets from an existing disk is expected to go
something as follows: [snip]

16) Sysinst should be aware of the ability to install to at least a
RAID 1 set.  If nothing else, it should support the construction and
initialization of an (empty) FS_METADATA partition.

17) In an ideal world the FS_METADATA could be used to support full
Logical Volume Management stuff.  With versioning, this proposal
should be able to deal with that in the future.

18) This approach is not quite the same as that used by
Solaris/Tru64/HP-UX.  In particular, the FS_METADATA stored on a given
disk is only valid for that disk -- it doesn't contain meta-data for
other disks.  Replication of meta-data is something that may be
considered at a later date, but given the way the AutoConfiguration
stuff is designed to work (any disk, at any address, at any time), it
may not be as relevant.

Comments are welcome...

Later...

Greg Oster