Subject: Re: wedges vs. not-quite-wedges, was > 1T filesystems, disklabels, etc
To: Bill Studenmund <wrstuden@netbsd.org>
From: Greg Oster <oster@cs.usask.ca>
List: tech-kern
Date: 12/19/2002 19:04:29
Bill Studenmund writes:
> On Thu, 19 Dec 2002, Greg Oster wrote:
> 
> > Bill Studenmund writes:
> > > On Thu, 19 Dec 2002, Greg Oster wrote:
> > [snip]
> > Are there situations where:
> >  1) we need to boot from some native partition and
> >  2) that native partition won't/can't actually live in "NetBSD space"?
> >
> > If there aren't, then we probably don't need to allow the metadata to talk to
> > anything outside that space.
> 
> For that list of things, I think it's fine for us to reach outside the
> NetBSD space. I'm sorry, I was reacting more from the idea of reaching
> outside the NetBSD space to get at say the FreeBSD partitions. While I
> think we should get at them, they aren't part of an LVM system, and we
> shouldn't try and shoe-horn them into one.

I keep talking about a PV system, not an LVM system (i.e. a Physical Volume
system, not Logical Volume Management).  So while yes, a FreeBSD
partition might not fit into an LVM, it would (IMO) fit into a PV management
system.

> > > Why do we need to get NetBSD-specific stuff out of the "native disklabel"?
> >
> > Because NetBSD-specific stuff might not fit in native disklabels?  (e.g.
> > native disklabels may not have fields large enough to specify "large"
> > partitions?  Or disklabels might not be large enough to hold 256 partitions?)
> 
> I don't think "not fitting" will be a concern, since "native" disklabels
> will have to update themselves to deal with large disks. :-)
> 
> As for 256 partitions, who in the world will ever use 64 of them?

Ok, so pick an arch that can't even support 64 in its native label :)
(can you do 16 partitions in a native Sun disklabel?)

> > > I'd think it'd be easier to just teach all flavors of NetBSD about all
> > > disklabeling schemes we understand. 'cause there will be times when we
> > > care about non-NetBSD-specific stuff on non-native disks. Making a common
> > > library to understand other disklabel schemes is the only way to fix that,
> > > and once we do that, we can also deal with NetBSD-specific stuff. :-)
> >
> > Except that we may still run into restrictions due to the legacy disklabels.
> 
> Yes, but there are other ways to fix it.
> 
> For one, we need a way to express the size of the NetBSD-specific stuff in
> the native label, so we have to use a label that can keep other OSs out of
> our stuff. :-)

And all we need for that is a marker that says "this chunk of the disk is used 
by NetBSD", and some well-defined way of pulling out whatever slice/wedge/
partition information that might be related to that chunk.
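
To make the "marker plus lookup" idea concrete, here is a minimal sketch in C.
Everything in it is hypothetical: the names (pv_label, pv_wedge,
pv_find_wedge), the magic value, and the fixed 8-entry table are made up for
illustration and are not an existing or proposed NetBSD on-disk format.  It
only shows a label that claims a chunk of the disk and a well-defined way to
pull a wedge description out of it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical values and layout, for illustration only. */
#define PV_MAGIC      0x4e425356u   /* arbitrary made-up marker value */
#define PV_MAXWEDGES  8             /* arbitrary small table size */

struct pv_wedge {
    uint64_t start;     /* first sector, relative to the whole disk */
    uint64_t size;      /* length in sectors */
    char     name[16];  /* NUL-terminated wedge name */
};

struct pv_label {
    uint32_t magic;         /* "this chunk of the disk is used by NetBSD" */
    uint32_t nwedges;       /* number of valid entries in wedges[] */
    uint64_t chunk_start;   /* first sector of the claimed chunk */
    uint64_t chunk_size;    /* length of the claimed chunk in sectors */
    struct pv_wedge wedges[PV_MAXWEDGES];
};

/*
 * Return the wedge called `name`, or NULL if the label does not carry
 * our marker, is malformed, or has no wedge with that name.
 */
const struct pv_wedge *
pv_find_wedge(const struct pv_label *l, const char *name)
{
    assert(l != NULL && name != NULL);

    if (l->magic != PV_MAGIC || l->nwedges > PV_MAXWEDGES)
        return NULL;

    for (uint32_t i = 0; i < l->nwedges; i++) {
        if (strncmp(l->wedges[i].name, name,
            sizeof(l->wedges[i].name)) == 0)
            return &l->wedges[i];
    }
    return NULL;
}
```

Note that the description of each wedge lives in the label, not inside the
wedge itself, so whatever sits in the wedge never has to know about it.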

> > I guess I'm leaning towards a "let's make the new labelling scheme scalable
> > towards LVM-like metadata".  That is, let's break free from the old disklabel
> 
> Why? Why use LVM-like metadata? If we want an LVM, let's add an LVM!

If we restrict our new labelling scheme to not be able to include the
metadata that might be needed to identify a Physical Volume, then at some
point when we do get an LVM, we're going to have to go "outside the label"
to get that information.  All I'm suggesting is that we keep the idea of
eventually getting an LVM in the back of our minds as we're designing this new
thing, so that we don't do anything that will preclude us from easily adding
a Physical Volume manager or an LVM in the future.

> Do we have any idea that wedges would be useful for an LVM? 

As Physical Volumes, why not?  The trick is just to keep the metadata related 
to that PV outside of the wedge (e.g. in a "label" :) )

> All the LVMs I
> am familiar with would take care of this internally. Maybe as an internal
> kernel subsystem, but not as a user-exposed thing.
> 
> > scheme, and come up with something that deals with not only single partitions
> > on obscure disks, but also with the metadata that might be required for RAID
> > or LVM components.  We might only use it right now for talking about
> > "partitions", but at least the framework would be there for stuffing in
> > RAIDframe component labels and what-not at a later date.
> >
> > But perhaps that's too big of a step for now?
> 
> I think it's too big a step now because I have no confidence that these
> wedges would be at all useful for an LVM system, and treating partitions
> as non-partitions looks like it will be fraught with potential problems.
> 
> I mean, come on. What RAID system wants to expose its individual
> components to userland? The completed RAID sets, yes, but not the
> components.

Exposed in what way?  For some things (e.g. a failed disk) the RAID system 
needs to be able to tell the user which disk has just died...  

I'm shooting for something that scales to LVMs for the simple reason that I
don't want to see a whole bunch of work done on something, only to have to
re-do/re-visit a bunch of this when we eventually need to deal with the
metadata for Physical Volumes.

In the meantime, I quite agree w/ Frank that we need to 1) write the
partition reader library/tool and 2) do the daddr_t 32->64 bit work,
and so if I'm stopping anyone from getting that done with this discussion,
I'll quickly shut up :)
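
For anyone wondering where the "1T" in the subject line comes from, here is
the arithmetic as a small C sketch.  It assumes 512-byte sectors and a signed
32-bit block number (my reading of "daddr_t 32->64 bit", so treat both as
assumptions); the function names are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Bytes addressable with 2^sector_bits sectors of sector_size bytes each.
 * The product wraps modulo 2^64 if it does not fit, so callers should
 * keep the inputs small enough.
 */
uint64_t
addressable_bytes(unsigned sector_bits, uint32_t sector_size)
{
    assert(sector_bits < 64);   /* shifting by 64 or more is undefined */
    return ((uint64_t)1 << sector_bits) * sector_size;
}

/* Nonzero if `sector` can be stored in a signed 32-bit block number. */
int
fits_signed32(uint64_t sector)
{
    return sector <= (uint64_t)INT32_MAX;
}
```

Under those assumptions the non-negative range of a signed 32-bit block
number covers 2^31 sectors, and addressable_bytes(31, 512) is 2^40 bytes,
i.e. 1 TiB; the first sector past that limit is 2^31, which fits_signed32()
rejects.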

Later...

Greg Oster