Subject: Re: wedges vs. not-quite-wedges, was > 1T filesystems, disklabels,
To: Greg Oster <oster@cs.usask.ca>
From: Bill Studenmund <wrstuden@netbsd.org>
List: tech-kern
Date: 12/19/2002 15:53:10

On Thu, 19 Dec 2002, Greg Oster wrote:

> Bill Studenmund writes:
> > On Thu, 19 Dec 2002, Greg Oster wrote:
> [snip]
> >
> > Two questions. 1) I'd rather expect we'd really mark it, "NetBSD LVM".
> > :-)
>
> "ok" :)
>
> > 2) Why do we need a partition?
>
> Maybe we don't.. but what we will need is some way to figure out where the
> "NetBSD Metadata" lives.  Maybe we could make assumptions about where such
> metadata might live, but a partition would seem to be the simplest way.

I guess what I had in mind is why use the "native" disklabel system. The
LVMs I know of had, as best I can tell, their own systems.

> > What do, say, Veritas or IBM's LVMs do?
>
> I have no idea.

I'd suggest we do something like that.

Actually, it would be VERY NICE to have the same disk layout as one or the
other LVM. :-)

I understand there is an IBM-ish LVM for Linux. :-)

> > I think it is VERY DANGEROUS for us to refer to spaces outside of the
> > "NetBSD" space. While we do it now in disklabels, as we migrate to more
> > indirect allocation methods, I think it will be harder to keep things in
> > sync.
>
> mmmm... rope ;)

But rope to what end?

> Are there situations where:
>  1) we need to boot from some native partition and
>  2) that native partition won't/can't actually live in "NetBSD space"?
>
> If there aren't, then we probably don't need to allow the metadata to talk to
> anything outside that space.

For that list of things, I think it's fine for us to reach outside the
NetBSD space. I'm sorry, I was reacting more from the idea of reaching
outside the NetBSD space to get at, say, the FreeBSD partitions. While I
think we should get at them, they aren't part of an LVM system, and we
shouldn't try and shoe-horn them into one.

> > Why do we need to get NetBSD-specific stuff out of the "native disklabel"?
>
> Because NetBSD-specific stuff might not fit in native disklabels?  (e.g.
> native disklabels may not have fields large enough to specify "large"
> partitions?  Or disklabels might not be large enough to hold 256 partitions?)

I don't think "not fitting" will be a concern, since "native" disklabels
will have to update themselves to deal with large disks. :-)
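
For a sense of scale on the field-size worry, here is a rough sketch in C.
It assumes a label whose partition size field is a 32-bit sector count and
512-byte sectors; both are assumptions for illustration, and
label_max_bytes() and label_fits() are made-up helpers, not anything in the
tree.

```c
#include <stdint.h>

/*
 * Hypothetical helper: largest partition, in bytes, that a label can
 * describe when its size field holds `field_bits` bits of sector count.
 * Assumes field_bits < 64 and that the product fits in 64 bits.
 */
static uint64_t
label_max_bytes(unsigned field_bits, uint64_t sector_size)
{
	uint64_t max_sectors = ((uint64_t)1 << field_bits) - 1;

	return max_sectors * sector_size;
}

/* Returns nonzero if a partition of `bytes` bytes fits in such a field. */
static int
label_fits(uint64_t bytes, unsigned field_bits, uint64_t sector_size)
{
	return bytes <= label_max_bytes(field_bits, sector_size);
}
```

Under those assumptions, label_max_bytes(32, 512) comes out just under
2 TiB, so a 1 TiB partition fits and a 4 TiB one does not.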

As for 256 partitions, who in the world will ever use 64 of them?

> We also don't have to worry about storing NetBSD bits one way in
> one native label, and a different way in a different native label.  All we
> need to worry about is how to find this "NetBSD chunk" (or chunks..)

Unfortunately we have to keep supporting the different formats.

> > I'd think it'd be easier to just teach all flavors of NetBSD about all
> > disklabeling schemes we understand. 'cause there will be times when we
> > care about non-NetBSD-specific stuff on non-native disks. Making a common
> > library to understand other disklabel schemes is the only way to fix that,
> > and once we do that, we can also deal with NetBSD-specific stuff. :-)
>
> Except that we may still run into restrictions due to the legacy disklabels.

Yes, but there are other ways to fix it.

For one, we need a way to express the size of the NetBSD-specific stuff in
the native label, so we have to use a label that can keep other OSs out of
our stuff. :-)

> I guess I'm leaning towards a "let's make the new labelling scheme scalable
> towards LVM-like metadata".  That is, let's break free from the old disklabel

Why? Why use LVM-like metadata? If we want an LVM, let's add an LVM!

Do we have any idea that wedges would be useful for an LVM? All the LVMs I
am familiar with would take care of this internally. Maybe as an internal
kernel subsystem, but not as a user-exposed thing.

> scheme, and come up with something that deals with not only single partitions
> on obscure disks, but also with the metadata that might be required for RAID
> or LVM components.  We might only use it right now for talking about
> "partitions", but at least the framework would be there for stuffing in
> RAIDframe component labels and what-not at a later date.
>
> But perhaps that's too big of a step for now?

I think it's too big a step now because I have no confidence that these
wedges would be at all useful for an LVM system, and treating partitions
as non-partitions looks like it will be fraught with potential problems.

I mean, come on. What RAID system wants to expose its individual
components to userland? The completed RAID sets, yes, but not the
components.

Take care,

Bill