Subject: wedges vs. not-quite-wedges, was > 1T filesystems, disklabels, etc
To: Frank van der Linden <fvdl@wasabisystems.com>
From: Bill Studenmund <wrstuden@netbsd.org>
List: tech-kern
Date: 12/16/2002 12:17:54
On Sat, 14 Dec 2002, Bill Studenmund wrote:

> On Wed, 11 Dec 2002, Frank van der Linden wrote:
>
> > 	   Solution:
> > 		implement wedges. It would be a waste to go for a
> > 	   	disklabel with 64-bit fields, since people agree that disklabels
> > 	   	should be phased out anyway.

I've spoken with Frank some about this, and I wanted to continue the
conversation here.

What I proposed I'm going to call diskmap for now; it's easier to talk
about with a name. :-)

Diskmap has (I think) a lot of the features of wedges, but avoids the
problems I think wedges will run into.

Similarities:

1) The kernel will only know enough about partitioning layouts to boot.
Everything else will be left to a userland daemon. Said daemon will be MI,
and will thus permit all ports to know about all partitioning schemes.

2) The kernel will go out of the business of writing disklabels.

3) We will have access to all the partitions on a disk, even those of
other OSs. You won't have to add partitions to a NetBSD disklabel to get
at the other partitions.

Big difference:

Wedges are just extents on a disk. They are thrown in one big pool. dk4
(using the device name Jason suggested) could be on sd1, while dk5 could
be on wd0. The minor number space just numbers wedges. diskmap, what I'm
proposing, is (in kernel) more like disklabels now; the dk (or sd/wd/
xd/fd) minor number space is broken up into units and partitions-in-unit.

Think of diskmap as a bundle of 64 wedges tied to a specific disk. If 64
is too few, we can go with 256. In fact, let's just go with 256.
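
To make the unit/partition split concrete, here is a minimal sketch of how
a minor number could be packed and unpacked with 256 slots per disk. The
names (PARTS_PER_UNIT, make_minor, split_minor) are hypothetical and are
not taken from any NetBSD header; the real encoding could well differ.

```python
from typing import Tuple

# Hypothetical constant for this sketch: slots bundled with each disk unit.
PARTS_PER_UNIT = 256


def make_minor(unit: int, part: int) -> int:
    """Pack (unit, partition-in-unit) into a single minor number."""
    if unit < 0:
        raise ValueError("unit must be non-negative")
    if not 0 <= part < PARTS_PER_UNIT:
        raise ValueError("partition index out of range")
    return unit * PARTS_PER_UNIT + part


def split_minor(minor: int) -> Tuple[int, int]:
    """Recover (unit, partition-in-unit) from a minor number."""
    if minor < 0:
        raise ValueError("minor must be non-negative")
    return divmod(minor, PARTS_PER_UNIT)
```

With this layout every minor number for unit 1 falls in the range 256..511
no matter what happens to the partitions on unit 0, which is the property
the rest of this mail relies on.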

In return for limiting the number of partitions on a disk we can deal
with (to 255 user-specified ones; the whole-disk is still the whole-disk),
diskmap avoids a number of the problems I see with the "pool of
extents" aspect of wedges. I'll try to outline a few of them.

1) We need a way to have a reproducible mapping of userlevel name and
permissions to a device (partition). For now, /dev maps names and
permissions to device numbers, so that means we have to have stable device
numbers. While it might be ok for changing Linux or Windows partitions to
move around a few of the Windows or Linux partitions, changing partitions
on wd0 should NOT move things around on sd1 or sd3.

diskmap does this by a) segregating all of a disk's partitions to be in
the same minor number area (unit/partition split), and b) for things that
have a structure (NetBSD or FreeBSD disklabels, the 4 slots of an MBR
partition) allocating space for as many entries as there can be. Note that
changing an MBR partition into an extended partition will move things
around for diskmap, but that'll be at the end of the table, and diskmap
WON'T change anything for any other disk.
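
A rough sketch of point b), reserving a fixed block of slots for each
structured scheme found on one disk. Everything here is illustrative: the
function names are made up, the capacities other than the four MBR slots
are assumptions, and starting at slot 1 assumes slot 0 is kept for the
whole-disk.

```python
from typing import Dict, List, Tuple

# Illustrative capacities. An MBR has four primary slots; the other two
# figures are assumptions for this sketch, not values from a real header.
SCHEME_CAPACITY = {"mbr": 4, "netbsd-label": 16, "freebsd-label": 8}


def assign_blocks(schemes: List[str],
                  first_slot: int = 1) -> Dict[str, Tuple[int, int]]:
    """Give each scheme on one disk a contiguous block of slots.

    Returns {scheme: (first, last)}. Schemes discovered later are placed
    after the earlier ones, i.e. at the end of the table.
    """
    blocks: Dict[str, Tuple[int, int]] = {}
    next_slot = first_slot
    for scheme in schemes:
        size = SCHEME_CAPACITY[scheme]
        blocks[scheme] = (next_slot, next_slot + size - 1)
        next_slot += size
    return blocks


def slot_for(blocks: Dict[str, Tuple[int, int]], scheme: str,
             entry: int) -> int:
    """Slot of the entry-th table entry of a scheme (0-based)."""
    first, last = blocks[scheme]
    if entry < 0 or first + entry > last:
        raise IndexError("entry does not fit in the reserved block")
    return first + entry
```

Because space is reserved for as many entries as the scheme can hold, the
slot of an entry depends only on its position in its own table, not on
which neighbouring entries happen to be populated.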

Wedges can do this too. We keep a userland config file (or files) listing
dk entries, and use it to load the extents at boot. When we find new
partitions, we add them to the end of the list. When we notice a partition
has been deleted, we mark a wedge as unused. Sure is a simple description.

However, we just made that config file a historical document. If you
either have added partitions over time, or you have added disks in an
order other than the standard search one, you can't regenerate the file.
You lose it, and you lose all your disk partitions.
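
To illustrate why the file becomes a historical document, here is a small
sketch of the update rule described above (keep known entries in place,
mark vanished ones unused, append new ones). Identifying a partition by
(disk name, start block) and the function name are assumptions of the
sketch, not part of the wedges proposal.

```python
from typing import List, Optional, Tuple

# Hypothetical identity key for a partition: (disk name, start block).
Partition = Tuple[str, int]


def update_wedge_list(wedges: List[Optional[Partition]],
                      found: List[Partition]) -> List[Optional[Partition]]:
    """Return the new wedge list after a scan.

    Entries still present keep their index, vanished entries become None
    (unused), and newly found partitions are appended at the end.
    """
    found_set = set(found)
    result: List[Optional[Partition]] = [
        w if w in found_set else None for w in wedges
    ]
    known = {w for w in result if w is not None}
    for part in found:
        if part not in known:
            result.append(part)
            known.add(part)
    return result


# Same final disk contents, reached two different ways.
final = [("wd0", 63), ("wd0", 1000), ("sd1", 63)]

# History A: ("wd0", 1000) was added after the first scan.
grown = update_wedge_list([], [("wd0", 63), ("sd1", 63)])
grown = update_wedge_list(grown, final)

# History B: the config file was lost and rebuilt from scratch.
rebuilt = update_wedge_list([], final)
```

Here index 1 is ("sd1", 63) in the grown list but ("wd0", 1000) in the
rebuilt one, so a regenerated file no longer matches the device numbers
that /dev was set up for.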

As far as I can tell, we don't have any such historical config files right
now. We certainly don't have ones in key places like partition -> disk
mapping. I and a number of folks have bad-mouthed AIX's dependency on its
config database, and yet here we are talking about adding one now.

I think this is a bad idea both because I think it's ugly and because it
would be a big change from how we've historically done things. Are we
really ready to inflict that on our admins?

2) The userland daemon is more complicated for wedges, mainly to take care
of the above issue. Both approaches will need a daemon that will scan
disks and understand the different partition methods. diskmap will then
just read them into a disk-specific table, iterate over all the types,
and shove it into the kernel. The wedges userland daemon has to do all
that work, but it also has to find where the partitions are in the wedge
list. It has to line existing wedges up with the partitions it finds so
that it can notice added and deleted partitions. Yes, this problem isn't
intractable; it's just more complexity. It's more to write, and more to
maintain.

It also has to deal with drives not being on-line. I'd expect that if a
drive isn't on-line, the wedges for that drive are loaded as not-in-use.
However, they are conceptually different from deleted ones in that we
should leave them alone; we can't just re-use them.
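
One way to picture the extra bookkeeping is a sketch where the wedge daemon
tracks three states instead of two. The state names, the (disk name, start
block) key, and both functions are hypothetical; the point is only that
"disk not present" and "partition deleted" must not be conflated.

```python
from enum import Enum
from typing import List, Set, Tuple

# Hypothetical identity key for a partition: (disk name, start block).
Partition = Tuple[str, int]


class WedgeState(Enum):
    IN_USE = "in-use"
    DELETED = "deleted"   # partition is gone; the slot may be handed out again
    OFFLINE = "offline"   # disk is not present; the slot must be left alone


def classify(wedges: List[Partition], online_disks: Set[str],
             found: Set[Partition]) -> List[WedgeState]:
    """Decide the state of every configured wedge after a scan."""
    states = []
    for disk, start in wedges:
        if disk not in online_disks:
            states.append(WedgeState.OFFLINE)
        elif (disk, start) in found:
            states.append(WedgeState.IN_USE)
        else:
            states.append(WedgeState.DELETED)
    return states


def reusable_slots(states: List[WedgeState]) -> List[int]:
    """Indices that may be given to newly found partitions."""
    return [i for i, s in enumerate(states) if s is WedgeState.DELETED]
```

Both DELETED and OFFLINE wedges would be loaded as not-in-use, but only the
DELETED ones are candidates for re-use.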

3) How does an admin find the wedge/partition that corresponds to a given
partition? E.g., last time I booted FreeBSD I added a partition. Now in
NetBSD, which wedge is it? For diskmap, since we allocated a block of
partitions for the FreeBSD disklabel, it's in with the other FreeBSD ones.
And in running "disklabel" on diskmap, I envision you'll be able to see
the block of partitions that come from FreeBSD as FreeBSD ones, so it's
real easy to tell.

For wedges, the userland daemon either re-used a previously-deleted wedge,
or added one on at the end. The upshot is that the wedge may be nowhere
near the other wedges for FreeBSD on the same volume. While it is
something we could educate folks about, the learning curve will be rather
steep.

4) They're partitions, not logical volumes. Ok, this one is more
philosophical, but it represents an idea that will underlie design
decisions.

When I think about logical volumes, I think about AIX's LVM. It's the same
one that moved into OSF and HP-UX (it was one of IBM's contributions to
the OSF effort). In that case, it makes a lot of sense to treat a
partition/logical disk as just an extent on a disk. Or for IBM's LVM, a
couple of extents on a disk.

The main thing is that all of the tools for an LVM hide allocation
specifics from the admin. You say you want an LV with this many blocks
(well 4 MB chunks for AIX's LVM) in this area of the disk.

Wedges look a lot like LVs, but you still edit/control them with partition
editors. You'll use pdisk or an equivalent to modify partitions on an
Apple-partitioned disk. You'd use an MBR editor for MBR partitions. You'd
use a disklabel-like editor for NetBSD disklabels or NetBSD partitions of
an MBR slot. On FreeBSD, you'd use their editing tools to edit
partitions/slices. On OpenBSD, you'd use their disklabel code.

While we can make editing tools to try and hide the partition-ness of
NetBSD wedges from admins, other operating systems won't. Both for their
own partitions, and their view of our partitions.

So as long as our disks are divvied up by partitioning schemes, we should
just accept the fact we have partitions. Pretending otherwise just seems
like a recipe for a mess.

Take care,

Bill