Subject: Re: Partition tables (was: Re: Another changer, another changer
To: None <greywolf@starwolf.starwolf.com>
From: David Holland <dholland@cs.toronto.edu>
List: current-users
Date: 10/19/1998 17:15:46
 >  * Wait, which problem are you trying to solve? There are at least the
 >  * following related issues involved here:
 >  * 
 >  *   1. getting disklabel handling code out of disk device drivers
 >  *   2. supporting multiple types of disklabel/partition table
 >  *   3. probing a disk for disklabel(s)/partition table(s)
 >  *   4. organizing/numbering the partitions found for presentation
 >  *      to higher layers of the system
 >  *   5. mapping major and minor device numbers to disks and partitions
 >  *      and/or presenting partition names in /dev
 >  *   6. mapping disk names to individual pieces of hardware
 >  *      (this is really mostly the other thread though)
 >  *   7. providing better support for editing disklabels.
 > 
 > 8.  Not making the disklabel access bound to a physical partition.

That's 7. Anyway, both of the solutions currently on the table keep
the /dev entries for partitions distinct from the /dev entries for
disks, so you don't need to waste a partition slot on "the whole
disk". You do need a /dev entry for it, but that entry doesn't take
up space in the partition table, which is the only place where the
space matters much.

 > The basic problem with this whole thing is the mixing of UN*X/DOS
 > partitioning with which we regrettably must deal.

Well, yes.

 > Get rid of the slash and zero fill the disk number?  I don't know,
 > there's got to be a more elegant way to handle this.  I don't think I'd
 > be averse to /dev/wd003a as a device name (wd0, partition 3, 
 > subpartition a).
 > 
 > There's got to be a limit here, though; doing fdisk partitions three
 > times over, what's the point?  Twice makes some sense.

Well, it happens: if you make sixteen "logical" partitions with DOS,
you get sixteen layers of nesting, because DOS is stupid.
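To make the nesting concrete: in a DOS extended partition each logical
partition is described by its own extended boot record (EBR), and each
EBR links to the next one. So sixteen logical partitions form a chain
sixteen links deep. The sketch below models only that linked-list
shape; struct ebr and its helpers are hypothetical, and the sector
numbers are made up. It is not a parser for real on-disk EBRs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical in-memory stand-in for an extended boot record (EBR).
 * A real EBR is a sector on disk; this models only the linked-list
 * shape: one logical partition plus a link to the next EBR. */
struct ebr {
    unsigned long start;      /* first sector of the logical partition */
    unsigned long nsectors;   /* length of the logical partition */
    const struct ebr *next;   /* next EBR in the chain, or NULL */
};

/* Count the links that must be followed to reach the end of the
 * chain: one per logical partition, i.e. the nesting depth. */
static size_t ebr_chain_depth(const struct ebr *e)
{
    size_t depth = 0;

    for (; e != NULL; e = e->next)
        depth++;
    return depth;
}

/* Link n entries of the caller's array into a chain, with made-up
 * geometry (1000 sectors each), and return the head. */
static const struct ebr *ebr_build_chain(struct ebr *v, size_t n)
{
    assert(n > 0);
    for (size_t i = 0; i < n; i++) {
        v[i].start = (unsigned long)i * 1000;
        v[i].nsectors = 1000;
        v[i].next = (i + 1 < n) ? &v[i + 1] : NULL;
    }
    return &v[0];
}
```

Reaching the last logical partition means walking every EBR before it,
which is why the probe code has to cope with arbitrarily deep chains.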

 >  * Hmm. Maybe the right thing to do is to collect all the partitions of
 >  * each type together. Then you know how many partitions per table you
 >  * can have, so you can assign minor numbers in some sensible manner.
 >  * Then the wd0 drawn above would give you (let's assume fdisk is the
 >  * fdisk table device (major 101), dk is the bsd disklabel device (major
 >  * 100)):
 > 
 > I hope you're not suggesting hard-wiring these -- the config file would
 > get huger than it is (for the i386 port -- the SPARC port is pretty
 > reasonable)!

Not unless you want to; it should be safe to just say

fdisk* on wd*
dk* on wd*
dk* on fdisk*

for the i386 port.
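Once the partitions are collected by table type, each type knows the
maximum number of slots in one of its tables, so it can hand out minor
numbers in contiguous blocks per table. A minimal sketch, assuming the
major numbers from the example above (dk = 100, fdisk = 101) and
hypothetical per-table slot counts (4 for fdisk, 8 for dk); struct
ptype, ptype_minor and ptype_decode are illustrative names, not
existing kernel interfaces.

```c
#include <assert.h>

/* Hypothetical description of one partition-table type.  The slot
 * counts below are assumptions chosen for illustration. */
struct ptype {
    const char *name;
    int major;
    int maxparts;      /* slots per table of this type */
};

static const struct ptype fdisk_type = { "fdisk", 101, 4 };
static const struct ptype dk_type    = { "dk",    100, 8 };

/* Every table of one type has the same maximum size, so the n-th
 * table attached for that type owns minors n*maxparts ... +maxparts-1. */
static int ptype_minor(const struct ptype *t, int table, int index)
{
    assert(table >= 0);
    assert(index >= 0 && index < t->maxparts);
    return table * t->maxparts + index;
}

/* Inverse mapping: recover (table, index) from a minor number. */
static void ptype_decode(const struct ptype *t, int minor,
                         int *table, int *index)
{
    assert(minor >= 0);
    *table = minor / t->maxparts;
    *index = minor % t->maxparts;
}
```

For example, the third slot of the second dk table would get dk minor
10, independent of how many fdisk tables sit underneath it.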

 > I see the following as criteria for NetBSD:
 > 
 > 	- we need to be able to continue referring to a disk partition as
 > 	  {,r}${type}d${unit}${part}; 

I don't think this can be made to work with generic disklabels without
a lot of internal upheaval in the kernel.

I'm not exactly interested in /dev/dsk/c0t0d0s0p0q0z0f0x0 either. That
is why I suggested naming the partitions fdisk0, fdisk1, fdisk2, etc.
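The two naming styles under discussion can be compared side by side. A
minimal sketch, assuming the zero-filled form quoted earlier (prefix,
two-digit disk unit, partition number, subpartition letter, as in
wd003a) and the flat per-type form (fdisk0, fdisk1, ...). The helper
names nested_name and flat_name are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Zero-filled nested name: prefix, two-digit disk unit, partition
 * number, subpartition letter (wd + 00 + 3 + a -> "wd003a").  The
 * fixed two-digit unit field is what keeps the name parseable, so
 * units above 99 are rejected here.  Returns snprintf's result. */
static int nested_name(char *buf, size_t len, const char *prefix,
                       int unit, int part, char sub)
{
    assert(strlen(prefix) > 0);
    assert(unit >= 0 && unit <= 99);
    assert(part >= 0);
    assert(sub >= 'a' && sub <= 'z');
    return snprintf(buf, len, "%s%02d%d%c", prefix, unit, part, sub);
}

/* Flat alternative: every partition of one table type is numbered
 * consecutively, so names never nest no matter how deep the tables. */
static int flat_name(char *buf, size_t len, const char *type, int n)
{
    assert(strlen(type) > 0);
    assert(n >= 0);
    return snprintf(buf, len, "%s%d", type, n);
}
```

The nested form has to encode every level of nesting in the name, and
its field widths become a hard limit; the flat form stays short at the
cost of no longer showing which disk a partition lives on.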

 > 	- we need to find another way of actually accessing the disklabel
 > 	  in order to avoid wasting a partition for this purpose.

The easy solution is to add an ioctl that works on any partition of a
disk. The trouble is that you run into horrible problems with fdisk
tables on the x86. This is why I've been looking for a good solution
rather than a minimal one.

 > 	- we need to have a partition table larger than 8 partitions
 > 	  due to the larger disks that are available. I would opt for
 > 	  16 because exact powers of two seem to fit much better into
 > 	  things which are ostensibly bit-field oriented in the first place.

16 isn't enough. You can get 50-60G single disks now. If you make
even vaguely reasonably sized partitions on them, you get 20-30.

You will never make a single disklabel large enough to handle the
number of partitions someone might want to make. Suppose someone wants
to split a 1TB RAID into 2G partitions because he's got some
filesystem that can't handle sizes larger than 2^31 bytes? I know this
is stupid, but things like this happen. That's 512 partitions. If you
make the disklabel large enough for that, the people trying to run on
100M disks will scream bloody murder.
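The arithmetic behind those numbers, treating G and T as binary units
(2^30 and 2^40 bytes): 2^40 / 2^31 = 2^9 = 512, and a 50-60G disk cut
into 2G pieces gives 25-30. A small helper to check this, rounding up
so the tail of the disk is not left uncovered; partitions_needed is
just an illustrative name.

```c
#include <assert.h>
#include <stdint.h>

/* Number of equal-sized partitions needed to cover a device,
 * rounding up so the last partial piece still gets a slot. */
static uint64_t partitions_needed(uint64_t disk_bytes, uint64_t part_bytes)
{
    assert(part_bytes > 0);
    return (disk_bytes + part_bytes - 1) / part_bytes;
}
```

Whatever fixed slot count a single table has, some device size divided
by some filesystem limit will exceed it.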

Some kind of nesting or chaining scheme is required in the long run,
so any long-term solution has to be able to support this.
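One possible shape for such a chaining scheme, sketched under stated
assumptions: each on-disk table block holds a small fixed number of
entries (8 here, an arbitrary choice) plus a link to a continuation
block, so small disks pay for one block and a 512-partition RAID just
uses more of them. struct ptable, struct pentry and the lookup
functions are hypothetical and not an existing disklabel format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SLOTS_PER_TABLE 8   /* hypothetical fixed size of one block */

/* One partition entry; offset and size are in sectors. */
struct pentry {
    uint32_t offset;
    uint32_t size;
};

/* Hypothetical chained table block: a fixed array of entries, of
 * which nused are valid, plus an optional continuation block. */
struct ptable {
    size_t nused;
    struct pentry slot[SLOTS_PER_TABLE];
    const struct ptable *next;
};

/* Total number of partitions across the whole chain. */
static size_t ptable_count(const struct ptable *t)
{
    size_t n = 0;

    for (; t != NULL; t = t->next)
        n += t->nused;
    return n;
}

/* Find the k-th partition (0-based) across the chain, or NULL if the
 * chain holds fewer than k+1 partitions. */
static const struct pentry *ptable_lookup(const struct ptable *t, size_t k)
{
    for (; t != NULL; t = t->next) {
        assert(t->nused <= SLOTS_PER_TABLE);
        if (k < t->nused)
            return &t->slot[k];
        k -= t->nused;
    }
    return NULL;
}
```

The walk is linear in the number of blocks, but like the minor number
setup it only needs to happen when the disk is probed.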

 > 	  It avoids waste in the numbering space, and it avoids potentially
 > 	  expen$ive computations (straw man?  I don't know, but i think
 > 	  it's less expen$ive to figure a division by sixteen than by
 > 	  N + something - something else...)

This is pointless. You shouldn't need to do computations like this
except during attach, which happens once, at a point where you have
plenty of cycles.
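To illustrate: if a map from minor number to (table, slot) is built
once at attach time, the per-I/O cost is a plain array lookup for any
table size N, and no division happens at all. The fixed
sixteen-per-disk decode is included for comparison. This is a
user-space sketch using malloc(3); struct minor_map and its functions
are hypothetical, and real kernel code would use the kernel's own
allocator.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-disk map from minor number to (table, slot). */
struct minor_map {
    size_t n;
    int *table_of;   /* which partition table each minor belongs to */
    int *index_of;   /* slot within that table */
};

/* Attach time: number the partitions of each discovered table
 * consecutively; counts[t] may be any size, not a power of two.
 * Returns 0 on success, -1 if allocation fails. */
static int minor_map_build(struct minor_map *m, const int *counts,
                           size_t ntables)
{
    size_t total = 0, minor = 0;

    for (size_t t = 0; t < ntables; t++) {
        assert(counts[t] >= 0);
        total += (size_t)counts[t];
    }
    m->n = total;
    m->table_of = NULL;
    m->index_of = NULL;
    if (total == 0)
        return 0;
    m->table_of = malloc(total * sizeof *m->table_of);
    m->index_of = malloc(total * sizeof *m->index_of);
    if (m->table_of == NULL || m->index_of == NULL) {
        free(m->table_of);
        free(m->index_of);
        m->table_of = m->index_of = NULL;
        m->n = 0;
        return -1;
    }
    for (size_t t = 0; t < ntables; t++)
        for (int i = 0; i < counts[t]; i++, minor++) {
            m->table_of[minor] = (int)t;
            m->index_of[minor] = i;
        }
    return 0;
}

/* I/O time: constant-time array lookup, no arithmetic on N. */
static void minor_map_lookup(const struct minor_map *m, size_t minor,
                             int *table, int *index)
{
    assert(minor < m->n);
    *table = m->table_of[minor];
    *index = m->index_of[minor];
}

/* For comparison: the fixed 16-partitions-per-unit decode. */
static void minor16_decode(unsigned minor, unsigned *unit, unsigned *part)
{
    *unit = minor >> 4;
    *part = minor & 15;
}

static void minor_map_free(struct minor_map *m)
{
    free(m->table_of);
    free(m->index_of);
    m->table_of = m->index_of = NULL;
    m->n = 0;
}
```

Both decodes are cheap; the difference is only where the work is done,
and the attach-time map has no built-in limit on partitions per table.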

 > [This may get a bit tangential...]
 > 
 > We're trying to at least keep the device (and LUN?) numbering wired to
 > the physical device, or so I thought.

This can't really be done in general.

-- 
   - David A. Holland             | (please continue to send non-list mail to
     dholland@cs.utoronto.ca      | dholland@hcs.harvard.edu. yes, I moved.)