Subject: Re: Partition tables (was: Re: Another changer, another changer problem)
To: Bill Studenmund <skippy@macro.stanford.edu>
From: David Holland <dholland@cs.toronto.edu>
List: current-users
Date: 10/08/1998 16:02:00
 > > Yeah. The vague plan I had was to first build the new services,
 > > layering them on top of the raw partition provided by the existing
 > > drivers, then, once that was working, starting to rip cruft out of the
 > > drivers themselves.
 > > 
 > > The chief design question, I think, is how to do minor/major number
 > > handling and how this interacts with bdevsw and device attachment. I'm
 > > not entirely convinced it'll all work and keep backwards compatibility
 > > with installed /devs without some ugly hacks.
 > 
 > It should. :-)

I remain unconvinced; read on...

 > > If you keep the same device numbering, you get multiple entries for
 > > the partition driver in bdevsw and horrors in softc handling inside
 > > the driver. 
 > 
 > Same device numbering?

All the disk devices keeping the same names and major/minor numbers
that they presently do, possibly adding more minor numbers. The
device-independent disklabels remain hidden under the covers.

 > > On the other hand, if you use a new major number for the partition
 > > driver, then you limit yourself to 256 partitions on the system until
 > > dev_t becomes 32 bits wide. (Or is it now?) And you have horrors
 > > trying to figure out which partitions belong to which disks.
 > 
 > dev_t has been 32 bits wide for years. The major and minor routines were
 > taught how to use all 32-bits about 6 months ago.

ah, ok.
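For illustration, here is a minimal sketch of one way a 32-bit device number can carry wider majors and minors while old 16-bit numbers (8-bit major, 8-bit minor) still decode the same. The layout and the names dev32_t, dev_make, dev_major and dev_minor are hypothetical, chosen only to show the idea; they are not a copy of any particular kernel's macros.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t dev32_t;   /* hypothetical stand-in for a 32-bit dev_t */

/*
 * Hypothetical backward-compatible layout:
 *   bits  0..7   low 8 bits of the minor (same place as in a 16-bit dev_t)
 *   bits  8..19  major, 12 bits (old 8-bit majors land in the same place)
 *   bits 20..31  high 12 bits of the minor
 */
static unsigned
dev_major(dev32_t d)
{
    return (d >> 8) & 0xfffu;
}

static unsigned
dev_minor(dev32_t d)
{
    /* low byte stays put; bits 20..31 move back down to bits 8..19 */
    return (d & 0xffu) | ((d >> 12) & 0xfff00u);
}

static dev32_t
dev_make(unsigned maj, unsigned min)
{
    assert(maj <= 0xfffu && min <= 0xfffffu);
    return ((dev32_t)(maj & 0xfffu) << 8) |
           (dev32_t)(min & 0xffu) |
           ((dev32_t)(min & 0xfff00u) << 12);
}
```

With a major below 256 and a minor below 256, dev_make produces exactly (major << 8) | minor, so existing /dev nodes keep working; only minors of 256 and up spill into the new high bits.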

 > I think the cleanest thing to do is to just make the unit and part macros
 > use a different separation for major units over 255, have two entries in
 > b/cdevsw, and have a canonicalization routine in the driver or in specfs
 > (take a given device number and turn it into the one for the
 > higher-numbered device - oops, either this happens in the driver, or we
 > add info to b/cdevsw. Easy either way).
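A rough sketch of the two-entry scheme quoted above, assuming the driver does the canonicalization itself. OLD_MAJOR, NEW_MAJOR, the partition counts and all function names are hypothetical; the point is only that the unit/part split depends on which major was used, and that every device number is folded into the higher-numbered form before the driver looks up its softc.

```c
#include <assert.h>

/* Hypothetical device number kept as a (major, minor) pair. */
struct devnum {
    unsigned maj;
    unsigned min;
};

#define OLD_MAJOR  4    /* hypothetical legacy b/cdevsw slot, 8-bit minors */
#define NEW_MAJOR  40   /* hypothetical second slot for wide minors */
#define OLD_NPART  8    /* old split: 8 partitions per unit */
#define NEW_NPART  16   /* new split: a different separation */

static unsigned
disk_unit(struct devnum d)
{
    return d.min / (d.maj == OLD_MAJOR ? OLD_NPART : NEW_NPART);
}

static unsigned
disk_part(struct devnum d)
{
    return d.min % (d.maj == OLD_MAJOR ? OLD_NPART : NEW_NPART);
}

/*
 * Turn any device number for this driver into the one for the
 * higher-numbered major, so the rest of the driver sees one form only.
 */
static struct devnum
disk_canonicalize(struct devnum d)
{
    struct devnum c;

    assert(d.maj == OLD_MAJOR || d.maj == NEW_MAJOR);
    if (d.maj == NEW_MAJOR)
        return d;
    c.maj = NEW_MAJOR;
    c.min = disk_unit(d) * NEW_NPART + disk_part(d);
    return c;
}
```

For example, unit 3 partition 2 under the old major is minor 26; canonicalized, it becomes minor 50 under NEW_MAJOR, and disk_unit/disk_part on the result still give 3 and 2.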

Yes, this is straightforward. But either you use separate major
numbers for different types of disk (sd0, wd0, whatnot), in which case
you're really using the previous technique, with its problems, or you
use one major number for all disk partitions.

This would give you the somewhat undesirable situation that you can't
necessarily tell easily what partitions are on what disks, and worse,
altering the order in which disks attach alters the order of the
partitions. Under this scheme, you'd have partition device nodes in
/dev, something like "[r]part0 [r]part1 ... [r]partN", and the mapping
between these and disks would be either wired down or done on the fly,
like the way sd0 gets mapped to a particular SCSI id on a particular
controller, something people were just complaining about.

For this reason I don't think this is a good option.
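To make the attach-order problem concrete, here is a small sketch that numbers partitions globally in the order disks attach, the way a single "partN" major might have to. The struct disk type and global_part_index are hypothetical illustrations, not part of any existing driver.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical: a disk as seen at attach time. */
struct disk {
    const char *name;
    unsigned nparts;
};

/*
 * Return the global "partN" index for partition `part` of `diskname`,
 * numbering partitions consecutively in attach order; -1 if not found.
 */
static int
global_part_index(const struct disk *attached, size_t ndisks,
                  const char *diskname, unsigned part)
{
    unsigned base = 0;
    size_t i;

    assert(attached != NULL || ndisks == 0);
    for (i = 0; i < ndisks; i++) {
        if (strcmp(attached[i].name, diskname) == 0)
            return part < attached[i].nparts ? (int)(base + part) : -1;
        base += attached[i].nparts;
    }
    return -1;
}
```

If wd0 (4 partitions) attaches before sd0 (3 partitions), sd0's second partition is part5; if sd0 attaches first, the same partition is part1 and part5 now points into wd0.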

 > > A third option is to leave the numbers alone and have the disk drivers
 > > call out into a partition table library of some kind, but then you get
 > > probe/attach horrors.
 > 
 > ?? Why do you get probe horrors?

Because you end up reimplementing special cases of all the generic
probe and attach code.

 > > You can also treat the partition stuff as a (simple) filesystem
 > > instead of a device driver; in some ways I think this is the best
 > > approach... but it causes some real horrors with compatibility.
 > 
 > ??

A filesystem is a collection of on-disk structures used to split a
storage system into smaller units (files). This is also what a
partition table does. Think of the partition table as the combination
superblock and root directory of a very primitive filesystem.
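A minimal sketch of that analogy, assuming a made-up in-memory label format: the header plays the superblock, the entry array plays the root directory, lookup by letter plays namei, and translating a block within a partition to an absolute sector plays bmap. All names and the LABEL_MAXPART limit are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LABEL_MAXPART 8          /* hypothetical fixed table size */

struct part_entry {              /* one "directory entry" */
    uint64_t offset;             /* first sector of the partition */
    uint64_t size;               /* length in sectors; 0 = unused */
};

struct label_sb {                /* "superblock" plus "root directory" */
    uint32_t magic;              /* format identifier (unused here) */
    uint32_t nparts;             /* number of slots in use */
    struct part_entry part[LABEL_MAXPART];
};

/* "namei": map a partition letter to its entry, or NULL if absent. */
static const struct part_entry *
label_lookup(const struct label_sb *sb, char name)
{
    unsigned idx;

    assert(sb != NULL);
    if (name < 'a')
        return NULL;
    idx = (unsigned)(name - 'a');
    if (idx >= sb->nparts || idx >= LABEL_MAXPART || sb->part[idx].size == 0)
        return NULL;
    return &sb->part[idx];
}

/* "bmap": block within the partition -> absolute sector, bounds-checked. */
static int
label_bmap(const struct part_entry *pe, uint64_t blk, uint64_t *out)
{
    if (blk >= pe->size)
        return -1;               /* would run past the end of the partition */
    *out = pe->offset + blk;
    return 0;
}
```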

You can implement all the disklabel handling as a filesystem; then you
end up with device nodes for just the raw disks (wd0, sd0, sd1, etc.)
and corresponding directories (wd0p, sd0p, sd1p, etc., where p is for
partitions) - then you do "mount -tlabel /dev/wd0 /dev/wd0p" to mount
an instance of the disklabel fs on the mount point wd0p, using the
block device wd0. Then /dev/wd0p would contain the files 'a b c d e ra
rb rc rd re' or whatever other canonical representation of the
partitions present you want. And it could also support altering the
disklabel on the fly in any of several ways depending on how much
kernel bloat was tolerable.
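As a sketch of the directory side of such a labelfs, assuming the 'a b c ... ra rb rc ...' naming above, readdir could generate one block-device name and one raw name per partition, and lookup would parse a name back into a partition index plus a raw/block flag. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Fill names[] with "a".."?" then "ra".."r?"; return how many were written. */
static int
labelfs_readdir(unsigned nparts, char names[][3], size_t max)
{
    size_t n = 0;
    unsigned i;

    assert(nparts <= 26);
    for (i = 0; i < nparts && n < max; i++, n++) {
        names[n][0] = (char)('a' + i);          /* block device entry */
        names[n][1] = '\0';
    }
    for (i = 0; i < nparts && n < max; i++, n++) {
        names[n][0] = 'r';                      /* raw (character) entry */
        names[n][1] = (char)('a' + i);
        names[n][2] = '\0';
    }
    return (int)n;
}

/* Parse a name back; return the partition index or -1, and set *raw. */
static int
labelfs_parse(const char *name, unsigned nparts, int *raw)
{
    unsigned idx;

    *raw = 0;
    if (name[0] == 'r' && name[1] != '\0') {    /* two-char names are raw */
        *raw = 1;
        name++;
    }
    if (name[0] < 'a' || name[1] != '\0')
        return -1;
    idx = (unsigned)(name[0] - 'a');
    return idx < nparts ? (int)idx : -1;
}
```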

You could have one labelfs that optionally understood assorted
disklabel types, or several, depending on how good autoprobe code
someone wanted to write. You can also pass hint information in the
mount options field.
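One way the multi-format labelfs could choose a label type is sketched below: a table of probe routines tried in order, optionally narrowed by a hint string that would come from the mount options (parsing those is left out). The "mbr" probe checks the real PC boot signature 0x55 0xAA at offsets 510 and 511 of sector 0; the "toy" format and its "TOYL" magic are entirely hypothetical, as are the function names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct label_format {
    const char *name;
    int (*probe)(const uint8_t *sector, size_t len);
};

static int
probe_mbr(const uint8_t *s, size_t len)
{
    /* PC MBR: boot signature 0x55 0xAA at the end of a 512-byte sector 0 */
    return len >= 512 && s[510] == 0x55 && s[511] == 0xAA;
}

static int
probe_toy(const uint8_t *s, size_t len)
{
    /* hypothetical format: 4-byte magic "TOYL" at offset 0 */
    return len >= 4 && memcmp(s, "TOYL", 4) == 0;
}

/* Probe order matters: more specific formats first. */
static const struct label_format formats[] = {
    { "toy", probe_toy },
    { "mbr", probe_mbr },
};

/*
 * Return the first format whose probe accepts the sector. A non-NULL
 * hint restricts the search to that one format (still probed, so a
 * wrong hint fails instead of misreading the disk).
 */
static const struct label_format *
labelfs_choose(const uint8_t *sector, size_t len, const char *hint)
{
    size_t i;

    assert(sector != NULL);
    for (i = 0; i < sizeof(formats) / sizeof(formats[0]); i++) {
        if (hint != NULL && strcmp(hint, formats[i].name) != 0)
            continue;
        if (formats[i].probe(sector, len))
            return &formats[i];
    }
    return NULL;
}
```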

Myself, I think this is the cleanest approach, but I suspect it's too
radical for NetBSD.

-- 
   - David A. Holland             | (please continue to send non-list mail to
     dholland@cs.utoronto.ca      | dholland@hcs.harvard.edu. yes, I moved.)