Subject: Re: Partition tables (was: Re: Another changer, another changer
To: Gandhi woulda smacked you <greywolf@starwolf.com>
From: David Holland <dholland@cs.toronto.edu>
List: current-users
Date: 10/16/1998 21:48:56

Note to those following along: this message rambles for a while and
actually has something approaching a proposal at the bottom, so you
don't necessarily want to skip over it.

 > Idea for the partition table thing:
 >  [...]
 >
 > We could actually have a single device to which you send requests,
 > i.e. /dev/diskpart, and you do something like
 > 
 > struct diskaccess da[1];	/* contains a struct diskpart */
 > 
 > da->disk="sd0";
 > 
 > fd=open("/dev/diskpart", O_RDWR);
 > ioctl(fd, DKIOCGDTAB, da);
 > 
 > and the diskpart driver would automagically handle the routing.

Wait, which problem are you trying to solve? There are at least the
following related issues involved here:

  1. getting disklabel handling code out of disk device drivers
  2. supporting multiple types of disklabel/partition table
  3. probing a disk for disklabel(s)/partition table(s)
  4. organizing/numbering the partitions found for presentation
     to higher layers of the system
  5. mapping major and minor device numbers to disks and partitions
     and/or presenting partition names in /dev
  6. mapping disk names to individual pieces of hardware
     (this is really mostly the other thread though)
  7. providing better support for editing disklabels.

As previously noted, (1) by itself isn't really very hard. Given a
good solution for (1), (2) isn't all that hard either. In a sense, (4)
and (5) are the same problem: a complete solution to (5) also solves
(4), although it's possible to solve (4) without solving (5). (6) is
only tangentially related, except that some of the same problems apply when
translating partition names to physical partitions.

I think you're worrying about (7) though, and I think that's not even
really a problem - in all of the solutions proposed so far there's
either a whole-disk "partition" like current practice, an independent
device node for the whole disk, or a special "partition" that holds
just the partition table/disklabel.

The basic problem in (5) is that you have hardware organized something
like this:

         wd0
          |
          |-fdisk table
             |--partition1                  (#1)
             |--partition2                  (#2)
             |     |
             |     \-fdisk table
             |        |--partition1         (#3)
             |
             |--partition3                  (#4)
                   |
                   \-bsd disklabel
                        |--partition a      (#5)
                        |--partition b      (#6)
                        |--partition c      (#7)
                        |--partition d      (#8)
                        |--partition e      (#9)

...and a similar mess on each of wd1, sd0, sd1, sd2, etc.

How do you name these? There's some precedent for how one should
handle the naming of nested fdisk tables, but not much, and it's not
clear what you're supposed to do if there's more than one tree of
nested fdisk tables. There's not really any precedent at all for
naming the stuff listed in the bsd disklabel *in its full context*.
And what do you do if you find a Mac disklabel nested in there
someplace?

There are two obvious solutions: one is to choose an order of tree
descent and number the partitions in that order. The other is to use
hierarchical naming, that is, one number at each level, so you'd have
something like wd0/1, wd0/2/1, wd0/3/[a-e], etc.

The problem with numbering in order is that splitting a partition
someplace renames all the ones "after" it. The problem with
hierarchical naming is that backwards compatibility for x86 disks with
nested bsd disklabels becomes difficult: identifying what partition
used to be, say, wd0c, so you can make a symlink, is a pain. And also,
hierarchical naming of this sort isn't very compatible with major and
minor device numbering.

Then when you start looking at multiple disks it becomes an even
bigger nuisance. If you have the partition stuff set up as a driver,
you don't really want multiple major numbers for it. In fact, what you
really want is for the major number to correspond to the partition
table type, because that's the proper way to choose which driver to
invoke.

Hmm. Maybe the right thing to do is to collect all the partitions of
each type together. Then you know how many partitions per table you
can have, so you can assign minor numbers in some sensible manner.
Then the wd0 drawn above would give you (let's assume fdisk is the
fdisk table device (major 101), dk is the bsd disklabel device (major
100)):

	fdisk0a (101, 0) 1st fdisk table, 1st slot     [marked #1 above]
	fdisk0b (101, 1) 1st fdisk table, 2nd slot     [marked #2 above] (*)
	fdisk0c (101, 2) 1st fdisk table, 3rd slot     [marked #4 above] (*)
	fdisk0d (101, 3) 1st fdisk table, 4th slot (ENXIO)
	fdisk1a (101, 4) 2nd fdisk table, 1st slot     [marked #3 above]
	fdisk1b (101, 5) 2nd fdisk table, 2nd slot (ENXIO)
	fdisk1c (101, 6) 2nd fdisk table, 3rd slot (ENXIO)
	fdisk1d (101, 7) 2nd fdisk table, 4th slot (ENXIO)
	fdisk2a (101, 8) 3rd fdisk table, 1st slot   (perhaps on another disk)
	  :
	dk0a    (100, 0) 1st bsd label, a partition    [marked #5 above]
	dk0b    (100, 1) 1st bsd label, b partition    [marked #6 above]
	dk0c    (100, 2) 1st bsd label, c partition    [marked #7 above]
	dk0d    (100, 3) 1st bsd label, d partition    [marked #8 above]
	dk0e    (100, 4) 1st bsd label, e partition    [marked #9 above]
	dk0f    (100, 5) 1st bsd label, f partition (ENXIO)
	dk0g    (100, 6) 1st bsd label, g partition (ENXIO)
	dk0h    (100, 7) 1st bsd label, h partition (ENXIO)
        dk1a    (100, 8) 2nd bsd label, a partition   (perhaps on another disk)
          :
        wd0     (something) 1st whole ide disk               (*)
        wd1     (something) 2nd whole ide disk               (*)
        sd0     (something) 1st whole scsi disk              (*)
          :

(*) These devices would return EBUSY on open because they're in use by
other instances of disklabel drivers, unless none of the partitions
were open. Or maybe not - it depends on whether you want a minimal
foot-guard on your gun or not. :-)


This is actually starting to look viable. Do people care if you can't
tell by looking at the device name which disk it's on? In a sense it's
no different from not being able to tell what physical disk sd0 is by
looking at the device name - you have to look at the device probe
output.

I suspect in the long run the only correct way to identify a disk
volume is by some kind of volume name or serial number actually stored
on the disk. That's in a sense a separate issue though.

-- 
   - David A. Holland             | (please continue to send non-list mail to
     dholland@cs.utoronto.ca      | dholland@hcs.harvard.edu. yes, I moved.)