Subject: Re: partition bookkeeping
To: Greywolf <firstname.lastname@example.org>
From: Bill Studenmund <email@example.com>
Date: 09/22/1999 19:23:58
On Wed, 22 Sep 1999, Greywolf wrote:
> On Wed, 22 Sep 1999, der Mouse wrote:
> # This is one of the arguments in favor of devfs (at least, those who
> # like c0t0s0 names might think so): it makes this possible. There's no
> # reason you couldn't have minors allocated only for those devices that
> # actually exist; as long as /dev/dsk/dks0d2s5 gets partition 5 on device
> # 2 on controller 0, nobody cares whether its minor number is anything in
> # particular.
> A devfs is completely independent of the naming convention. Names are
> for the humanoid types.
> If we move all the disklabel handling out of the kernel, it is worth
> asking whether the minor number will ever be used, since in the device
> driver the minor number encodes the disk number (say, in all the bits
> above the fifth) and the partition to reference on that disk (in bits
> 0-4; 22 partitions overflow four bits).
> If the kernel has no disklabel/offset information, how does the
> driver get the bus/controller/unit/offset information?
I think the general idea was that the daemon (or routines in the kernel)
would have to fill in the partition info in the kernel before the
partitions could be used. I don't think the idea was to have the daemon
get called for each access to an uninitialized partition. :-)
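
For illustration only, here is a minimal sketch of that ordering: something
fills in a table first, and the open path just decodes the minor number and
refuses a slot nobody has configured, with no call out to a daemon. All the
names here (wedge_configure, wedge_lookup, MAX_UNITS) are made up for the
example, and the 5-bit partition field follows Greywolf's guess above, not
any real driver's layout.

```c
#include <stddef.h>
#include <stdint.h>

#define PART_BITS 5u                         /* assumed: partition in bits 0-4 */
#define PART_MASK ((1u << PART_BITS) - 1u)
#define MAX_PARTS (1u << PART_BITS)          /* 32 slots per disk */
#define MAX_UNITS 4u                         /* arbitrary limit for the sketch */

struct wedge {
    uint64_t offset;      /* first sector of the partition on the disk */
    uint64_t length;      /* size of the partition in sectors */
    int      configured;  /* nonzero once the slot has been filled in */
};

static struct wedge wedge_table[MAX_UNITS][MAX_PARTS];

/* Pack and unpack a hypothetical minor number: unit above bit 4, partition below. */
uint32_t minor_make(uint32_t unit, uint32_t part)
{
    return (unit << PART_BITS) | (part & PART_MASK);
}

uint32_t minor_unit(uint32_t minor) { return minor >> PART_BITS; }
uint32_t minor_part(uint32_t minor) { return minor & PART_MASK; }

/*
 * Fill in one slot. This is the step that has to happen before the
 * partition can be used. Returns 0 on success, -1 on a bad index.
 */
int wedge_configure(uint32_t unit, uint32_t part,
                    uint64_t offset, uint64_t length)
{
    if (unit >= MAX_UNITS || part >= MAX_PARTS)
        return -1;
    wedge_table[unit][part].offset = offset;
    wedge_table[unit][part].length = length;
    wedge_table[unit][part].configured = 1;
    return 0;
}

/*
 * Open path: a plain table lookup. Returns NULL if the unit is out of
 * range or the slot was never filled in; the caller would fail the open.
 */
const struct wedge *wedge_lookup(uint32_t minor)
{
    uint32_t unit = minor_unit(minor);
    uint32_t part = minor_part(minor);

    if (unit >= MAX_UNITS || !wedge_table[unit][part].configured)
        return NULL;
    return &wedge_table[unit][part];
}
```

The point of the sketch is only that an unconfigured partition fails the
open immediately; nothing on that path waits for userland.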
> Let me guess, using fsck as an example:
>
>     fsck gets "/dev/disk/rsd2d" as its device
>     fsck calls open("/dev/disk/rsd2d")               CS #1
>     Eventually the vnode gets resolved, and
>     the open routine for the vnode goes
>     out to userland to talk to this daemon
>     which handles all the device mappings            CS #2
>     The daemon runs and returns the information
>     to the vopen (or whatever)                       CS #3
>     The filehandle finally gets returned to
>     fsck.                                            CS #4
>
> Congratulations. We're now doing twice as many context switches
> as we really need to. Granted, how often will we be needing to
> do this (not too often, I'd surmise); nonetheless, will this be
> a trend to move things out of the kernel and into userland?
> Microkernels don't work -- context switching is not cheap.
> And what, pray tell, do we do if this daemon happens to get
> corrupted on the disk? We cannot fsck beyond the root
> filesystem in single-user mode. (Never mind that there is
> probably more wrong if the daemon is zorched.) We now
> have to restore from tape (we do have tape, right? Oh, my.
> No, we don't because we're doing dynamic device allocation which
> needs that daemon. Or is this just for disks?) just to
> run fsck on our filesystems.
That's a good reason to not do dynamic device allocation. :-)
But I don't think that was a real cornerstone of the wedges idea. I guess
I'll admit that I figured that the in-kernel loader would really need to
do more than just load the root partition, but not much. i.e. I figured
that the port-default in-kernel wedge initializer would do whatever the
disklabel reading code did now. To do otherwise would really just present
folks with too huge a shock, I think. :-) So we wouldn't go quite as
far up the poppycock creek as above. :-)
> I don't see that we really consume all that much space by
> mapping the partition information into kernel space:
> It's, what, a dev_t, a length and an offset plus some flags?
I agree. :-)
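
To put a rough shape on that, here is a sketch of such a record and the one
thing the driver does with it: turn a partition-relative block number into
an absolute one, with a bounds check. The struct layout, field names and
helper names are invented for the example; this is not the existing
in-kernel structure.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>   /* dev_t */

/* Hypothetical per-partition record: a dev_t, an offset, a length, flags. */
struct part_info {
    dev_t    dev;      /* underlying disk device */
    uint64_t offset;   /* first sector of the partition on that disk */
    uint64_t length;   /* partition size in sectors */
    uint32_t flags;    /* e.g. read-only; meaning left open in this sketch */
};

/*
 * Map a partition-relative sector to an absolute sector on the disk.
 * Returns 0 and stores the result in *abs, or -1 if rel is past the end.
 */
int part_map(const struct part_info *p, uint64_t rel, uint64_t *abs)
{
    if (rel >= p->length)
        return -1;
    *abs = p->offset + rel;
    return 0;
}

/* Memory needed for a full table of ndisks disks with nparts slots each. */
size_t part_table_bytes(size_t ndisks, size_t nparts)
{
    return ndisks * nparts * sizeof(struct part_info);
}
```

Even with every slot allocated for a handful of disks, part_table_bytes()
comes out at a few kilobytes on common platforms, which is the point
Greywolf is making.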
> I'm obviously missing something here because on the surface, such
> a dynamic scheme actually does look cool (as long as we don't go
> to the graveyard that System V made!). But how realistic is this?
I don't think you're missing anything. :-) I just think that the wedges
proposal wasn't about dynamic node generation, but about cleaning up how
the in-core partition -> offset,length info gets filled in. What we have
now was designed for BSD labels only, and has been extended for MBR
partitions, and then for non-*BSD/non-MBR labeling, without a redesign.
And it shows. :-) The wedge proposal was a way to clean it up, and support
the things folks want to do now (like recursive/extended MBR partitions).
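
As a rough idea of what one such initializer might look like, here is a
sketch that reads the four primary entries of a classic MBR sector into
offset,length pairs. The on-disk layout (table at byte 446, 16-byte
entries, little-endian start and size at bytes 8 and 12 of each entry,
0x55 0xAA signature at bytes 510-511) is the standard one; the function
and struct names are made up, and following an extended-partition chain
is deliberately left out.

```c
#include <stddef.h>
#include <stdint.h>

#define MBR_SECTOR_SIZE 512
#define MBR_TABLE_OFF   446   /* start of the partition table */
#define MBR_ENTRY_SIZE  16
#define MBR_NENTRIES    4

/* Hypothetical result record: what gets filled in per partition. */
struct part_extent {
    uint8_t  type;     /* MBR partition type byte */
    uint32_t offset;   /* starting LBA */
    uint32_t length;   /* size in sectors */
};

/* Read a 32-bit little-endian value from an unaligned byte pointer. */
static uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Types 0x05 and 0x0F mark an extended partition (a container). */
int mbr_is_extended(uint8_t type)
{
    return type == 0x05 || type == 0x0F;
}

/*
 * Parse the primary table of one 512-byte MBR sector.
 * Returns the number of non-empty entries copied to out[] (0..4),
 * or -1 if the boot signature is missing.
 */
int mbr_parse(const uint8_t sector[MBR_SECTOR_SIZE],
              struct part_extent out[MBR_NENTRIES])
{
    int n = 0;

    if (sector[510] != 0x55 || sector[511] != 0xAA)
        return -1;

    for (int i = 0; i < MBR_NENTRIES; i++) {
        const uint8_t *e = sector + MBR_TABLE_OFF + i * MBR_ENTRY_SIZE;

        if (e[4] == 0)          /* type 0 means the slot is unused */
            continue;
        out[n].type   = e[4];
        out[n].offset = le32(e + 8);
        out[n].length = le32(e + 12);
        n++;
    }
    return n;
}
```

A caller that wanted the recursive case would check mbr_is_extended() on
each entry and read the sector at that entry's offset in turn; the sketch
stops at the primary table.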