Subject: Re: Thoughts about wedges
To: der Mouse <mouse@Rodents.Montreal.QC.CA>
From: Bill Studenmund <wrstuden@nas.nasa.gov>
List: tech-kern
Date: 09/27/1999 11:38:03
On Sun, 26 Sep 1999, der Mouse wrote:

Could you please keep some of the other authors' email addresses? :-)

> >> which is exactly the opposite of one of the main points of the
> >> wedges proposal.
> 
> > This point of the initial proposal is the most controversial one, I
> > think.  [...]  Why do we want to move the partitioning out of the
> > kernel?  What do we gain?
> 
> You see clearly.  I'd not thought about it in quite that way.

Also, if we ask ourselves what we really want, we're more likely to get
what's important in the final result. :-)

> Other benefits of wedges - such as recursive partitioning - are also
> either things we have now (recursive partitioning, for example, which
> can be done with single-member ccds) or things that are orthogonal to
> moving the partitioning knowledge to userland (scads-o'-partitions
> and/or dense minor numbers, depending on which wedge scheme is chosen).

My thought here (with the unit/partition minor scheme) was that there'd be
very few schemes that specify all 64 partitions in one go. So if we had a
routine to expand an MBR partition into Blah, we could call it repeatedly
to get the different partitions.
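
To make the unit/partition split concrete, here is a minimal sketch of
packing a unit number and a partition index into a single minor number,
with 64 partitions per unit. The multiply-and-add layout and the function
names are assumptions for illustration, not the actual kernel macros.

```python
PARTS_PER_UNIT = 64  # the figure discussed above; the layout itself is assumed


def make_minor(unit, part):
    """Pack (unit, partition) into one minor number."""
    if not 0 <= part < PARTS_PER_UNIT:
        raise ValueError("partition index out of range")
    return unit * PARTS_PER_UNIT + part


def split_minor(minor):
    """Recover (unit, partition) from a minor number."""
    return divmod(minor, PARTS_PER_UNIT)
```

With a layout like this, the partitions of unit 1 always occupy minors 64
through 127, no matter how many partitions unit 0 actually has.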

> > About keeping the disklabel interface: I'm assuming that we're going
> > to keep the unit/partition split to device minor numbers (see my post
> > yesterday about the problem with just throwing wedges into their own
> > major # with minor numbers dictated by allocation history).
> 
> We may have to do this.  I'm not entirely convinced it's not possible
> to get persistent filesystem device attributes (chmod/chown in
> particular) and still get dense minor numbers.

I agree that it's possible to get persistent device attributes with dense
numbers. My big concern is how long it'll take to get it. :-)

> What follows is thoughts and questions, not an attempt to attack an
> established position.  I'm trying to find a scheme which gives both
> "sides" what they want: both device node assignments stable enough to
> preserve chown/chmod info and dense minor numbers.
> 
> Across what exactly do you want attributes to remain stable?  Provided
> wedgeconfig behaves predictably, dense minors with a single major are
> fine provided nothing changes in the disk layout.  Add or remove a
> partition, other than such that it will be found at the end of
> wedgeconfig's sequence, and you lose.

I agree that whatever reads the disklabels into partitions needs to be
consistent across boots, otherwise you won't be able to use static nodes
at all. :-) My concern is that you also have to take special care that
all disks are found at boot.

> But you already have something very much like that unless you go to the
> trouble to wire down disks in your config file; add or remove a disk -
> except at the end of the probe sequence - and the filesystem node
> versus partition mapping changes.

But say I've wired down all of my disks. Say I've wired down ones on a
fibre channel fabric (using their World Wide Name, if I remember the term
right), but for whatever reason not all of the drives were visible at
initial autoconfig. Or I have a zip drive not at the end of the sd list,
with its media out. As wedgeconfig comes along and finds drives, the
number of partitions present on these wired down devices will change. So
the minor numbers of all devices after them will change too. :-)
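
The shift can be shown with a small simulation. This is only a sketch: it
assumes dense minors are handed out consecutively in scan order, and the
disk names and partition counts are made up for the example.

```python
def dense_minors(disks):
    """Assign consecutive minors to partitions in scan order.

    disks: list of (name, partition_count) in wired-down order; a count
    of 0 models a drive or medium that was absent when we scanned.
    """
    minors = {}
    next_minor = 0
    for name, nparts in disks:
        for part in range(nparts):
            minors[(name, part)] = next_minor
            next_minor += 1
    return minors


# Boot A: the zip disk (sd1) has its medium in, with 2 partitions.
boot_a = dense_minors([("sd0", 4), ("sd1", 2), ("sd2", 3)])
# Boot B: same wiring, but the zip medium is out.
boot_b = dense_minors([("sd0", 4), ("sd1", 0), ("sd2", 3)])
```

Here the first partition of sd2 gets minor 6 in boot A and minor 4 in boot
B, even though nothing on sd2 changed and every disk kept its unit number.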

Granted we could keep a database of wedge info, but now we're getting into
a much more complicated device configuration. AIX can do it with its LVM,
and we could too. We'll have to for LVM support. It's just that the
database needed to keep minor #'s consistent across boots is an additional
hurdle for us to cross. :-)
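
As a rough sketch of what such a database has to do: remember which minor
each (disk identity, partition) pair was given, and only allocate a new
minor for pairs it has never seen. The class, its method, and the "wwn-..."
identifiers below are hypothetical, and how the table would be stored
between boots is left out entirely.

```python
class WedgeDB:
    """Toy table mapping (disk_id, partition) to a stable minor number."""

    def __init__(self, saved=None):
        # 'saved' stands in for whatever was persisted on a previous boot.
        self.table = dict(saved or {})

    def minor_for(self, disk_id, part):
        key = (disk_id, part)
        if key not in self.table:
            # Never reuse or renumber: take the next unused minor.
            self.table[key] = max(self.table.values(), default=-1) + 1
        return self.table[key]


first_boot = WedgeDB()
first_boot.minor_for("wwn-A", 0)   # allocated minor 0
first_boot.minor_for("wwn-B", 0)   # allocated minor 1

# Next boot: wwn-B shows up before wwn-A, and a new disk appears.
next_boot = WedgeDB(saved=first_boot.table)
b_minor = next_boot.minor_for("wwn-B", 0)
c_minor = next_boot.minor_for("wwn-C", 0)
a_minor = next_boot.minor_for("wwn-A", 0)
```

The minors stay put regardless of discovery order, but something has to
load and save that table, which is the extra machinery in question.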

> Admittedly, adding or removing drives is probably rarer than adding or
> removing partitions.  But is it really that much rarer?

In the scenario above, no partitions were added to or removed from the
media, nor did the number of devices present change. All that happened was
that either removable media were ejected, or wired-down drives on some sort
of hot-pluggable bus weren't there when we initially scanned. I think that's
much more likely. :-)

> Another possibility was suggested by the person who said something
> along the lines of
> 
> 	wd0 at wdc0 ...
> 	mbr0 at wd0
> 	msdosfs0 at mbr0 partition 0
> 	bsdlabel0 at mbr0 partition 2
> 	ffs0 at bsdlabel0 partition 0
> 	ffs1 at bsdlabel0 partition 4
> 	sunlabel0 at bsdlabel0 partition 5
> 	ffs3 at sunlabel0 partition 0
> 	swap0 at sunlabel0 partition 1
> 
> Given the underlying machinery to create/destroy this sort of thing
> dynamically, wedgeconfig could turn into something that would build
> this sort of device tree automatically.  As proposed, it has problems,
> but it could perhaps mutate into something workable.  As a first cut,
> I'd consider making it more like
> 
> 	wd0 at wdc0 ...
> 	mbr0 at wd0
> 	partition0 at mbr0 partition 0 type 0x0b
> 	bsdlabel0 at mbr0 partition 2 type 0xa9
> 	partition1 at bsdlabel0 partition 0 type ffs
> 	partition2 at bsdlabel0 partition 4 type ffs
> 	partition3 at bsdlabel0 partition 5 type ? # type not checked
> 	sunlabel0 at partition3 # disk image in this partition
> 	partition4 at sunlabel0 partition 0 type ffs
> 	partition5 at sunlabel0 partition 1 type swap

We could do that. :-) But the whole point of wedges was to get partition
info out of the kernel. :-) Now we're having to put it in the kernel
config files. :-(

I guess my big concern with wedges is that, with Leo's proposal (if he has
patches, which I think he said he does), we're probably close enough to
getting more than 8/16 partitions into the kernel in time for 1.5. I think a
complete implementation of wedges (separate major #) would need longer to
get right. So we'd be stuck with 8 partitions on most platforms because
we're shooting for a very tough (to get right, but that's the only way we
would do it!) goal. :-)

Take care,

Bill