Subject: Re: Moving getmaxpartitions to libc
To: Leo Weppelman <leo@wau.mis.ah.nl>
From: Chris G. Demetriou <cgd@netbsd.org>
List: tech-userlevel
Date: 08/25/1999 01:26:37
Leo Weppelman <leo@wau.mis.ah.nl> writes:
> > for the most part, even in the kernel, you want to use the "large # of
> > partitions" disklabel.  This is true at least because of the existence
> > of (programs like) "mbrlabel" which manipulate the in-core label
> > without the expectation that it'll ever be written back to disk.  For
> > them, it can be valuable to provide more partitions throughout the MI
> > code.
> 
> Definitely! I don't know "mbrlabel", but I can imagine that vnd-devices with
> lots of partitions might come in handy... But since the vnd-driver also
> uses dk_label (== struct disklabel), it is still limited by MAXPARTITIONS
> unless a major kernel overhaul is performed.

Right.  as always, there are two disklabels to be concerned about: the
in-core label and the on-disk label.  the in-core label can be
arbitrarily large, and disk drivers should support this.

When setting the in-core label (DIOCSDINFO), the arbitrarily large
label should be allowed.

When setting the on-disk label (DIOCWDINFO; also sets the in-core
label), only valid disklabels for the disk can be allowed -- for
whatever that means.  8-)
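
The difference between the two ioctls can be sketched as a small
validity check.  This is only an illustration: the sk_* names and the
limit of 8 partitions are made up for the example, and "valid for the
disk" is reduced to a partition-count test, which a real driver would
need to go well beyond.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical names; not the real NetBSD structures or ioctl code. */
#define SK_NATIVE_MAXPART 8	/* assumed on-disk limit for this port */

struct sk_label {
	size_t npartitions;	/* partitions described by the label */
};

enum sk_op {
	SK_SET_INCORE,		/* models DIOCSDINFO */
	SK_WRITE_ONDISK		/* models DIOCWDINFO */
};

/*
 * Return 0 if the label may be accepted for the given operation,
 * EINVAL otherwise.  The in-core case accepts any size; the on-disk
 * case only accepts what the (assumed) native format can hold.
 */
int
sk_check_label(enum sk_op op, const struct sk_label *lp)
{
	assert(lp != NULL);

	if (op == SK_WRITE_ONDISK && lp->npartitions > SK_NATIVE_MAXPART)
		return EINVAL;
	return 0;
}
```

With these assumptions a 22-partition label passes for SK_SET_INCORE
and is rejected with EINVAL for SK_WRITE_ONDISK.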

seems to me that disklabel(8) probably needs an option to set just the
in-core label, and mbrlabel(8) or similar programs might want to
generate disklabel(8)-compatible output, so you could say something like
	mbrlabel sd0 > file
	disklabel -R -S sd0 file
(where -S is the proposed option that sets just the in-core label), or
similar commands.


> > There's the minor hitch that not all of those partitions (the ones >
> > MAXPARTITIONS) will be accessible on ports with small maxpartitions,
> > but I think that's OK.  (At least people can know what they're
> > missing.  8-)
> > 
> > The important thing to do, in my opinion, is decouple the on-disk
> > format from the MI-kernel/userland format.  You have to do this
> > anyway, because the MI kernel and userland needs to be able to
> > represent many formats other than the native 'struct disklabel'
> > format.  (MBRs, old sun disklabel, Mac and other partition tables,
> > etc.)
> 
> I see what you want, but I think it is a step too far - at least for me it
> is. It also has some semantic side effects. I mean, kern.maxpartitions
> becomes useless. Does it represent:
>     a) the number of partitions on any (pseudo)disk (== current meaning)
>     b) the number of partitions on disks shared with the native
>        OS
>     c) ???

or c) the number of partitions on disks with a non-native partitioning
format known to this port's kernel, or d ... or e ...

Yes, you're absolutely right, trying to do what i was suggesting is
bogus.


> It is useful to carefully consider what the userland disklabel definition is
> going to look like, so the userland transition on the ioctl-interface has
> only to be done once. To support other disklabel formats, the best thing
> todo seems to reserve a certain amount of slack in the structure and use
> unions in the definition. So label formats can be squeezed in.

... or your pointer/size/type suggestion.

i think i agree with this analysis, but the decision of how to
represent the data is probably not going to be an easy problem.

to add to the problem, you don't necessarily want general-purpose
utilities (e.g. disklabel) to have to understand every random
platform-specific representation, if you can avoid it.  I think you'd
really like them to be able to talk in generic terms even about
'weird' disklabel formats...

in order to cope with this, in the kernel you'd need to do something
like:

op			actions
read on-disk label	* read on-disk label into in-core 'on-disk-like'
			("native") representation of label.

			* translate native representation of label into
			generic representation, which is used for most
			in-kernel calculations.

			* save both.  when requested by user, provide
			in either native or generic representation as
			requested. 
		
set in-core label	* if provided in generic rep., translate to
			native rep. if possible.  (_do not_ fail if
			impossible.)

			* if provided in native rep., translate to
			generic rep.  (should not fail... if it does,
			generic isn't generic enough.)

			* save both, provide to user as requested.

write on-disk		* if provided in generic rep., translate to
			native rep. if possible.  (fail if impossible.)

			* if provided in native rep., translate to
			generic rep.  ...

			* save both, ...
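
A rough sketch of the table above, for the case where the caller
provides the generic representation.  Everything here is assumed for
illustration: the sk_* names, the two size limits, and the idea that
the only way generic -> native translation can fail is by having too
many partitions.  The native-provided cases and the actual disk I/O
are left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical limits and types; not the real struct disklabel. */
#define SK_NATIVE_MAXPART	8
#define SK_GENERIC_MAXPART	32

struct sk_part { unsigned long offset, size; };
struct sk_native  { size_t npart; struct sk_part part[SK_NATIVE_MAXPART]; };
struct sk_generic { size_t npart; struct sk_part part[SK_GENERIC_MAXPART]; };

struct sk_disk {
	struct sk_native  native;	/* 'on-disk-like' representation */
	bool		  native_valid;	/* false if generic didn't fit */
	struct sk_generic generic;	/* used for in-kernel calculations */
};

/* native -> generic: should not fail if generic is generic enough. */
static void
sk_to_generic(const struct sk_native *n, struct sk_generic *g)
{
	assert(n->npart <= SK_NATIVE_MAXPART);
	memset(g, 0, sizeof(*g));
	g->npart = n->npart;
	memcpy(g->part, n->part, n->npart * sizeof(n->part[0]));
}

/* generic -> native: fails when the native format can't express it. */
static int
sk_to_native(const struct sk_generic *g, struct sk_native *n)
{
	if (g->npart > SK_NATIVE_MAXPART)
		return EINVAL;
	memset(n, 0, sizeof(*n));
	n->npart = g->npart;
	memcpy(n->part, g->part, g->npart * sizeof(g->part[0]));
	return 0;
}

/* read on-disk label: keep the native form, derive the generic one. */
void
sk_read_label(struct sk_disk *d, const struct sk_native *ondisk)
{
	d->native = *ondisk;
	d->native_valid = true;
	sk_to_generic(&d->native, &d->generic);
}

/* set in-core label: translate if possible, but do not fail if not. */
int
sk_set_incore(struct sk_disk *d, const struct sk_generic *g)
{
	if (g->npart > SK_GENERIC_MAXPART)
		return EINVAL;
	d->generic = *g;
	d->native_valid = (sk_to_native(g, &d->native) == 0);
	return 0;
}

/* write on-disk label: fail if the native format can't hold it. */
int
sk_write_ondisk(struct sk_disk *d, const struct sk_generic *g)
{
	struct sk_native n;
	int error;

	if (g->npart > SK_GENERIC_MAXPART)
		return EINVAL;
	if ((error = sk_to_native(g, &n)) != 0)
		return error;
	/* A real driver would write 'n' to the label area here. */
	d->native = n;
	d->native_valid = true;
	d->generic = *g;
	return 0;
}
```

With these limits, a 16-partition generic label is accepted by
sk_set_incore() (leaving native_valid false) but refused by
sk_write_ondisk(), which is the asymmetry the table describes.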

I'm not sure of the feasibility of doing the generic -> native
translation in the kernel, rather than embedding that knowledge into
user-land programs.  If you try to do it in the kernel, you need to
figure out what to do about partition IDs, etc., which may not easily
fit into a generic representation that's useful for most kernel
purposes.


for the record, i don't think that a mechanism controlled/configured
entirely, or even mostly, in user-land is likely to work.  we _need_
native->generic translation in the kernel so that, for instance, we
can find root partitions (etc.) when booted off of a disk with a
native label that's not quite what NetBSD normally would like from a
disklabel.  "not quite... disklabel" can mean anything from MBR to
slightly tweaked BSD-ish disklabel (e.g. sun3), to amiga rdb or mac
partition table...

While the ability to work around deficiencies in kernel
native->generic translation in user-land (via programs like mbrlabel)
is very important, if the need for programs like mbrlabel is anything
other than transitory for common label formats, we're probably doing
something wrong.  The 'bugs' section of mbrlabel points out why you
want this done in the kernel: so that the first open of the disk gets
a proper faked-up generic label.  (I would argue that any attempt to
"fix" disk drivers to remember in-core labels when all partitions are
closed is bogus... you then need to perform special hacks where you
didn't previously need to, e.g. in disklabel -r or dd (if clobbering
the old label area 8-) to get the old in-core label forgotten if it's
no longer desirable... violates the principle of least surprise.)


cgd
-- 
Chris Demetriou - cgd@netbsd.org - http://www.netbsd.org/People/Pages/cgd.html
Disclaimer: Not speaking for NetBSD, just expressing my own opinion.