Subject: 32 bit dev_t proposal
To: None <tech-kern@NetBSD.ORG>
From: Todd Vierling <tv@NetBSD.ORG>
List: tech-kern
Date: 01/10/1998 17:29:59
One of the suggested projects when working towards 1.4 is upgrading our 16
bit dev_t to a 32 bit dev_t.  For whatever psychotic reason, I took it upon
myself to start the ball rolling.  It seemed easy at first, and then it
ballooned into the thread of the month.  :)

With a little bit of chatter elsewhere, I was told this would be best on
tech-kern.  So here's a method to bring us 32 bit dev_t's with binary
compatibility, collated into one message; it needs advice.  Suggestions
gladly appreciated!

This may feel painful, but this is something that many people have said
was necessary for a long time, and the compatibility pain is IMHO worth the
gain.  I will post a pointer here to a context diff against 1.3A
(-current, when it switches back from 1.3 release) sometime tomorrow or
Monday, based on the working API as described below.

=====

1. dev_t--only when _KERNEL is defined--becomes an opaque type with the
definition:
    typedef struct { u_int32_t i; } dev_t;
This opaque type cannot be interchanged directly with an integer; any
attempt to do so causes a deliberate compile error.  That requires all code
to be 32 bit dev_t compliant and breaks anything that does not stick to the
"proper" API of macro calls.  The type is the same size as the u_int32_t
that userland sees as dev_t, and AFAIK this doesn't break userland structure
offsets/alignment on any existing port.  All kernel use of dev_t as an
integer must comply with this API.  Casting a dev_t or accessing its data
directly is disallowed except as listed in (11) below.
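
As a rough illustration of (1), here is a userland sketch of how the opaque
type behaves.  The names kdev_t and kdev_wrap() are made up for this example
(kdev_t only exists so the sketch does not clash with the integer dev_t in
<sys/types.h>); they are not part of the proposal.

```c
#include <stdint.h>

/* Hypothetical stand-in for the proposed kernel dev_t. */
typedef struct { uint32_t i; } kdev_t;

/* The wrapper is the same size as the integer it holds. */
_Static_assert(sizeof(kdev_t) == sizeof(uint32_t),
    "kdev_t must be exactly 32 bits wide");

/* Hypothetical helper: wrap a raw 32 bit value in the opaque type. */
static inline kdev_t
kdev_wrap(uint32_t raw)
{
    kdev_t d = { raw };
    return d;
}

/*
 * Each of the following is rejected by the compiler, which is the point:
 *
 *     kdev_t d = 5;           a scalar cannot initialize a struct
 *     if (d == other) ...     == is not defined for structs
 *     int n = (int)d;         a struct cannot be cast to a scalar
 */
```

Code that still treats a device number as a plain integer stops compiling
instead of silently computing the wrong major or minor number.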

2. Our new dev_t will be split 12 bits major, 20 bits minor.  If the top 12
bits are zeroes, the dev_t is an "old" device when considering conversion in
the kernel.
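
The 12/20 split in (2) could be expressed roughly as below.  This is only a
sketch working on the raw u_int32_t; the KDEV_* constants and kdev_*() names
are invented for illustration and no compat remapping is done here.

```c
#include <stdint.h>

#define KDEV_MAJOR_BITS 12
#define KDEV_MINOR_BITS 20
#define KDEV_MAJOR_MASK ((UINT32_C(1) << KDEV_MAJOR_BITS) - 1) /* 0x00000fff */
#define KDEV_MINOR_MASK ((UINT32_C(1) << KDEV_MINOR_BITS) - 1) /* 0x000fffff */

/* Pack a 12 bit major and a 20 bit minor into one 32 bit value. */
static inline uint32_t
kdev_pack(uint32_t major, uint32_t minor)
{
    return ((major & KDEV_MAJOR_MASK) << KDEV_MINOR_BITS) |
        (minor & KDEV_MINOR_MASK);
}

/* Top 12 bits, taken as-is. */
static inline uint32_t
kdev_rawmajor(uint32_t dev)
{
    return dev >> KDEV_MINOR_BITS;
}

/* Low 20 bits, taken as-is. */
static inline uint32_t
kdev_rawminor(uint32_t dev)
{
    return dev & KDEV_MINOR_MASK;
}

/* An all-zero top 12 bits marks an "old" 16 bit device number. */
static inline int
kdev_isold(uint32_t dev)
{
    return kdev_rawmajor(dev) == 0;
}
```

For example, kdev_pack(2048, 7) gives 0x80000007, while an old value such as
0x0305 has nothing in its top 12 bits and is therefore flagged as old.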

3. The major device numbers will be renumbered into three blocks.  Major
number 0 will not be used; it is reserved as a flag for "old" dev_t's.
These three blocks will have separate bdevsw/cdevsw structures (planned to
be merged into a devsw structure if the API for the device calls is
rethought to include character and block distinctions in the calls).

%0xxxxxxx xxxx:  If the top bit of the major number is 0 (major 1 through
2047, since major 0 is reserved), the device is a dynamically allocated
device (planned for future expansion in a dynamic device system and/or
LKMs).

%10xxxxxx xxxx:  If the top bit is 1 and the next bit is 0 (major 2048
through 3071), the device is a statically numbered machine-independent
device (anything in src/sys/dev et al.).  MI devices are kept consistent
across all ports.

%11xxxxxx xxxx:  If the top two bits are 1 (major 3072 through 4095), the
device is a statically numbered machine-architecture-dependent device.  MD
devices are kept consistent across all ports of the same ${MACHINE_ARCH}.
MD devices which are ported to become MI will receive MI major numbers, but
their MD numbers will not be decommissioned.
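
A sketch of how the three blocks in (3) might be told apart, taking the bit
patterns above at face value.  For a 12 bit major number those patterns work
out to 1 through 2047, 2048 through 3071, and 3072 through 4095.  The enum
and function names are hypothetical.

```c
#include <stdint.h>

enum kdev_class {
    KDEV_OLD,        /* major 0: reserved flag for "old" dev_t's */
    KDEV_DYNAMIC,    /* %0xxxxxxx xxxx: dynamically allocated */
    KDEV_STATIC_MI,  /* %10xxxxxx xxxx: static, machine-independent */
    KDEV_STATIC_MD   /* %11xxxxxx xxxx: static, machine-dependent */
};

/* Classify a 12 bit major number by its top two bits. */
static inline enum kdev_class
kdev_classify(uint32_t major)
{
    major &= 0xfff;
    if (major == 0)
        return KDEV_OLD;
    if ((major & 0x800) == 0)
        return KDEV_DYNAMIC;    /* 1 through 2047 */
    if ((major & 0x400) == 0)
        return KDEV_STATIC_MI;  /* 2048 through 3071 */
    return KDEV_STATIC_MD;      /* 3072 through 4095 */
}
```

The same test could pick which of the three bdevsw/cdevsw structures to
index.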

4. Character and block device major numbers for a given device must match.
If a character device or a block device does not have a corresponding
counterpart, the counterpart will be unconfigured.

5. When any of COMPAT_[09-13] is defined in the kernel, the macro major()
will include inlined support for an old-to-new major number conversion table
(one for block, and one for character).  Both the major() and minor() macros
will retrieve only the proper set of bits from the dev_t depending on the
top 12 bits.
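
Here is a minimal sketch of what (5) might look like with the compat code
compiled in.  Several things are assumptions made for the example: the old
layout is taken to be 8 bits major and 8 bits minor in the low 16 bits, the
table entries are invented, and passing the table as an argument is just one
way to choose between the block and character tables.

```c
#include <stdint.h>

typedef struct { uint32_t i; } kdev_t;  /* hypothetical opaque dev_t */

/* Invented sample entries; 0 means "no mapping" since major 0 is reserved. */
static const uint16_t kdev_old2new_chr[256] = {
    [0] = 2048,
    [1] = 2049,
    [12] = 3072,
};
static const uint16_t kdev_old2new_blk[256] = {
    [0] = 2050,
    [4] = 3073,
};

/* major(): new-format values pass through, old ones go via the table. */
static inline uint32_t
kdev_major(kdev_t d, const uint16_t *old2new)
{
    uint32_t top = d.i >> 20;

    if (top != 0)
        return top;
    return old2new[(d.i >> 8) & 0xff];  /* assumed old 8 bit major */
}

/* minor(): 20 bits for a new dev_t, 8 bits (assumed) for an old one. */
static inline uint32_t
kdev_minor(kdev_t d)
{
    if ((d.i >> 20) != 0)
        return d.i & 0xfffff;
    return d.i & 0xff;
}
```

With these sample tables, an old character device 12,5 (0x0c05) reports
major 3072 and minor 5, the same as a node created in the new format.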

6. The stat interface will be bumped a version number again, introducing
__stat14(), __fstat14(), and __lstat14().  These will return a file's dev_t
unchanged, or, if any of COMPAT_[09-13] is defined, a dev_t that has always
been converted to the new format using the old-to-new conversion table
above.  mknod(2) will not be changed, and will always create device nodes
with the numbers unchanged.

7. The old stat interfaces, if included by a COMPAT_[09-13] option, will do
direct searches of the old-to-new table above to demote new dev_t's to 16
bit dev_t's.  A search can find no match, in which case the major number
should be reported as 255.  Programs _needing_ use of the major and minor
numbers of a dev_t should conceivably be recompiled, but this gives _some_
useful values in the case where compatibility is required, such as finding a
process's tty device based on device number.  Compat routines for other OS's
may also require this inverse mapping, or may use a "truncated" major device
number.
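
The inverse lookup in (7) could be sketched like this.  The table contents
are invented, the 8/8 old layout is assumed, and truncating the minor number
to 8 bits is my guess for the example; the proposal does not say what
happens to minors that do not fit.

```c
#include <stdint.h>

#define KDEV_OLD_NOMATCH 255  /* old major reported when nothing matches */

/* Invented sample old-to-new table (character devices). */
static const uint16_t kdev_old2new_chr[256] = {
    [0] = 2048,
    [1] = 2049,
    [12] = 3072,
};

/* Demote a 32 bit device number to the old 16 bit form. */
static uint16_t
kdev_demote(uint32_t dev, const uint16_t *old2new)
{
    uint32_t major = dev >> 20;
    uint32_t minor = dev & 0xfffff;
    uint32_t oldmajor = KDEV_OLD_NOMATCH;
    uint32_t i;

    if (major == 0)
        return (uint16_t)(dev & 0xffff);  /* already old-style */

    /* Linear search of the old-to-new table for the new major. */
    for (i = 0; i < 256; i++) {
        if (old2new[i] == major) {
            oldmajor = i;
            break;
        }
    }
    /* Assumption: the minor is simply truncated to 8 bits. */
    return (uint16_t)((oldmajor << 8) | (minor & 0xff));
}
```

With the sample table, new major 3072 demotes to old major 12, and a major
with no entry (say 2500) comes back as 255.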

8. In the kernel, any direct equality comparisons of dev_t's will be changed
to use a new macro, isdevequal(), which does the logic of:
    ((major(x) == major(y)) && (minor(x) == minor(y)))
when any of COMPAT_[09-13] is defined.  Because it goes through major() and
minor(), this form of the compare includes the old-to-new remapping
automatically.  Without a compat option, it will collapse to a binary
compare.
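
A sketch of both forms of isdevequal() from (8).  KDEV_COMPAT stands in for
"any of COMPAT_[09-13] is defined"; the kdev_* names, the single sample
table (block/character distinction left out), and the assumed 8/8 old layout
are all illustration only.

```c
#include <stdint.h>

typedef struct { uint32_t i; } kdev_t;  /* hypothetical opaque dev_t */

#define KDEV_COMPAT 1  /* stand-in for the COMPAT_* options */

/* Invented sample table: old major 12 maps to new major 3072. */
static const uint16_t kdev_old2new[256] = { [12] = 3072 };

static inline uint32_t
kdev_major(kdev_t d)
{
    uint32_t top = d.i >> 20;

    return top != 0 ? top : kdev_old2new[(d.i >> 8) & 0xff];
}

static inline uint32_t
kdev_minor(kdev_t d)
{
    return (d.i >> 20) != 0 ? (d.i & 0xfffff) : (d.i & 0xff);
}

#if KDEV_COMPAT
/* Compare after remapping, so old and new forms of a device match. */
#define isdevequal(x, y) \
    ((kdev_major(x) == kdev_major(y)) && (kdev_minor(x) == kdev_minor(y)))
#else
/* No compat option: a plain binary compare of the raw value. */
#define isdevequal(x, y) ((x).i == (y).i)
#endif
```

With the compat form, the old node 0x0c05 and the new node (3072,5) compare
equal even though their raw bits differ.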

9. In the kernel, any need to use the dev_t value as a seed value (for hash
tables and the like) will extract it using the macro devtoint().  This will
provide a u_int32_t value equal to makedev(major(x),minor(x))--inlining
conversions from old dev_t's as necessary.  This is _not_ a cop-out function
and is only allowed in this particular context (hash values).
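
For (9), devtoint() could look something like the following, with a hash
bucket function as the intended kind of caller.  As before, the kdev_*
names, the sample table and the 8/8 old layout are assumptions for the
sketch; kdev_hash() is a made-up consumer.

```c
#include <stdint.h>

typedef struct { uint32_t i; } kdev_t;  /* hypothetical opaque dev_t */

/* Invented sample table: old major 12 maps to new major 3072. */
static const uint16_t kdev_old2new[256] = { [12] = 3072 };

static inline uint32_t
kdev_major(kdev_t d)
{
    uint32_t top = d.i >> 20;

    return top != 0 ? top : kdev_old2new[(d.i >> 8) & 0xff];
}

static inline uint32_t
kdev_minor(kdev_t d)
{
    return (d.i >> 20) != 0 ? (d.i & 0xfffff) : (d.i & 0xff);
}

/* devtoint(): makedev(major(x), minor(x)) as a plain 32 bit integer. */
static inline uint32_t
devtoint(kdev_t d)
{
    return (kdev_major(d) << 20) | kdev_minor(d);
}

/* Hypothetical caller: pick a hash bucket for a device. */
static inline unsigned
kdev_hash(kdev_t d, unsigned nbuckets)
{
    return devtoint(d) % nbuckets;
}
```

Because the conversion is folded in, the old and new forms of the same
device land in the same bucket.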

10. mknod(8) will gain a new command line option to create "old" style
device nodes.  Possibly, mknod(8) will be modified to have an option to
specify explicitly the number of bits used in each of the major and minor
device numbers.

11. The only exceptions to the API are for shared filesystem space.  nfs and
afs servers et al. are allowed to access the u_int32_t inside a dev_t
directly.  Whether this facility should become a macro (devtoraw()?) is as
yet undefined.  Whether this will require a special set of stat() calls to
return only raw dev_t's is as yet undefined.

12. The old-to-new remapping may be tunable via a sysctl, if applications or
filesystem servers need access to raw dev_t's in the standard set of
__stat14() syscalls, even with COMPAT_[09-13] in the kernel.  This is as yet
undefined. 

=====
===== Todd Vierling (Personal tv@pobox.com) =====
== "There's a myth that there is a scarcity of justice to go around, so
== that if we extend justice to 'those people,' it will somehow erode the
== quality of justice everyone else receives."  -- Maria Price