Subject: 32 bit dev_t, Revision 2
To: None <tech-kern@NetBSD.ORG>
From: Todd Vierling <tv@NetBSD.ORG>
List: tech-kern
Date: 01/11/1998 15:06:39
I have modified the 32 bit dev_t spec proposal as listed below.  Please send
any comments you may have (and CC to tech-kern).  The less painful the
transition to 32 bit dev_t's is, the better.

I have been told that one simplification of the transition method is just to
update MAKEDEV and mknod and re-run MAKEDEV from a miniroot setup.  While
this is easy for some people, it is not simple for everyone.  The proposal
below includes two-way device number compatibility and allows NetBSD to be
netbooted from an NFS server that supports only 16 bit device numbers.

There will be context diffs to parts of the MI code, and the sparc MD code,
relative to -current (1.3A) posted tomorrow afternoon at the following URL:
    ftp://ftp.duh.org/pub/NetBSD-hacks/dev/dev32.diffs.gz

=====

1. dev_t--only when _KERNEL is defined--becomes an opaque type, defined as:
    typedef union { u_int32_t i; } dev_t;
This opaque type cannot be interchanged directly with an integer.  Misuse
causes deliberate compile errors, which forces all code to be 32 bit dev_t
compliant and breaks anything that bypasses the dev_t macro API.  The type
is the same size as the u_int32_t that userland sees as dev_t, and AFAIK
this doesn't break userland structure offsets/alignment on any existing
port.
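
To illustrate what the opacity buys us, here is a minimal sketch.  The name
kdev_t is hypothetical (chosen so the example does not clash with the
userland dev_t), uint32_t stands in for u_int32_t, and the helper bodies are
my assumption of the simplest possible implementation, not the posted diffs.

```c
#include <stdint.h>

/* Hypothetical stand-in for the kernel-only dev_t.  It is named kdev_t so
 * that it does not clash with the userland dev_t from <sys/types.h>, and
 * uint32_t stands in for u_int32_t. */
typedef union { uint32_t i; } kdev_t;

/* The only two helpers that touch the member directly. */
static inline kdev_t rawtodev(uint32_t raw)
{
    kdev_t d;
    d.i = raw;
    return d;
}

static inline uint32_t devtoraw(kdev_t d)
{
    return d.i;
}

/*
 * With the union in place, each of these is rejected by the compiler:
 *     kdev_t d = 5;          initializer is not a union
 *     if (d == other) ...    == is not defined for unions
 *     h = d % 64;            no arithmetic on unions
 */
```

Since the union has a single 32 bit member, it should occupy the same
storage as the integer it wraps.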

2. After a reasonable "transition period" (one release?  until 2.0?) the
definition of dev_t will again be reduced to an integer value, and the
macro API will be reduced to integer definitions.  This "transition period"
with dev_t as an opaque type gives us the opportunity to find improper
dev_t use in existing code and correct it.  It also lets us provide a
define switch (DEVICE_PURIFY?) to turn the opacity back on at will in
future releases.

3. Our new dev_t will be split 12 bits major, 20 bits minor.  If the top 12
bits are zeroes, the dev_t is an "old" device when considering conversion in
the kernel.
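
As a rough sketch of the 12/20 split on raw values (all dev32_* names are
hypothetical and exist only for this example; uint32_t stands in for
u_int32_t):

```c
#include <stdint.h>

#define DEV32_MINOR_BITS 20
#define DEV32_MINOR_MASK ((UINT32_C(1) << DEV32_MINOR_BITS) - 1)
#define DEV32_MAJOR_MASK UINT32_C(0xfff)

/* Pack a 12 bit major and a 20 bit minor into one 32 bit value. */
static inline uint32_t dev32_make(uint32_t maj, uint32_t min)
{
    return ((maj & DEV32_MAJOR_MASK) << DEV32_MINOR_BITS) |
        (min & DEV32_MINOR_MASK);
}

/* Top 12 bits. */
static inline uint32_t dev32_major(uint32_t raw)
{
    return raw >> DEV32_MINOR_BITS;
}

/* Bottom 20 bits. */
static inline uint32_t dev32_minor(uint32_t raw)
{
    return raw & DEV32_MINOR_MASK;
}

/* An "old" device is one whose top 12 bits are all zero. */
static inline int dev32_is_old(uint32_t raw)
{
    return dev32_major(raw) == 0;
}
```

For example, major 0x801 with minor 5 packs to 0x80100005, while a 16 bit
value such as 0x0305 has zero top bits and is therefore treated as "old".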

4. The major device numbers will be renumbered into three blocks.  Major
number 0 will not be used; it is reserved as a flag for "old" dev_t's.
These three blocks will have separate bdevsw/cdevsw structures (planned to
be merged into a devsw structure if the API for the device calls is
rethought to include character and block distinctions in the calls).

%0xxxxxxx xxxx:  If the top bit of the major number is 0 (major 0 through
2047), the device is a dynamically allocated device (planned for future
expansion in a dynamic device system and/or LKMs).

%10xxxxxx xxxx:  If the top bit is 1 and the next bit is 0 (major 2048
through 3071), the device is a statically numbered machine-independent
device (anything in src/sys/dev et al.).  MI devices are kept consistent
across all ports.

%11xxxxxx xxxx:  If the top two bits are 1 (major 3072 through 4095), the
device is a statically numbered machine-architecture-dependent device.  MD
devices are kept consistent across all ports of the same ${MACHINE_ARCH}.
MD devices which are ported to become MI will receive MI major numbers, but
their MD numbers will not be decommissioned.
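
A sketch of how a 12 bit major could be sorted into these blocks, keyed
purely on the bit patterns above.  The enum and function names are
hypothetical and only serve the example.

```c
#include <stdint.h>

enum dev32_class {
    DEV32_OLD,      /* major 0: reserved as the "old" dev_t flag */
    DEV32_DYNAMIC,  /* %0xxxxxxx xxxx */
    DEV32_MI,       /* %10xxxxxx xxxx */
    DEV32_MD        /* %11xxxxxx xxxx */
};

/* Classify a 12 bit major number by its top two bits. */
static inline enum dev32_class dev32_classify(uint32_t maj)
{
    maj &= 0xfff;
    if (maj == 0)
        return DEV32_OLD;
    if ((maj & 0x800) == 0)
        return DEV32_DYNAMIC;
    if ((maj & 0x400) == 0)
        return DEV32_MI;
    return DEV32_MD;
}
```

A lookup routine could use the result to pick which of the three
bdevsw/cdevsw structures to index.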

5. Character and block device major numbers for a given device must match.
If a character device or a block device does not have a corresponding
counterpart, the counterpart will be unconfigured.

6. When COMPAT_[09-13] is defined in the kernel, the macro major() will
include inlined support for an old-to-new major number conversion table (one
for block, and one for character).  Both the major() and minor() macros will
retrieve only the proper set of bits from the dev_t depending on the top 12
bits.
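
Roughly, the compat versions of major() and minor() could look like the
sketch below.  Assumptions on my part: old dev_t's are taken to be 8 bits
major / 8 bits minor, the table entries are invented, and the compat_*
names are hypothetical.  A table value of 0 means "no mapping".

```c
#include <stdint.h>

#define OLD_NMAJOR 256

/* Hypothetical character device table: index is the old 8 bit major,
 * value is the new 12 bit major.  The entries are made up. */
static const uint16_t old_to_new_chr[OLD_NMAJOR] = {
    [3]  = 0x803,
    [12] = 0xc01,
};

/* New-style values return the top 12 bits; old-style values are looked
 * up in the conversion table (assumed 8/8 old layout). */
static inline uint32_t compat_major(uint32_t raw, const uint16_t *tbl)
{
    uint32_t top = raw >> 20;

    if (top != 0)
        return top;
    return tbl[(raw >> 8) & 0xff];
}

/* The minor comes from the low 20 bits (new) or low 8 bits (old). */
static inline uint32_t compat_minor(uint32_t raw)
{
    if ((raw >> 20) != 0)
        return raw & 0xfffff;
    return raw & 0xff;
}
```

With this table, the old value 0x0305 and the new value 0x80300005 both
yield major 0x803, minor 5.  A separate table would be passed for block
devices.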

7. The stat interface will be bumped a version number again, introducing
__stat14(), __fstat14(), and __lstat14().  These will return a file's dev_t
unchanged or, if any of COMPAT_[09-13] is defined, always converted to the
new format using the old-to-new conversion table above.  mknod(2) will not be
changed, and will always create device nodes with the numbers unchanged.

8. The old stat interfaces, if included by a COMPAT_[09-13] option, will do
direct searches of the old-to-new table above to demote new dev_t's to 16 bit
dev_t's.  A search may find no match, in which case the device should be
reported as major number 255.  Programs _needing_ use of the major and minor
numbers of a dev_t
should conceivably be recompiled, but this gives _some_ useful values in
the case where compatibility is required, such as finding a process's tty
device based on device number.  Compat routines for other OS's may also
require this inverse mapping, or may use a "truncated" major device number.
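
The demotion could be sketched as a linear reverse search.  As before, the
8/8 old layout, the table entries, and the names are assumptions for
illustration only; truncating the minor to 8 bits is likewise my guess at
what "demote" would have to do.

```c
#include <stdint.h>

#define OLD_NMAJOR  256
#define OLD_NOMATCH 255   /* major reported when no old number exists */

/* Hypothetical old-to-new table; the entries are made up. */
static const uint16_t old_to_new_tbl[OLD_NMAJOR] = {
    [3]  = 0x803,
    [12] = 0xc01,
};

/* Demote a 32 bit device number to a 16 bit one for the old stat calls. */
static inline uint16_t dev32_demote(uint32_t raw)
{
    uint32_t maj = raw >> 20;
    uint32_t min = raw & 0xfffff;
    unsigned i;

    if (maj == 0)                       /* already an old number */
        return (uint16_t)(raw & 0xffff);

    for (i = 0; i < OLD_NMAJOR; i++)    /* reverse search of the table */
        if (old_to_new_tbl[i] == maj)
            return (uint16_t)((i << 8) | (min & 0xff));

    return (uint16_t)((OLD_NOMATCH << 8) | (min & 0xff));
}
```

A new major with no entry in the table (0x900 here) comes back as major 255
with the low minor bits preserved.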

9. In the kernel, any direct equality comparisons of dev_t's will be changed
to use a new macro, isdevequal(), which does the logic of:
    ((major(x) == major(y)) && (minor(x) == minor(y)))
when any of COMPAT_[09-13] are defined.  This form picks up the old-to-new
remapping automatically, since it goes through major() and minor().  Without
a compat option, it will collapse to a binary compare.
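
A sketch of both forms of isdevequal().  DEV32_COMPAT is a hypothetical
switch standing in for "any of COMPAT_[09-13]", kdev_t stands in for the
kernel dev_t, and the one-entry table and 8/8 old layout are assumptions.

```c
#include <stdint.h>

typedef union { uint32_t i; } kdev_t;   /* stand-in for the kernel dev_t */

#define DEV32_COMPAT 1   /* hypothetical: set when a compat option is on */

/* Hypothetical old-to-new table with one made-up entry. */
static const uint16_t old_to_new[256] = { [3] = 0x803 };

static inline kdev_t rawtodev(uint32_t raw)
{
    kdev_t d;
    d.i = raw;
    return d;
}

static inline uint32_t k_major(kdev_t d)
{
    uint32_t top = d.i >> 20;
    return top != 0 ? top : old_to_new[(d.i >> 8) & 0xff];
}

static inline uint32_t k_minor(kdev_t d)
{
    return (d.i >> 20) != 0 ? (d.i & 0xfffff) : (d.i & 0xff);
}

#if DEV32_COMPAT
/* Compare after remapping, so old and new spellings of a device match. */
#define isdevequal(x, y) \
    (k_major(x) == k_major(y) && k_minor(x) == k_minor(y))
#else
/* No compat option: a plain compare of the raw bits. */
#define isdevequal(x, y) ((x).i == (y).i)
#endif
```

With the compat form, the old number 0x0305 and the new number 0x80300005
compare equal even though their raw bits differ.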

10. The definition of NODEV will change to
    #define NODEV ((u_int32_t)(-1))
and can only be compared to a dev_t after passing the dev_t through
devtoraw().
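
For instance (a sketch only: NODEV32 and kdev_t are hypothetical names used
to avoid clashing with system headers, and dev_is_nodev() is not part of
the proposal):

```c
#include <stdint.h>

typedef union { uint32_t i; } kdev_t;   /* stand-in for the kernel dev_t */

#define NODEV32 ((uint32_t)(-1))        /* stand-in for NODEV */

static inline kdev_t rawtodev(uint32_t raw)
{
    kdev_t d;
    d.i = raw;
    return d;
}

static inline uint32_t devtoraw(kdev_t d)
{
    return d.i;
}

/* The cookie test works on the raw value, never on the union itself. */
static inline int dev_is_nodev(kdev_t d)
{
    return devtoraw(d) == NODEV32;
}
```

The raw comparison matters here because the cookie must not be run through
the old-to-new remapping.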

11. In the kernel, any need to use the dev_t value as a seed value (for hash
tables and the like) will extract it using the macro devtoint().  This will
provide a u_int32_t value equal to makedev(major(x),minor(x))--inlining
conversions from old dev_t's as necessary.  This is _not_ a cop-out function
and is only allowed in this particular context (hash values).
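
A sketch of devtoint() feeding a hash.  The table entry, the 8/8 old
layout, the bucket count, and dev_hash() itself are assumptions made up for
the example; kdev_t stands in for the kernel dev_t.

```c
#include <stdint.h>

typedef union { uint32_t i; } kdev_t;   /* stand-in for the kernel dev_t */

/* Hypothetical old-to-new table with one made-up entry. */
static const uint16_t old_to_new[256] = { [3] = 0x803 };

static inline kdev_t rawtodev(uint32_t raw)
{
    kdev_t d;
    d.i = raw;
    return d;
}

/* Normalized value, i.e. makedev(major(x), minor(x)) after conversion. */
static inline uint32_t devtoint(kdev_t d)
{
    uint32_t top = d.i >> 20;
    uint32_t maj = top != 0 ? top : old_to_new[(d.i >> 8) & 0xff];
    uint32_t min = top != 0 ? (d.i & 0xfffff) : (d.i & 0xff);

    return (maj << 20) | min;
}

#define DEVHASH_BUCKETS 64   /* arbitrary size for the example */

/* Old and new spellings of one device land in the same bucket. */
static inline unsigned dev_hash(kdev_t d)
{
    return devtoint(d) % DEVHASH_BUCKETS;
}
```

Hashing the normalized value rather than the raw bits is what keeps an old
style node and its new style equivalent in the same hash chain.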

12. All kernel use of dev_t as an integer must comply with this API wrt
isdevequal(), devtoint(), devtoraw(), and rawtodev().  Direct access to its
data is disallowed, and use of devtoraw() and rawtodev() (converting a dev_t
to/from a raw u_int32_t) is restricted only to conditions listed in (13)
below. 

13. The only exceptions to the dev-as-integer rule are
 - shared filesystem servers that need untranslated dev_t values
 - testing of dev_t raw values against special cookies (VNOVAL, NODEV, etc.)
Whether this will require a special set of stat() calls to return only raw
dev_t's is as yet undefined.

14. mknod(8) will gain a new command line option to create "old" style
device nodes.  Possibly, mknod(8) will also gain an option to specify
explicitly the number of bits used for the major and minor device numbers.

15. The old-to-new remapping may be tunable via a sysctl, if applications or
filesystem servers need access to raw dev_t's in the standard set of
__stat14() syscalls, even with COMPAT_[09-13] in the kernel.  This is as yet
undefined. 

=====
===== Todd Vierling (Personal tv@pobox.com) =====
== "There's a myth that there is a scarcity of justice to go around, so
== that if we extend justice to 'those people,' it will somehow erode the
== quality of justice everyone else receives."  -- Maria Price