Subject: Work-in-progress "wedges" implementation
To: NetBSD tech-kern <tech-kern@netbsd.org>
From: Jason Thorpe <thorpej@shagadelic.org>
List: tech-kern
Date: 09/22/2004 13:26:34

Wedges are a new way of representing disk partitions in the NetBSD 
kernel.

The basic idea is to decouple the internal representation of disk
partitions from the on-disk representation.  Currently, the NetBSD
kernel uses "struct disklabel" (a.k.a. BSD disklabel) for both in-core
and on-disk representation, and operates on this structure exclusively.

The main problem is that some platforms use (by necessity) on-disk
representations other than the BSD disklabel.  This is generally to
maintain compatibility with another OS on the platform (e.g. Mac OS on
a Macintosh), or because the system firmware understands a particular
format (e.g. Sun PROMs understand Sun disklabels).

In order to handle this "other format", individual platforms may support
an alternative on-disk representation.  In the kernel, this is
represented by "struct cpu_disklabel".  Unfortunately, there are
drawbacks to this approach:

         - Cross-platform disk portability is basically non-existent.

         - The BSD disklabel cannot represent all of the pertinent
           information of some other on-disk representations, and
           vice-versa.  This includes number of partitions and
           partition names.

Another problem is the fact that the BSD disklabel uses 32-bit fields
for block numbers.  This means that the largest disk that the BSD
disklabel can describe is 2TB (with 512-byte sectors), which is not
terribly large by today's standards.
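
To make the 2TB figure concrete, here is a minimal arithmetic sketch.
It is not code from the patch; the function name and the 512-byte
sector size are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: traditional 512-byte sectors. */
#define SECTOR_SIZE 512u

/*
 * Hypothetical helper: number of bytes addressable when block
 * numbers are `bits` wide.  Limited to 54 bits so that the result
 * (at most 2^63) still fits in a uint64_t.
 */
uint64_t
max_addressable_bytes(unsigned bits)
{
	assert(bits <= 54);
	return ((uint64_t)1 << bits) * SECTOR_SIZE;
}
```

With 32-bit block numbers this gives 2^32 * 512 = 2^41 bytes, i.e. 2TB.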

Finally, in a world with hot-plug busses where devices may appear and
disappear at any time, deterministic disk probe ordering does not exist.
The old-fashioned disk naming scheme is not very usable in this 
scenario.

Wedges solve these problems in the following ways:

         - Disk partitions are represented in the kernel as separate
           block devices, and there can be an arbitrary number of these
           associated with a disk.  Each wedge internally uses 64-bit
           block numbers to support partitions > 2TB.

         - Wedges include a modular partition discovery framework,
           allowing different partition formats to be supported
           seamlessly on all platforms.  A module for the EFI GUID
           Partition Table (GPT) format, which supports arbitrary
           numbers of partitions, 64-bit block numbers, and Unicode
           partition names, is included.

         - Wedges may also be configured using ioctls from user space,
           allowing partition handling to be pushed out of the kernel,
           if desired.

         - Wedges are "named".  That is, each wedge has an associated
           name encoded in UTF-8.  This name can be used to create a
           device node in /dev to decouple the wedge's identity from
           its probe-order-dependent unit number.  Duplicate names are
           suppressed, and partition discovery modules can try alternate
           names in the event of a collision.  For example, the GPT
           module may try the Unicode name associated with the GPT
           partition, and if that already exists, it may try again using
           the string representation of the partition's GUID.

         - Wedges represent partition types as strings, allowing for
           arbitrary partition types.
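
The name-collision fallback described above can be sketched roughly as
follows.  This is a user-space illustration only, not the code in the
diffs: the table, its limits, and the function names are all
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_WEDGES 8	/* hypothetical table size for this sketch */
#define WNAME_MAX  128	/* hypothetical name buffer size (UTF-8 bytes) */

static char wedge_names[MAX_WEDGES][WNAME_MAX];
static size_t wedge_count;

/*
 * Register a wedge name.  Returns 0 on success, or -1 if the name is
 * a duplicate, is too long, or the table is full.
 */
int
wedge_name_register(const char *name)
{
	size_t i;

	assert(name != NULL);
	if (strlen(name) >= WNAME_MAX || wedge_count == MAX_WEDGES)
		return -1;
	for (i = 0; i < wedge_count; i++) {
		if (strcmp(wedge_names[i], name) == 0)
			return -1;	/* duplicate names are suppressed */
	}
	strcpy(wedge_names[wedge_count++], name);
	return 0;
}

/*
 * Try the partition's label first, then fall back to the string form
 * of its GUID.  Returns the name that was registered, or NULL if
 * both attempts failed.
 */
const char *
wedge_name_choose(const char *label, const char *guid)
{
	if (wedge_name_register(label) == 0)
		return label;
	if (wedge_name_register(guid) == 0)
		return guid;
	return NULL;
}
```

A second partition labelled "root" would end up registered under its
GUID string instead, so both wedges remain individually addressable.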

The wedges implementation is a work-in-progress at the moment, designed
to allow for the use of old-style disk naming while wedges are still
under development.  Features of the current wedges implementation:

         1. More items are moved from individual disk softc structures
            into "struct disk".  Among other things, this allows for
            information sharing and better synchronization between
            wedges and their parent disks.

         2. I/O is enqueued on the wedge and a new buf allocated in order
            to perform I/O on the parent.  This is a transitional
            measure; I would like to eventually make it possible for
            disk drivers to operate directly on the buf provided to the
            wedge.

         3. Once wedges are created on a disk, I/O to that disk may only
            be performed through its wedges, or on the disk's RAW_PART.
            Wedges may not be created on a disk if any partition other
            than RAW_PART is open.

         4. A minphys entry point is added to "struct dkdriver".
            Eventually, I would like to fully utilize "struct dkdriver"
            as the interface to a disk from a wedge, rather than using
            a vnode.  Once we are fully transitioned to wedges, I would
            like to see the traditional entry points to disk drivers go
            away, with the exception of an entry point for the raw
            disk, so that partitions may be created on it.

         5. My patch includes modifications to make wedges work with the
            "wd" driver.  I will convert the other disk drivers over
            time.  An outstanding question: What should we do about
            floppy drives?

         6. I have modified fsck and mount to use the partition type
            names that wedges provide.  Conveniently, I have defined
            names that match the fsck_* and mount_* names for the
            various partition types that indicate file systems.
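
To illustrate item 6, a front-end could derive the helper program name
directly from a wedge's partition type string.  The sketch below is an
assumption about how that might look, not the actual fsck/mount change;
the function name is hypothetical, and "ffs" is used only as an example
type name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: build a program name such as "fsck_ffs" from
 * a prefix ("fsck" or "mount") and a partition type string.
 * Returns 0 on success, or -1 if the type is empty or the result
 * does not fit in the buffer.
 */
int
helper_name(char *buf, size_t len, const char *prefix, const char *ptype)
{
	int n;

	assert(buf != NULL && prefix != NULL && ptype != NULL);
	if (strlen(ptype) == 0)
		return -1;
	n = snprintf(buf, len, "%s_%s", prefix, ptype);
	if (n < 0 || (size_t)n >= len)
		return -1;	/* error or truncated */
	return 0;
}
```

Because the type string already matches the program suffix, no lookup
table from numeric partition types to file system names is needed.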

Known issues:

         1. You can't currently newfs a wedge.  This is because newfs
            requires the old-style DIOCGDLABEL ioctl, which wedges do
            not support.  I am working on a means for exporting the
            parent disk's geometry through the wedge, which is what
            newfs wants.

         2. Related to (1), what to do about the block size / frag size
            entries in "struct partition" (part of "struct disklabel",
            and thus antiquated, obsolete, and not part of wedges)?

I would like to get "wedges" checked into the tree to allow for greater 
collaboration on it.  Since it does not interfere with the use of disks 
through the traditional interface, I don't think it's necessary to put 
this on a branch.

Diffs for review are at:

	ftp://ftp.shagadelic.org/pub/wedge-diffs.txt

Thanks.

         -- Jason R. Thorpe <thorpej@shagadelic.org>
