Subject: Supporting sector size != DEV_BSIZE
To: None <tech-kern@netbsd.org>
From: Trevin Beattie <trevin@xmission.com>
List: tech-kern
Date: 10/05/2001 19:13:26

I have attempted to make some modifications to the (1.5.2) kernel in an
effort to find a way to get disk I/O to work with optical media whose
sector size is 2048 bytes rather than the 512-byte sectors used in magnetic
media.  The initial attempt has failed, and I would like to start a
discussion of the best way to support devices with varying block sizes.

In scanning the source code for the i386 port, I have found hundreds of
references to DEV_BSHIFT and DEV_BSIZE, which (for the purposes of my
experiment) are the "magic numbers" 9 and 512.  No formal definition of
these macros can be found in the code or in section 9 of the manual, so as
far as I'm concerned they have no basis in reality and must be eliminated.
The idea was that in all instances where a block number or block size was
required, the code must use the actual physical sector size of the device
being accessed.

For example, the macro btodb(off) would be replaced by the expression (off
/ lp->d_secsize) wherever the disk label is available, and the expression
(lp->d_secsize / DEV_BSIZE) would be eliminated since it evaluates to 1.
In cases where the device doesn't have a block size per se, the macro
DEV_BSIZE would be replaced with an equivalent device-specific macro (e.g.,
KFS_BSIZE) where using an arbitrary size doesn't matter.
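
To make the substitution concrete, here is a minimal userland sketch in C.
The constants mirror the i386 values quoted above; the struct
label_sketch type and both function names are hypothetical stand-ins (the
real code would use struct disklabel and the btodb() macro), so treat this
as an illustration of the arithmetic rather than as kernel code.

```c
#include <stdint.h>

#define DEV_BSHIFT 9                     /* log2(DEV_BSIZE), i386 value */
#define DEV_BSIZE  (1 << DEV_BSHIFT)     /* 512 bytes */

/* Hypothetical stand-in for the one disklabel field used here. */
struct label_sketch {
    uint32_t d_secsize;                  /* bytes per physical sector */
};

/* Modeled on btodb(): byte offset -> number of DEV_BSIZE units. */
static int64_t
bytes_to_devblocks(int64_t off)
{
    return off >> DEV_BSHIFT;
}

/* Proposed replacement: byte offset -> physical sector number. */
static int64_t
bytes_to_sectors(const struct label_sketch *lp, int64_t off)
{
    return off / lp->d_secsize;
}
```

For a label with d_secsize of 512 the two functions agree; for 2048-byte
sectors the first one returns a number four times larger than the second.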

The problem with this approach is that there are many places in the code
where the physical block size is either not known or simply not passed to
the functions that need it.  So, for example, physio() converts
uio_offset to b_blkno and calls *strategy() without knowing or checking
whether the block number is correct for the device.

Consider a 1-MB file configured as a virtual disk with 512 sectors of
2048 bytes each.  If the device is dumped using the character interface
(rvnd0d), the output is 1MB of data followed by an I/O error when some
function in the kernel attempts to read sector 512, because it thinks the
virtual disk has 2048 sectors of 512 bytes each.  If the device is dumped
using the block interface (vnd0d), the output is 4MB of data: 2048 sectors
of 2048 bytes each.  (Analysis reveals that each physical sector is
repeated 4 times.)
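
The numbers in this example can be checked with a short C sketch.  The
sizes come from the example itself; the function names are made up for
illustration, and sector_for_block() is only one possible model that
reproduces the observed 4x repetition (scale the block number down, but
transfer a whole physical sector each time).  It is not a trace of the
actual kernel path.

```c
#include <stdint.h>

#define DEV_BSIZE   512u       /* assumed "logical" block size */
#define SECSIZE     2048u      /* sector size of the virtual disk */
#define IMAGE_BYTES 1048576u   /* 1 MB backing file */

/* Number of real sectors in the image (valid numbers: 0..511). */
static uint32_t
physical_sectors(void)
{
    return IMAGE_BYTES / SECSIZE;
}

/* Number of blocks a DEV_BSIZE-based layer believes the disk has. */
static uint32_t
assumed_blocks(void)
{
    return IMAGE_BYTES / DEV_BSIZE;
}

/* Hypothetical model: physical sector fetched for assumed block blk. */
static uint32_t
sector_for_block(uint32_t blk)
{
    return blk * DEV_BSIZE / SECSIZE;
}

/* Bytes produced if every assumed block returns one full sector. */
static uint64_t
block_dump_bytes(void)
{
    return (uint64_t)assumed_blocks() * SECSIZE;
}
```

Under this model, blocks 0 through 3 all map to physical sector 0, which
matches the repetition seen on vnd0d, and the total comes to 4MB.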

Throughout the kernel code, there are expressions which look like they're
designed to translate some idea of a "logical" sector size of (arbitrarily)
512 bytes to the actual sector size.  For devices in which the sector size
actually is 512, these expressions are essentially no-ops, so their
usefulness is questionable.  For devices that have a different sector size,
clearly the conversion is not working somewhere along the line.

So the first idea to consider is whether we should do away with any notion
of logical sectors and make all block-oriented functions use the actual
sector size (and corresponding sector numbering) of the device.  This
approach requires that a device's block size be made available to all
functions which use it, which means that a lot of function interfaces will
have to be re-written.  Also, it may present problems with the ccd device,
if the concatenated disks use different sector sizes.

The second idea is whether all block I/O should be done (at the user level)
in terms of a uniform "logical" sector size, regardless of the underlying
physical sector size, and only convert to the real sector number and size
when communicating with the physical device.  Going this route will require
an extensive audit of the kernel code to make sure that conversions are
done in every place required, and at no other time.  Using a uniform sector
size may also create problems for userland applications which access
block-oriented devices, such as CD-ROMs (where the physical sector size is
often assumed to be 2048 bytes, though this does not have to be the case
either).
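
As a rough sketch of what the second idea implies at the driver boundary,
the C fragment below converts a request expressed in uniform 512-byte
logical blocks into a physical sector number and count.  The struct
phys_req type and the logical_to_physical() function are hypothetical
names invented for this illustration, and rejecting misaligned requests
is just one possible policy (a real driver might instead do a
read-modify-write or use a bounce buffer).

```c
#include <stdint.h>

#define DEV_BSIZE 512u          /* uniform logical block size */

/* Hypothetical request as the physical device would see it. */
struct phys_req {
    uint64_t sector;            /* first physical sector */
    uint32_t nsectors;          /* number of physical sectors */
};

/*
 * Convert a logical block number and byte count to physical sectors.
 * Returns 0 on success, or -1 if the request does not start and end
 * on a physical sector boundary.
 */
static int
logical_to_physical(uint64_t blkno, uint32_t bcount, uint32_t secsize,
    struct phys_req *out)
{
    uint64_t byte_off = blkno * DEV_BSIZE;

    if (secsize == 0 || byte_off % secsize != 0 || bcount % secsize != 0)
        return -1;

    out->sector = byte_off / secsize;
    out->nsectors = bcount / secsize;
    return 0;
}
```

With 2048-byte sectors, logical block 4 becomes physical sector 1, while a
request starting at logical block 1 is refused because it falls in the
middle of a sector.  With 512-byte sectors the conversion is the identity.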

-----------------------
Trevin Beattie          "Do not meddle in the affairs of wizards,
trevin@xmission.com     for you are crunchy and good with ketchup."
      {:->                                     --unknown