Subject: Re: Supporting sector size != DEV_BSIZE
To: Trevin Beattie <trevin@xmission.com>
From: Bill Studenmund <wrstuden@netbsd.org>
List: tech-kern
Date: 10/07/2001 05:38:15
On Fri, 5 Oct 2001, Trevin Beattie wrote:

> I have attempted to make some modifications to the (1.5.2) kernel in an
> effort to find a way to get disk I/O to work with optical media whose
> sector size is 2048 bytes rather than the 512-byte sectors used in magnetic
> media.  The initial attempt has failed, and I would like to start a
> discussion into what will be the best way to support devices with varying
> block sizes.
>
> In scanning the source code for the i386 port, I have found hundreds of
> references to DEV_BSHIFT and DEV_BSIZE, which (for the purposes of my
> experiment) are the "magic numbers" 9 and 512.  No formal definition of
> these macros can be found in the code or in section 9 of the manual, so as
> far as I'm concerned they have no basis in reality and must be eliminated.
> The idea was that in all instances where a block number or block size was
> required, the code must use the actual physical sector size of the device
> being accessed.

Well, it's nice that you feel they have no basis in reality. They've been
that way for quite a while. :-)
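For anyone who goes looking: on the BSDs these constants traditionally live in <sys/param.h>. Below is a minimal sketch of what they amount to. It is written from memory rather than copied from the 1.5.2 tree, so treat the exact text as an assumption.

```c
/* Traditional BSD fixed "disk block" unit: 2^9 = 512 bytes.
 * Modeled on <sys/param.h>; the exact spelling there may differ. */
#define DEV_BSHIFT 9
#define DEV_BSIZE  (1 << DEV_BSHIFT)

/* Bytes -> DEV_BSIZE blocks and back, using shifts instead of
 * division and multiplication. */
#define btodb(bytes) ((bytes) >> DEV_BSHIFT)
#define dbtob(db)    ((db) << DEV_BSHIFT)
```

In this form btodb() always counts 512-byte units, whatever the device's d_secsize is. A single 2048-byte optical sector therefore covers btodb(2048), or 4, of them.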

I've been there and done that. You can check out the results on the
wrstuden-dev-bsize branch. I had SCSI disks working fine, with both 512-
and 2048-byte sectors in the same system. i386 floppies had a problem.

There are three PRs in the database discussing different ways to fix
this. They were written by a Japanese developer who unfortunately died in
a motorcycle accident shortly thereafter. My efforts on the branch
followed one of the PRs (I forget which), the one that got rid of DEV_BSIZE.

The branch is abandoned at the moment. Among other things, Chuck Silvers,
who has done the UBC work, has followed on with a different approach (one
of the other PRs). As he is making things happen and I am not in a
position to work on it, I'll let him finish things up.

> For example, the macro btodb(off) would be replaced by the expression (off
> / lp->d_secsize) wherever the disk label is available, and the expression
> (lp->d_secsize / DEV_BSIZE) would be eliminated since it evaluates to 1.
> In cases where the device doesn't have a block size per se, the macro
> DEV_BSIZE would be replaced with an equivalent device-specific macro (e.g.,
> KFS_BSIZE) where using an arbitrary size doesn't matter.

Don't do division if you can help it. Stick with powers of two; then you
can do multiplication and division with shifts. One thing I built into the
wrstuden-dev-bsize branch was support for non-power-of-two block sizes. I
now think that's not needed, especially with UBC, where the size of a VM
page matters a lot more now.
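Here is a minimal sketch of that advice. It assumes the driver computes a shift once from the label's d_secsize and uses it for every conversion afterward. The helper names are hypothetical and do not come from the tree or the branch.

```c
#include <stdint.h>

/* Hypothetical helper: log2 of a sector size, or -1 if the size is
 * zero or not a power of two (the case this approach refuses). */
static int secsize_to_shift(uint32_t secsize)
{
    int shift = 0;

    if (secsize == 0 || (secsize & (secsize - 1)) != 0)
        return -1;
    while ((1u << shift) != secsize)
        shift++;
    return shift;
}

/* Byte offset -> sector number: a shift rather than a division. */
static uint64_t bytes_to_sectors(uint64_t off, int shift)
{
    return off >> shift;
}

/* Sector number -> byte offset: a shift rather than a multiplication. */
static uint64_t sectors_to_bytes(uint64_t sec, int shift)
{
    return sec << shift;
}
```

A 2048-byte sector yields a shift of 11, and byte offset 6144 maps to sector 3. A size such as 2352 (raw CD sectors) is rejected up front, so the fast path never has to consider it.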

> The problem with this approach is that there are many places in the code
> where the physical block size is either not known or simply not passed to
> the functions that need it.  So, for example, physio() converts
> uio_offset to b_blkno and calls *strategy() without any knowledge or
> consideration for whether the block number is correct for the device.

Check out the branch; physio() gained a new parameter there.
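As a rough sketch of the idea, a conversion that receives the device's sector size can compute the correct block number and reject misaligned requests. The function name and signature below are hypothetical. They are not the branch's actual physio() change.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: turn a uio-style byte offset and transfer length into a
 * device block number, with the device's sector shift passed explicitly
 * (the information physio() otherwise lacks).
 * Returns 0 on success, or EINVAL if the request is not sector-aligned. */
static int offset_to_blkno(int64_t off, size_t len, int secshift,
                           int64_t *blknop)
{
    int64_t mask = ((int64_t)1 << secshift) - 1;

    if (off < 0 || (off & mask) != 0 || ((int64_t)len & mask) != 0)
        return EINVAL;
    *blknop = off >> secshift;
    return 0;
}
```

On a 2048-byte-sector disc, offset 8192 becomes block 4. On a 512-byte disk, the same offset becomes block 16.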

> So the first idea to consider is whether we should do away with any notion
> of logical sectors and make all block-oriented functions use the actual
> sector size (and corresponding sector numbering) of the device.  This
> approach requires that a device's block size be made available to all
> functions which use it, which means that a lot of function interfaces will
> have to be re-written.  Also, it may present problems with the ccd device,
> if the concatenated disks use different sector sizes.
>
> The second idea is whether all block I/O should be done (at the user level)
> in terms of a uniform "logical" sector size, regardless of the underlying
> physical sector size, and only convert to the real sector number and size
> when communicating with the physical device.  Going this route will require
> an extensive audit of the kernel code to make sure that conversions are
> done in every place required, and at no other time.  Using a uniform sector
> size may also create problems for userland applications which access
> block-oriented devices, such as CD-ROMs (where the physical sector size is
> often assumed to be 2048 bytes--this also doesn't have to be the case).

See past threads and the PRs. You've basically presented two of the three
options.
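For the second option, the conversion at the device boundary might look something like the sketch below. It assumes a 512-byte logical unit and power-of-two physical sectors at least that large. All names are made up for illustration.

```c
#include <stdint.h>

#define LOGICAL_BSHIFT 9   /* assumed uniform logical unit: 512 bytes */

/* Hypothetical result of mapping a logical block onto the device. */
struct phys_loc {
    uint64_t sector;   /* physical sector containing the logical block */
    uint32_t offset;   /* byte offset of the logical block in that sector */
};

/* Map a logical block number to its physical sector and intra-sector
 * offset. physshift is log2 of the physical sector size (>= 9). */
static struct phys_loc logical_to_physical(uint64_t lblkno, int physshift)
{
    int ratio_shift = physshift - LOGICAL_BSHIFT; /* log2(blocks per sector) */
    uint64_t mask = ((uint64_t)1 << ratio_shift) - 1;
    struct phys_loc loc;

    loc.sector = lblkno >> ratio_shift;
    loc.offset = (uint32_t)((lblkno & mask) << LOGICAL_BSHIFT);
    return loc;
}
```

On a 2048-byte-sector disc, logical block 13 lands in physical sector 3 at byte offset 512. Any write smaller than a physical sector therefore becomes a read-modify-write at the driver, which is one of the costs of this option.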

Take care,

Bill