Subject: non-512-byte-sector devices vs. UBC
To: None <tech-kern@netbsd.org>
From: Chuck Silvers <chuq@chuq.com>
List: tech-kern
Date: 06/07/1999 07:48:13
I've been talking with various people (ig, soda, others) lately
about non-512-byte-sector devices and filesystems and what the
implications are for UBC.

the existing interface to disk-like devices (i.e. struct buf) dictates
that all devices in the system will be addressed in the same units:
DEV_BSIZE, currently 512 bytes.  these are the units in which b_blkno is
expressed.  for devices with a sector size larger than 512 but
still a power of 2, this isn't a big deal;  the driver just needs
to shift by the appropriate amount to get the sector number.
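
as a rough illustration of that shift, here is a minimal sketch.  the
helper name blkno_to_sector and its secshift parameter are made up for
this example and are not existing kernel interfaces; DEV_BSHIFT is
taken to be 9 to match the 512-byte DEV_BSIZE above.

```c
#include <assert.h>
#include <stdint.h>

#define DEV_BSHIFT	9			/* log2(DEV_BSIZE) */
#define DEV_BSIZE	(1 << DEV_BSHIFT)	/* 512 bytes */

/*
 * hypothetical helper, not an existing kernel function:
 * convert a block number in DEV_BSIZE units to a sector number on a
 * device with (1 << secshift)-byte sectors.  only valid for
 * power-of-2 sector sizes of at least DEV_BSIZE.
 */
static int64_t
blkno_to_sector(int64_t blkno, int secshift)
{
	assert(blkno >= 0);
	assert(secshift >= DEV_BSHIFT);
	return blkno >> (secshift - DEV_BSHIFT);
}
```

for a device with 2048-byte sectors (secshift 11), a b_blkno of 8
would map to sector 2 under this scheme.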

devices with a sector size that's less than 512 (certain floppy
devices) or not a power of 2 (raw or audio cdroms) are a whole 'nuther story...
on these devices not every sector can be addressed individually in
DEV_BSIZE units, so they can't be supported well (or at all, depending
on how you look at it).
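
to make the addressing problem concrete, the sketch below checks
whether a given sector starts on a DEV_BSIZE boundary, which is what
it takes for b_blkno to name it exactly.  the function name is
invented for this example, and the 2352-byte figure for raw cdrom
sectors is my assumption about what "raw" means here.

```c
#include <assert.h>
#include <stdint.h>

#define DEV_BSIZE	512

/*
 * hypothetical check: returns 1 if sector number `sec` on a device
 * with `secsize`-byte sectors begins at a byte offset that is a
 * multiple of DEV_BSIZE, 0 otherwise.
 */
static int
sector_is_addressable(uint64_t sec, uint32_t secsize)
{
	assert(secsize != 0);
	return (sec * secsize) % DEV_BSIZE == 0;
}
```

with 2352-byte sectors only every 32nd sector passes this check, and
with 128-byte sectors only every 4th one does; any power-of-2 sector
size of 512 or more always passes.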

also, I'm told that certain operations in existing code don't work
because of confusion about how to deal with this.  I don't know what
these are exactly.

I see several ways of addressing this problem:

(1) leave DEV_BSIZE at 512, fix the existing code.
    advantages:
	least amount of change to existing code.
	maximum compatibility with 3rd party code.
    disadvantages:
	reading raw cdroms will still need to be done with raw scsi commands.
	filesystems with blocksize < 512 bytes will need to use
	bounce-buffering to support UBC.
	fsck and other utilities which access the raw device will need to
	be aware that the device can only be accessed in 512-byte chunks
	even though it might be desirable to access single sectors.
	thorny issues for devices whose total size is not a multiple of 512.


(2) make DEV_BSIZE device-specific.  this would require all clients
	of the "struct buf" interface to determine the units of b_blkno
	before making i/o requests.
    advantages:
	solves the problem in a sort of natural way with no weird
	artifacts like bounce-buffering.
    disadvantages:
	more complicated than existing design, lots of code to be changed.
	the DEV_BSIZE and DEV_BSHIFT macros would have to be removed since
	there would be no way to express the correct value.


(3) change DEV_BSIZE to 1, and make b_blkno a 64bit field (possibly by way
	of changing "daddr_t" to a 64bit type).
    advantages:
	also solves the problem in a natural way, with much less code
	change than (2).  most code should work as-is.
	supports devices larger than 2^40 bytes (though this is also true
	if we make b_blkno 64bits with the existing DEV_BSIZE).
    disadvantages:
	though this change to the interface is much less drastic than (2),
	it could still involve needing to "fix" some code which is working
	with the current DEV_BSIZE.
	the lower 9 bits of b_blkno will be wasted for most uses, since
	most devices and filesystems assume they would be 0.
	64bit b_blkno will cause extra overhead for devices that don't
	need it.
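
the bounce-buffering mentioned under option (1) amounts to reading
the enclosing 512-byte-aligned range and then copying out the piece
that was actually wanted.  a minimal sketch of the range calculation
follows; struct bounce_range and bounce_range_for are hypothetical
names for this example, and the actual i/o and copy are left out.

```c
#include <assert.h>
#include <stdint.h>

#define DEV_BSIZE	512

/* hypothetical description of the DEV_BSIZE-aligned i/o covering a request */
struct bounce_range {
	int64_t  blkno;	/* first block, in DEV_BSIZE units */
	uint32_t nblks;	/* number of DEV_BSIZE blocks to transfer */
	uint32_t skip;	/* bytes to skip at the start of the bounce buffer */
};

/*
 * compute the aligned range that covers `len` bytes starting at byte
 * offset `off` on the device.  assumes len is small enough that
 * skip + len + DEV_BSIZE does not overflow 32 bits.
 */
static struct bounce_range
bounce_range_for(uint64_t off, uint32_t len)
{
	struct bounce_range r;

	assert(len > 0);
	r.blkno = (int64_t)(off / DEV_BSIZE);
	r.skip = (uint32_t)(off % DEV_BSIZE);
	r.nblks = (r.skip + len + DEV_BSIZE - 1) / DEV_BSIZE;
	return r;
}
```

for example, the second 2352-byte sector of a device (offset 2352)
would need 6 blocks starting at block 4, with the first 304 bytes of
the bounce buffer discarded.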



I propose option 3 as the best solution.  it fully supports devices of any
sector size with a minimum of code change.  I think the overhead for 64bit
daddr_t will be negligible.  we'll run out of bits in b_blkno again sooner
than with (2), but that will be decades in coming and the field can just be
made wider again.
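
the address-range arithmetic behind these numbers can be sketched as
below.  this assumes b_blkno is a signed integer, so one bit goes to
the sign; the function name is made up for the example.

```c
#include <assert.h>

/*
 * hypothetical helper: log2 of the number of bytes addressable with a
 * signed block number of `blkno_bits` bits counted in units of
 * (1 << unit_shift) bytes.
 */
static unsigned
addressable_log2(unsigned blkno_bits, unsigned unit_shift)
{
	assert(blkno_bits >= 2);
	return (blkno_bits - 1) + unit_shift;
}
```

under that assumption a 32bit b_blkno in 512-byte units reaches 2^40
bytes, a 64bit b_blkno in 1-byte units reaches 2^63 bytes, and a 64bit
b_blkno in 512-byte units would reach 2^72 bytes.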

comments?  alternative proposals?

-Chuck