Subject: Re: non-512-byte-sector devices vs. UBC
To: Chuck Silvers <chuq@chuq.com>
From: Chris G. Demetriou <cgd@netbsd.org>
List: tech-kern
Date: 06/07/1999 16:08:52

Chuck Silvers <chuq@chuq.com> writes:
> (1) leave DEV_BSIZE at 512, fix the existing code.

Really, this way lies madness.  8-)


> (2) make DEV_BSIZE device-specific.  this would require all clients
> 	of the "struct buf" interface to determine the units of b_blkno
> 	before making i/o requests.

I'm inclined towards this one or a modification of (3).


> (3) change DEV_BSIZE to 1, and make b_blkno a 64bit field (possibly by way
> 	of changing "daddr_t" to a 64bit type).
>     advantages:
> 	also solves the problem in a natural way, with much less code
> 	change than (2).  most code should work as-is.
> 	supports devices larger than 2^40 bytes (though this is also true
> 	if we make b_blkno 64bits with the existing DEV_BSIZE).
>     disadvantages:
> 	though this change to the interface is much less drastic than (2),
> 	it could still involve needing to "fix" some code which is working
> 	with the current DEV_BSIZE.
> 	the lower 9 bits of b_blkno will be wasted for most uses, since
> 	most devices and filesystems assume they would be 0.
> 	64bit b_blkno will cause extra overhead for devices that don't
> 	need it.

Uh, excuse me, natural in what universe?

DEV_BSIZE having a value of 512 kinda makes sense, in that that's the
block size of a lot of devices we use.

DEV_BSIZE going away and/or having a 'variable' value also makes
sense, because it, or its replacement code, actually uses the block
size of the underlying device.

This proposal isn't nearly so logical.  In what world is the device
block size 1 byte?

What you've really said here is that you want to kill DEV_BSIZE and
replace b_blkno with something named something like b_offset, but you
don't want to go to the trouble of actually doing the work to do that.

If you want to actually do that, then _do it_, don't go half way.
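
For what it's worth, a completed version of (3) might look roughly like
the sketch below.  It is only an illustration: struct xbuf, its b_offset
field, and xbuf_to_sector() are made-up names, not existing kernel
interfaces.  The request carries a byte offset, and the driver turns it
into a sector number using its own sector size (assumed here to be a
power of two).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical request descriptor: a byte offset instead of b_blkno.
 * This is NOT the real struct buf; the names are illustrative only.
 */
struct xbuf {
	int64_t  b_offset;	/* byte offset on the device */
	uint32_t b_bcount;	/* transfer length in bytes */
};

/*
 * Driver-side translation from byte offset to device sector number.
 * secshift is log2 of the device's sector size (9 for 512 bytes).
 * Returns 0 and stores the sector in *secp, or -1 if the offset or
 * length is not sector-aligned.
 */
static int
xbuf_to_sector(const struct xbuf *bp, unsigned secshift, int64_t *secp)
{
	int64_t mask;

	assert(secshift < 63);
	mask = ((int64_t)1 << secshift) - 1;

	if (bp->b_offset < 0 || (bp->b_offset & mask) != 0 ||
	    (bp->b_bcount & mask) != 0)
		return -1;

	*secp = bp->b_offset >> secshift;
	return 0;
}
```

With 512-byte sectors (secshift 9), an offset of 4096 maps to sector 8,
and an offset of 4097 is rejected as misaligned.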


(2) or a completed version of the idea in (3) is what would have my
vote.



On a related note, the problem of non-power-of-two block sizes was
brought up.  Is it really intended that they be supported in the
kernel?

I'm inclined to think that representation in terms of block numbers is
likely to be more efficient than representation in terms of bytes, but
a byte representation should not be much worse, because it should be
easy to look up the block size (or 'size shift') and convert between
the two.  With non-power-of-two
block sizes, however, no matter what you do, you need to do division
for byte->block and multiplication for block->byte translation.  On
some architectures, division is expensive.

Sure, sure, compared to the time it takes to do an I/O, a few dozen
divisions aren't too much, but they become more significant if you're
talking about a cached block, and in either case they do consume CPU
that could have been used on something else...
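
To make the cost difference concrete, here is a minimal sketch of the
two conversion paths.  struct blkgeom and the helper names are
hypothetical; the point is only that a power-of-two block size can be
handled with a cached shift, while any other size needs a 64-bit divide
or multiply.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-device geometry; not an existing kernel structure. */
struct blkgeom {
	uint32_t bsize;		/* block size in bytes */
	int      bshift;	/* log2(bsize), or -1 if not a power of two */
};

/* Record the block size and cache the shift if one exists. */
static void
blkgeom_init(struct blkgeom *g, uint32_t bsize)
{
	assert(bsize != 0);
	g->bsize = bsize;
	g->bshift = -1;
	if ((bsize & (bsize - 1)) == 0) {	/* power of two? */
		g->bshift = 0;
		while (((uint32_t)1 << g->bshift) != bsize)
			g->bshift++;
	}
}

/* byte offset -> block number (truncating) */
static uint64_t
bytes_to_blocks(const struct blkgeom *g, uint64_t bytes)
{
	if (g->bshift >= 0)
		return bytes >> g->bshift;	/* cheap: one shift */
	return bytes / g->bsize;		/* general case: 64-bit divide */
}

/* block number -> byte offset */
static uint64_t
blocks_to_bytes(const struct blkgeom *g, uint64_t blkno)
{
	if (g->bshift >= 0)
		return blkno << g->bshift;	/* cheap: one shift */
	return blkno * g->bsize;		/* general case: multiply */
}
```

For example, with a 512-byte block size, 2^40 bytes is 2^31 blocks via
a single shift; with a block size of 2352 (chosen here only as an
example of a non-power-of-two size), the same conversion goes through
the division path.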


cgd
-- 
Chris Demetriou - cgd@netbsd.org - http://www.netbsd.org/People/Pages/cgd.html
Disclaimer: Not speaking for NetBSD, just expressing my own opinion.