tech-kern archive


Re: blocksizes



On Fri, Jan 22, 2010 at 05:46:31AM +0000, David Holland wrote:
> On Thu, Jan 21, 2010 at 10:30:20PM +0000, Michael van Elst wrote:
>  > IMHO there need to be three different ways to specify block
>  > offsets and block counts:
>  > 
>  > 1. in units of blocks of the physical device
>  > 2. in units of blocks of DEV_BSIZE bytes
>  > 3. in bytes
> 
> Don't forget: 4. in units of the filesystem block size...


I omitted this from the list because only the filesystem
itself has a notion of 'filesystem block size'; when
talking to the device it goes back to using DEV_BSIZE. That
'filesystem block size' is a very private measure of a
filesystem becomes clear when you think about FFS fragments,
where the filesystem already uses a second size, and about
aggregated I/O, where multiple blocks are accessed as one
unit.
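
To illustrate the two private sizes, here is a minimal sketch of how a
filesystem with blocks and fragments might map its own units down to
DEV_BSIZE blocks before talking to the device. The struct, the helper
names and the 512-byte DEV_BSIZE are assumptions made for the example;
this is not the actual FFS code.

```c
#include <stdint.h>

/* Assumption for this sketch: DEV_BSIZE is 512 bytes. */
#define SKETCH_DEV_BSIZE 512u

/* Hypothetical filesystem geometry; both sizes are private to the fs. */
struct sketch_fs {
	uint32_t bsize;	/* filesystem block size in bytes */
	uint32_t fsize;	/* fragment size in bytes, divides bsize */
};

/* Filesystem block number -> fragment number. */
static inline uint64_t
sketch_blktofrag(const struct sketch_fs *fs, uint64_t blk)
{
	return blk * (fs->bsize / fs->fsize);
}

/*
 * Fragment number -> DEV_BSIZE block number, the unit used towards
 * the device. Assumes fsize is a multiple of SKETCH_DEV_BSIZE.
 */
static inline uint64_t
sketch_fragtodb(const struct sketch_fs *fs, uint64_t frag)
{
	return frag * (fs->fsize / SKETCH_DEV_BSIZE);
}
```

With 16384-byte blocks and 2048-byte fragments, filesystem block 3
starts at fragment 24, which is DEV_BSIZE block 96. Neither of the
first two numbers ever needs to leave the filesystem.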


>  > and we need to establish what units are used where.
> 
> IM (fairly strong) O everything should be kept in byte counts, and
> never block counts because if you have more than one unit in use it is
> far too easy to accidentally mix them or provide the wrong one, and
> because they're all the same language-level type there's little hope
> of detecting such problems automatically.

I would like a system where all I/O is measured in bytes, but this
requires a complete redesign for all disk devices and all filesystems.

And you won't get rid of the physical blocks; at some point you
have to translate.
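
As a rough sketch of that last translation step, assuming an interface
that takes a byte offset and a byte length, the driver would still
have to work out which physical blocks the request touches. The names
below are hypothetical and the sector size is simply passed in.

```c
#include <stdint.h>

/* Hypothetical result type: a run of physical blocks. */
struct sketch_blkrange {
	uint64_t first;	/* first physical block touched */
	uint64_t count;	/* number of physical blocks touched */
};

/*
 * Translate a byte range into the physical blocks that cover it.
 * A request that is not sector-aligned still touches whole sectors,
 * so the range is rounded outwards. Returns an empty range for a
 * zero length or a zero sector size.
 */
static inline struct sketch_blkrange
sketch_bytes_to_blocks(uint64_t off, uint64_t len, uint32_t secsize)
{
	struct sketch_blkrange r = { 0, 0 };
	uint64_t last;

	if (secsize == 0 || len == 0)
		return r;
	r.first = off / secsize;
	last = (off + len - 1) / secsize;
	r.count = last - r.first + 1;
	return r;
}
```

For example, 100 bytes at byte offset 1000 on a 512-byte-sector disk
cover physical blocks 1 and 2, so even a pure byte interface ends up
with a block number and a block count at the bottom.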



> Furthermore, Murphy's Law dictates that in any particular place the
> count you are given is frequently not in the units you need to give
> something else, and then you end up converting back and forth all over
> everywhere. This serves no purpose and tends to obfuscate the code
> base.

This is how it works now. We do translate blocks back and forth
all over the place, except that there are a lot of assumptions that
the physical block size is the same as DEV_BSIZE.
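
A minimal sketch of the conversion that those assumptions skip,
between DEV_BSIZE blocks and physical sectors. The helper names are
made up for illustration and DEV_BSIZE is assumed to be 512 bytes.

```c
#include <stdint.h>

/* Assumption for this sketch: DEV_BSIZE is 512 bytes. */
#define SKETCH_DEV_BSIZE 512u

/*
 * DEV_BSIZE block number -> physical sector number.
 * Returns 0 on success, -1 if the sector size is zero or the offset
 * does not fall on a physical sector boundary. Overflow of the
 * intermediate byte offset is not handled here.
 */
static inline int
sketch_dbtophys(uint64_t db, uint32_t secsize, uint64_t *physblk)
{
	uint64_t off = db * SKETCH_DEV_BSIZE;	/* byte offset */

	if (secsize == 0 || off % secsize != 0)
		return -1;
	*physblk = off / secsize;
	return 0;
}

/*
 * Physical sector number -> DEV_BSIZE block number. For sector sizes
 * below DEV_BSIZE the result is truncated.
 */
static inline uint64_t
sketch_phystodb(uint64_t physblk, uint32_t secsize)
{
	return physblk * secsize / SKETCH_DEV_BSIZE;
}
```

With 2048-byte sectors, DEV_BSIZE block 16 is physical sector 4, while
DEV_BSIZE block 3 has no physical sector boundary at all. Code that
assumes the two sizes are equal never sees that second case.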

Also, filesystems organize data in larger chunks. There is always
some translation going on between block or extent numbers and
DEV_BSIZE offsets today, or byte offsets in your ideal system.

On the filesystem side it won't get simpler.



>  > The necessary changes are rather small. In particular, dkwedge_info needs
>  > to be extended to keep track of the physical sector size so that the dk
>  > driver can do the transformations.
> 
> The physical sector size should be available to callers (just not part
> of the API/ABI) so this ought to be done regardless.

I haven't thought about compatibility issues yet. Where is dkwedge_info
exposed to userland?



Greetings,
-- 
                                Michael van Elst
Internet: mlelstv%serpens.de@localhost
                                "A potential Snark may lurk in every tree."

