tech-kern archive


Re: blocksizes

On Fri, Jan 22, 2010 at 08:07:03AM +0100, Michael van Elst wrote:
 > On Fri, Jan 22, 2010 at 05:46:31AM +0000, David Holland wrote:
 > > On Thu, Jan 21, 2010 at 10:30:20PM +0000, Michael van Elst wrote:
 > > > IMHO there need to be three different ways to specify block
 > > > offsets and block counts:
 > > > 
 > > > 1. in units of blocks of the physical device
 > > > 2. in units of blocks of DEV_BSIZE bytes
 > > > 3. in bytes
 > > 
 > > Don't forget: 4. in units of the filesystem block size...
 > I omitted this from the list because only the filesystem
 > itself has the notion of 'filesystem block size', but when
 > talking to the device it goes back to use DEV_BSIZE. It
 > becomes clear that 'filesystem block size' is a very private
 > measure of a filesystem when you think about FFS fragments
 > where the filesystem already uses a second size and about
 > aggregated IO where multiple blocks are accessed as one
 > unit.

Indeed. But it's still floating around in the system and still a
possible complication. It's not *quite* invisible outside of each
filesystem; e.g. it affects caching.
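
To make the bookkeeping concrete, here is a minimal C sketch that expresses one disk location in each of the four units discussed so far. It assumes DEV_BSIZE is 512 bytes, as it is in NetBSD. The `struct location` type and the `locate()` helper are hypothetical and are not part of any kernel API.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_DEV_BSIZE 512u	/* NetBSD's DEV_BSIZE */

/* One disk location expressed in each of the units named in this thread. */
struct location {
	uint64_t bytes;		/* 3. byte offset */
	uint64_t devblk;	/* 2. DEV_BSIZE blocks */
	uint64_t physblk;	/* 1. physical device blocks */
	uint64_t fsblk;		/* 4. filesystem blocks */
};

/*
 * Hypothetical helper: derive every unit from a byte offset.  The offset
 * must be a multiple of each block size, otherwise the integer divisions
 * below would silently truncate.
 */
static struct location
locate(uint64_t bytes, uint32_t physsize, uint32_t fssize)
{
	struct location loc;

	assert(bytes % SKETCH_DEV_BSIZE == 0);
	assert(bytes % physsize == 0 && bytes % fssize == 0);
	loc.bytes = bytes;
	loc.devblk = bytes / SKETCH_DEV_BSIZE;
	loc.physblk = bytes / physsize;
	loc.fsblk = bytes / fssize;
	return loc;
}
```

With 4096-byte physical sectors and 16384-byte filesystem blocks, byte offset 65536 is block 128, 16, or 4 depending on which unit is meant. All four are plain `uint64_t` values, so only convention tells them apart.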

 > > > and we need to establish what units are used where.
 > > 
 > > IM (fairly strong) O everything should be kept in byte counts, and
 > > never block counts, because if you have more than one unit in use, it is
 > > far too easy to accidentally mix them or provide the wrong one, and
 > > because they're all the same language-level type there's little hope
 > > of detecting such problems automatically.
 > I would like a system where all I/O is measured in bytes, but this
 > requires a complete redesign for all disk devices and all filesystems.

Right, but I think we should make this the end goal. Nobody says we
need to expect to get there promptly. :-/
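
On the point that every unit shares the same language-level type: one partial mitigation, sketched here and not something proposed in this thread, is to wrap each unit in its own single-member struct. Distinct struct types are not compatible in C, so the compiler rejects accidental mixing. The type and function names below are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_DEV_BSHIFT 9	/* log2(512), matching NetBSD's DEV_BSHIFT */

/* Hypothetical wrapper types; passing one where the other is expected
 * is a compile-time error rather than a silent unit mix-up. */
typedef struct { uint64_t bytes; } byteoff;
typedef struct { uint64_t blkno; } devblk;

/* The only sanctioned way to turn a DEV_BSIZE block number into bytes. */
static byteoff
devblk_to_off(devblk b)
{
	byteoff o = { b.blkno << SKETCH_DEV_BSHIFT };
	return o;
}

/* Hypothetical byte-based I/O entry point: it accepts only a byteoff.
 * A real one would start a transfer; this one returns the offset. */
static uint64_t
io_offset(byteoff off)
{
	assert(off.bytes % (1u << SKETCH_DEV_BSHIFT) == 0); /* sector aligned */
	return off.bytes;
}
```

Here `io_offset(devblk_to_off(b))` compiles, while `io_offset(b)` with `b` of type `devblk` does not. The cost is some verbosity at every conversion point.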

 > And you won't get rid of the physical blocks, at some point you
 > have to translate.

Only when interfacing, as previously noted. (And, as noted elsewhere,
the places where this is required also include on-disk formats.)
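
A minimal sketch of translating only at the driver boundary. It assumes the upper layers carry byte offsets and lengths, and that a hypothetical sector-addressed driver wants a starting sector and a sector count. `struct sector_req` and `bytes_to_sector_req()` are illustrative names, not an existing interface.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical request as a sector-addressed driver would see it. */
struct sector_req {
	uint64_t sector;	/* first physical sector */
	uint32_t count;		/* number of physical sectors */
};

/*
 * Translate a byte-addressed request into physical sectors.  This is
 * the only place the physical sector size is used; misaligned requests
 * are rejected rather than silently rounded.
 */
static int
bytes_to_sector_req(uint64_t off, uint64_t len, uint32_t secsize,
    struct sector_req *out)
{
	assert(secsize != 0 && (secsize & (secsize - 1)) == 0); /* power of 2 */
	if (off % secsize != 0 || len % secsize != 0)
		return EINVAL;
	if (len / secsize > UINT32_MAX)
		return EINVAL;
	out->sector = off / secsize;
	out->count = (uint32_t)(len / secsize);
	return 0;
}
```

The same byte request maps to sectors 2..5 on a 4096-byte-sector disk and would be rejected if it started at byte 512, so the sector size never leaks above this function.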

 > > Furthermore, Murphy's Law dictates that in any particular place the
 > > count you are given is frequently not in the units you need to give
 > > something else, and then you end up converting back and forth all over
 > > everywhere. This serves no purpose and tends to obfuscate the code
 > > base.
 > This is how it works now. We do translate blocks back and forth
 > all over the place, except that there are a lot of assumptions that
 > physical block size is the same as DEV_BSIZE.

Right. Wading through such logic is one of the things that convinced
me (a long time ago) that it shouldn't exist. Implementing such stuff
in research kernels was the other driving factor - it is too easy to
get wrong and you can't afford to spend time dealing with it.
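
To illustrate how the hidden assumption that physical block size equals DEV_BSIZE goes wrong, this hypothetical sketch contrasts a conversion that relies on it with one that goes through bytes. Both function names are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_DEV_BSIZE 512u	/* NetBSD's DEV_BSIZE */

/*
 * Buggy pattern: a DEV_BSIZE block number is handed straight to the
 * device as if it were a physical sector number.  Correct only when the
 * physical sector size equals DEV_BSIZE.
 */
static uint64_t
phys_sector_assuming_512(uint64_t devblk)
{
	return devblk;
}

/* Going through bytes makes the dependency on the sector size explicit. */
static uint64_t
phys_sector_via_bytes(uint64_t devblk, uint32_t secsize)
{
	uint64_t off = devblk * SKETCH_DEV_BSIZE;

	assert(off % secsize == 0);	/* request must start on a sector */
	return off / secsize;
}
```

On a 512-byte-sector disk both agree. On a 4096-byte-sector disk, DEV_BSIZE block 16 (byte 8192) is physical sector 2, but the first function addresses sector 16, which is byte 65536.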

 > Also, filesystems organize data in larger chunks. There is always
 > some translation going on between block or extent numbers and
 > DEV_BSIZE offsets now, or byte offsets in your ideal system.
 > On the filesystem side it won't get simpler.

It will, some. 

% grep fsbtodb sys/ufs/ffs/*.[ch] | wc -l

That's quite a few more than ought to be there, IMO.
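
For context, in FFS filesystem addresses are counted in fragments, and `fsbtodb()` turns one into a DEV_BSIZE block number with a shift kept in the superblock. The sketch below is modeled loosely on that macro and contrasts it with what the same call site might look like if lower layers took byte offsets. `struct sketch_fs` is a simplified stand-in, not the real `struct fs`, and the `sketch_*` names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_DEV_BSIZE 512u	/* NetBSD's DEV_BSIZE */

/* Simplified stand-in for the superblock fields involved (not struct fs). */
struct sketch_fs {
	uint32_t fsize;		/* fragment size in bytes, e.g. 2048 */
	int	 fsbtodb_shift;	/* log2(fsize / DEV_BSIZE), e.g. 2 */
};

/* Block-based style: filesystem address -> DEV_BSIZE block number. */
static uint64_t
sketch_fsbtodb(const struct sketch_fs *fs, uint64_t fsb)
{
	/* The shift is only valid if it matches the fragment size. */
	assert(((uint64_t)SKETCH_DEV_BSIZE << fs->fsbtodb_shift) == fs->fsize);
	return fsb << fs->fsbtodb_shift;
}

/* Byte-based style: filesystem address -> byte offset, no DEV_BSIZE step. */
static uint64_t
sketch_fsbtobytes(const struct sketch_fs *fs, uint64_t fsb)
{
	return fsb * fs->fsize;
}
```

Both reach the same place on disk (fragment 10 with 2048-byte fragments is DEV_BSIZE block 40, or byte 20480), but the byte-based form has no second unit to keep consistent.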

Meanwhile, other things will get quite a bit simpler.

 > > The physical sector size should be available to callers (just not part
 > > of the API/ABI) so this ought to be done regardless.
 > I haven't thought about compatibility issues yet, where is dkwedge_info
 > exposed to userland?

I dunno, I'm not all that up on wedges.

David A. Holland
