Subject: RFC (reassign)buf and carving up buffers (was Re: SCSI MMC device abstraction and UDF patch for review)
To: Bill Studenmund <wrstuden@netbsd.org>
From: Reinoud Zandijk <reinoud@netbsd.org>
List: tech-kern
Date: 12/29/2005 00:36:47

On Wed, Dec 28, 2005 at 01:33:34PM -0800, Bill Studenmund wrote:
> On Sat, Dec 24, 2005 at 02:41:54PM +0100, Reinoud Zandijk wrote:
> > Yes and no. For UDF it is better to use caching in udf since blocks can 
> > move around esp. on recordables; every disc write changes the location of 
> > the block so its easier to have it cached on the file's vnode on its 
> > logical block instead of a moving buffer on disc. Setting B_NOCACHE in the 
> > buffers passed to the block device is prolly advisable yes.
> 
> Why is it easier?
> 
> UDF is not the first file system in our kernel to face this issue. LFS has 
> been dealing with it for over a decade. I think it would be VERY advisable 
> for UDF to do the exact same thing for a number of reasons:

Scouting the LFS sources (tough going at times) I stumbled on the
undocumented function `reassignbuf' (no mention at all in /usr/share/man/)
that it uses from kern/vfs_subr.c. It basically transfers a buffer from one
vnode to another and updates any in-kernel lists it might be on. No wonder
I resorted to copying... It might be a good idea to test whether
reassigning can replace the copying. At first sight I think it would be a
clean thing to do too.

I think LFS uses the in-node caching too, and that's probably the way to
do it. Greg Oster might know more about this after his LFS hacking ;)

> 1) LFS has worked out a huge number of the issues, so UDF can gain from 
> that experience.
> 
> 2) It will be much easier for others to maintain the code if we only have 
> one way that we cope with block-shuffling file systems.
> 
> 3) If UDF shows us we really need to fix an issue, LFS may gain from the 
> same changes.

True. Part of my venturing out was to find new ways of solving the
problems without resorting to the `we normally do it this way' solutions;
solutions that LFS might gain from too, yes.

I'll try out the reassign, though I wonder what problems and complexity
might arise from mismatched vop_strategy() buffer sizes and disc logical
block sizes. vop_strategy() now requests buffers of up to, say, 64 kb
apiece while the logical block size might be 2 kb. Each 2 kb part can/could
be stored somewhere else on disc. Normally a VOP_BMAP could determine the
extent it could take in one go, but looking up such information can be
costly.
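
To put a number on the mismatch: a 64 kb request over 2 kb logical blocks
covers 32 blocks, and in the worst case each one sits elsewhere on disc. A
minimal userland sketch of counting the physically contiguous runs in such
a request, assuming the per-block disc addresses are already known (which
is exactly the costly lookup). extent_len() and count_extents() are
made-up names, not kernel functions:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: length, in logical blocks, of the physically
 * contiguous run that starts at phys[start].  phys[i] is the disc
 * address of logical block i of the request.
 */
size_t
extent_len(const uint32_t *phys, size_t nblocks, size_t start)
{
	size_t n = 1;

	assert(start < nblocks);
	while (start + n < nblocks &&
	    phys[start + n] == phys[start] + (uint32_t)n)
		n++;
	return n;
}

/* Hypothetical helper: number of separate transfers the request needs. */
size_t
count_extents(const uint32_t *phys, size_t nblocks)
{
	size_t i = 0, extents = 0;

	while (i < nblocks) {
		i += extent_len(phys, nblocks, i);
		extents++;
	}
	return extents;
}
```

With the blocks at disc addresses 100,101,102,500,501,40,41,42 this gives
three extents; a fully sequential 32-block request gives one.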

Proposal:
---------
What is missing IMHO is a generic way of sub-buffer handling, i.e. cutting
up a large buffer into smaller ones by requesting empty buffer headers that
are each assigned a piece of the larger space. genfs does this kind of
magic in close harmony with UVM by issuing VOP_STRATEGY() calls on each of
them, with UVM knowing they are just `pieces'. Normal claiming and
recycling isn't done.

A more generic way could be a function in vfs_subr.c or in vfs_bio.c to 
carve up a large buffer into two pieces (a `master'- and a sub-buffer) that 
the FS can then assign logical block numbers to. By repeating this for all 
sequential extents it can be carved up just right for optimum performance.
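
A rough userland sketch of what such a carve function could look like.
Everything here is an assumption for illustration: struct xbuf is a
stand-in for the handful of struct buf fields involved, and xbuf_carve()
is a made-up name, not an existing kernel interface:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the struct buf fields this sketch needs. */
struct xbuf {
	char        *data;	/* start of the memory this header covers */
	size_t       len;	/* number of bytes covered */
	int64_t      lblkno;	/* logical block number, set by the FS */
	struct xbuf *master;	/* NULL for a master, else the owning master */
};

/*
 * Carve the first len bytes off master into the empty header sub.
 * The master keeps whatever is left.  Returns 0 on success, -1 if len
 * is zero or larger than what the master still covers.
 */
int
xbuf_carve(struct xbuf *master, struct xbuf *sub, size_t len)
{
	assert(master != NULL && sub != NULL);
	if (len == 0 || len > master->len)
		return -1;

	sub->data = master->data;
	sub->len = len;
	sub->lblkno = -1;	/* not assigned yet; the FS fills this in */
	sub->master = master;

	master->data += len;
	master->len -= len;
	return 0;
}
```

The FS would call this once per sequential extent, assigning a logical
block number to each sub-buffer before handing it down.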

To facilitate callbacks and biodone() a mechanism still has to be figured
out. The original `master' buffer could hold a counter of the number of
sub-buffers pending. When this count reaches zero the biodone() and/or
callbacks can be issued. Sub-buffers could be identified by a flag
indicating it's a sub-buffer and by a pointer to the `master' buffer.
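
The counting idea could be sketched roughly as follows, again in plain
userland C. struct xbuf, the XB_* flags and the xbuf_*() functions are
all made up for this illustration; a real version would live on struct
buf, would need locking around the counter, and would have to make sure
the master cannot complete before all sub-buffers are attached:

```c
#include <assert.h>
#include <stddef.h>

#define XB_SUBBUF	0x01	/* hypothetical: header is a sub-buffer */
#define XB_DONE		0x02	/* hypothetical: master I/O has completed */

struct xbuf {
	int          flags;
	int          pending;	/* master only: sub-buffers in flight */
	int          error;	/* master only: first error reported */
	struct xbuf *master;	/* sub-buffer only: owning master */
	void       (*iodone)(struct xbuf *);	/* master callback or NULL */
};

/* Example callback standing in for biodone() on the master. */
void
xbuf_mark_done(struct xbuf *master)
{
	master->flags |= XB_DONE;
}

/* Tie an empty header to its master and count it as pending. */
void
xbuf_attach(struct xbuf *master, struct xbuf *sub)
{
	sub->flags = XB_SUBBUF;
	sub->pending = 0;
	sub->error = 0;
	sub->master = master;
	sub->iodone = NULL;
	master->pending++;
}

/* Called when one sub-buffer finishes; the last one completes the master. */
void
xbuf_subdone(struct xbuf *sub, int error)
{
	struct xbuf *master = sub->master;

	assert((sub->flags & XB_SUBBUF) != 0 && master != NULL);
	assert(master->pending > 0);
	if (error != 0 && master->error == 0)
		master->error = error;	/* remember only the first error */
	if (--master->pending == 0 && master->iodone != NULL)
		master->iodone(master);
}
```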

File systems neither wanting nor designed to use this system would require
no changes at all, and all other uses of struct buf wouldn't need changes
either.

Comments?

Reinoud
