Subject: Re: RFC: addition of B_MANAGED and B_NESTED to buf.h and vfs_bio.c
To: Bill Studenmund <wrstuden@netbsd.org>
From: Reinoud Zandijk <reinoud@netbsd.org>
List: tech-kern
Date: 01/03/2006 22:40:59
--Q68bSM7Ycu6FN28Q
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

On Tue, Jan 03, 2006 at 12:05:16PM -0800, Bill Studenmund wrote:
> >     B_MANAGED
> > marks that its buffer contents memory is managed by the caller and should 
> > not be put on a list for reuse or freed. Normally useable by say a 
> > filingsystem that has a private memory buffer for say a descriptor or disc 
> > structure thats not kept in a buffer and wants to write it out without the 
> > need to manipulate the VM or to copy stuff around.
> 
> If I understand what you're describing, B_EXTERN may be a better term.

B_EXTERN? why named extern? FreeBSD uses the term B_MANAGED too for this 
kind of buffers and since its a `managed' buffer by some other party and 
not a part of the normal queues. The name B_EXTERN doesn't ring a bell with 
me really.

> > Managed buffers are claimed and released directly from/to the bufpool to 
> > not destroy valueable data. Managed buffers can be brelse()'d.
> 
> How is this different from how LFS uses B_LOCKED to manage buffers itself?

Say a filingsystem has a malloc'd space in wich it keeps for example a FAT 
or some other filing system structure. The only thing to write it out in 
one go is then:

	buf = getmanagedbuf(xxx->yyyloc, structp, structlen);
	buf->b_flags |= B_WRITE;
	VOP_STRATEGY(vp, buf)
	brelse(buf);

Otherwise it would need either a custom made callback system with a custom 
created buffer and a custom cleanup afterwards in the callback or create a 
buffer and copy stuff in. Reduces complexity a lot for such situations.

LFS use of locked buffers is different since that only keeps buffers around 
and prevents reuse. Its also handy for the nested case since its also 
managed by definition.

> >     B_NESTED
> > marks the buffer as a nested buffer. Nested buffers are currently used in 
> > genfs and lfs (get/put pages) though implemented as two special case 
> > generated buffers with call-backs.
> 
> Why do we need a flag? Why not just continue using the callback method? We 
> already have B_CALL, which lets us do anything. Why do we need a special 
> flag?

the call back method sure is a very generic way and very handy indeed. To 
overload/abuse this to provide other functionality like for the managed and 
for the nested buffers is IMHO clumsy. The call back method is also 
hardcoded to be for `managed' buffers only since the call back handler has 
to free the buffers themselves.

vfs_bio.c states: (biodone)
/*
 * Mark I/O complete on a buffer.
 *
 * If a callback has been requested, e.g. the pageout
 * daemon, do so. Otherwise, awaken waiting processes.
 *
 * [ Leffler, et al., says on p.247:
 *      "This routine wakes up the blocked process, frees the buffer
 *      for an asynchronous write, or, for a request by the pagedaemon
 *      process, invokes a procedure specified in the buffer structure" ]
 *
 * In real life, the pagedaemon (or other system processes) wants
 * to do async stuff to, and doesn't want the buffer brelse()'d.
 * (for swap pager, that puts swap buffers on the free lists (!!!),
 * for the vn device, that puts malloc'd buffers on the free lists!)
 */
--------
        /*
         * If necessary, call out.  Unlock the buffer before calling
         * iodone() as the buffer isn't valid any more when it return.
         */
        if (ISSET(bp->b_flags, B_CALL)) {
                CLR(bp->b_flags, B_CALL);      /* but note callout done */
                simple_unlock(&bp->b_interlock);
                (*bp->b_iodone)(bp);
        } else {
                if (ISSET(bp->b_flags, B_ASYNC)) { /* if async, release */
                        simple_unlock(&bp->b_interlock);
                        brelse(bp);
                } else {                   /* or just wakeup the buffer */
                        CLR(bp->b_flags, B_WANTED);
                        wakeup(bp);
                        simple_unlock(&bp->b_interlock);
                }
        }
--------------

The callback facility ought to be available to all kinds of buffer actions 
without them also needing to clean up afterwards but thats another patch 
and would change current behaviour. That'll be a lot larger a patch :)

The pagedeamon could use B_MANAGED buffers :)

> If multiple call paths are doing the same buffer batching operations, then 
> I can certainly see adding a buffer cache callback that all these uses can 
> share.

A buffer could be split up for various reasons and all sheduled 
independently. The call-back could then optionally be used for whatever 
reason the function had to split up the buffer.

> While it means more work, I think you should change both of them over to 
> the new scheme. As it is, you're adding code to handle a special case, 
> UDF. If, however, your patch shows us adding common code and then removing 
> duplicates in both genfs and lfs, then it is VERY CLEAR what the 
> advantages of the change are.

Its not for UDF only really... its just that i saw it come by in genfs and 
lfs and figured out a more generic way would be preferable too. UDF sure 
could benefit a lot from it too i'll have to admit.

> Yeah, something like B_EXTERN sounds right. Though there should be a 
> release callback. Just because it's not buffer cache data doesn't mean 
> there shouldn't be any cleanup. :-)

If one needs cleanup, biodone() has allready called the callback function 
for the part. The brelse is only for the struct buf itself, not the data in 
it. This brelse() will normally be called within this callback.

Hope this answers some of your questions,

With regards,
Reinoud


--Q68bSM7Ycu6FN28Q
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.6 (NetBSD)

iQEVAwUBQ7rvXIKcNwBDyKpoAQJXYAf+M6LxG0tfZhQk8pNAKZBCI2kOxIK5Oj/X
Y+cZwRXuADfctUYHr6ejv0MsKznu3XiUOO7JxZHdNO8Mvggf5BzpTpk4+7KMoBVU
Uq53FzE4awGwmSrCybpYb6In4BAqNQDrnSt6v4znBYMfFZ5sn83wjNDVzFTM7a7j
PjwLGS4M5CMLU3NXc7mG2XF0OcM5TUZTNaMaiWLLSQO8aZkJGoH/4a/hw0V/Ayo3
Jl68MCQnJWxiSux95ZNCSyb4qD3g8ILbfCds9ZBKQXQF73l4avpBDsMXSnx3wie0
mKmY8Bl5Y/Zv/Hiow6trNA2ftNWJ53/S9ttm/+sbdAQJ6Gx0DJcQxw==
=8pfZ
-----END PGP SIGNATURE-----

--Q68bSM7Ycu6FN28Q--