Subject: Re: RFC: addition of B_MANAGED and B_NESTED to buf.h and vfs_bio.c
To: None <tech-kern@NetBSD.org>
From: Bill Studenmund <wrstuden@netbsd.org>
List: tech-kern
Date: 01/03/2006 12:05:16
--6WlEvdN9Dv0WHSBl
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable
On Tue, Jan 03, 2006 at 05:34:27PM +0100, Reinoud Zandijk wrote:
> Dear folks,
>=20
> i'd like to add two flags to buf.h. They add new functionality to the=20
> buffer structure without changing current behaviour.
>=20
> --------------------
> B_MANAGED
> marks that its buffer contents memory is managed by the caller and should=
=20
> not be put on a list for reuse or freed. Normally useable by say a=20
> filingsystem that has a private memory buffer for say a descriptor or dis=
c=20
> structure thats not kept in a buffer and wants to write it out without th=
e=20
> need to manipulate the VM or to copy stuff around.
If I understand what you're describing, B_EXTERN may be a better term.
> Managed buffers are claimed and released directly from/to the bufpool to=
=20
> not destroy valueable data. Managed buffers can be brelse()'d.
How is this different from how LFS uses B_LOCKED to manage buffers itself?
> B_NESTED
> marks the buffer as a nested buffer. Nested buffers are currently used in=
=20
> genfs and lfs (get/put pages) though implemented as two special case=20
> generated buffers with call-backs.
>=20
> A nested buffer is basicly describing the action of the origional buffer=
=20
> over a piece of its buffer space. A series of nested buffers created from=
=20
> the master buffer describe together the action. Nested buffers can have=
=20
> their own call-back and variables too extending the idea used in genfs an=
d=20
> lfs.
>=20
> Buffer nesting is transparant. A callee doesn't need to know anything of=
=20
> the mechanism and can easily nest it again if it wants too.=20
> biowait/biodone/brelse work as expected.
>=20
> When a nested buffer is biodone()'d its master buffer will be updated. Wh=
en=20
> the b_resid value of the master buffer then reaches zero, biodone() will =
be=20
> called on the master buffer.
>=20
> Nested buffers are claimed and released directly from/to the bufpool to n=
ot=20
> destroy valueable data. Nested buffers can be brelse()'d.
Why do we need a flag? Why not just continue using the callback method? We=
=20
already have B_CALL, which lets us do anything. Why do we need a special=20
flag?
If multiple call paths are doing the same buffer batching operations, then=
=20
I can certainly see adding a buffer cache callback that all these uses can=
=20
share.
> ----------------
>=20
> I've appended the patch. Note that this code has been tested for normal=
=20
> operations but that the managed and nested buffers haven't actually been=
=20
> used yet. I'd rather have feedback first before i change UDF to use the=
=20
> buffer types. Genfs and lfs could prolly best be done by their authors.
While it means more work, I think you should change both of them over to=20
the new scheme. As it is, you're adding code to handle a special case,=20
UDF. If, however, your patch shows us adding common code and then removing=
=20
duplicates in both genfs and lfs, then it is VERY CLEAR what the=20
advantages of the change are.
> With regards,
> Reinoud
>=20
> Index: kern/vfs_bio.c
> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
> RCS file: /cvsroot/src/sys/kern/vfs_bio.c,v
> retrieving revision 1.148
> diff -u -r1.148 vfs_bio.c
> --- kern/vfs_bio.c 24 Dec 2005 19:12:23 -0000 1.148
> +++ kern/vfs_bio.c 3 Jan 2006 16:05:47 -0000
> @@ -904,6 +904,16 @@
> wakeup(bp);
> }
> =20
> + /*=20
> + * If it's a managed buffer, only recycle struct buf since it doesn't
> + * own its data block and should not be cached or reused in the normal
> + * way.
> + */
> + if (ISSET(bp->b_flags, B_MANAGED)) {
> + bp->b_bufsize =3D 0;
> + goto clearup;
> + };
> +
Yeah, something like B_EXTERN sounds right. Though there should be a=20
release callback. Just because it's not buffer cache data doesn't mean=20
there shouldn't be any cleanup. :-)
> /*
> * Determine which queue the buffer should be on, then put it there.
> */
> @@ -990,6 +1000,7 @@
> CLR(bp->b_flags, B_AGE|B_ASYNC|B_BUSY|B_NOCACHE);
> SET(bp->b_flags, B_CACHE);
> =20
> +clearup:
> /* Allow disk interrupts. */
> simple_unlock(&bp->b_interlock);
> simple_unlock(&bqueue_slock);
> @@ -1003,6 +1014,85 @@
> }
> =20
> /*
> + * Get a new managed buffer. Memory assigned to it is NOT owned by the b=
uffer
> + * cache but by the caller, typically a filingsystem.
file system.
> + */
> +struct buf *
> +getmanagedbuf(int blkno, caddr_t base, int size)
> +{
> + int s;
> + struct buf *bp;
> +
> + KASSERT(base !=3D NULL);
> + KASSERT(size > 0);
> +
> + s =3D splbio();
> + simple_lock(&bqueue_slock);
> +
> + /* please don't destroy valueable data */
> + bp =3D pool_get(&bufpool, PR_WAITOK);
> +
> + /* Initialise buffer */
> + memset(bp, 0, sizeof(struct buf));
> + BUF_INIT(bp);
> + bp->b_flags |=3D B_MANAGED;
> +
> + bp->b_blkno =3D blkno;
> + bp->b_data =3D base;
> + bp->b_bufsize =3D bp->b_bcount =3D bp->b_resid =3D size;
> +
> + simple_unlock(&bqueue_slock);
> + splx(s);
> +
> + return bp;
> +}
> +
> +/*
> + * Get a new buffer nested into the specified buffer at the given offset=
and
> + * length. NO read/write actions ought to be caried out on the master bu=
ffer
> + * anymore only on the nested buffers as they effectively split up the m=
aster
> + * buffer's action.
> + *
> + * Bug alert: make sure all nested buffers cover the complete mbp->resid
> + * space. If space is to be skipped, mbp->resid needs to be accounted f=
or or
> + * the biodone on mbp won't be called!
> + */
> +struct buf *
> +nestbuf(struct buf *mbp, int blkno, caddr_t base, int size)
> +{
> + int s;
> + struct buf *bp;
> +
> + KASSERT(base !=3D NULL);
> + KASSERT(size > 0);
> + KASSERT((mpb=3D=3DNULL) || (mpb && mbp->b_count < offset+size));
> +
> + s =3D splbio();
> + simple_lock(&bqueue_slock);
> +
> + /* please don't destroy valueable data */
> + bp =3D pool_get(&bufpool, PR_WAITOK);
> +
> + /*
> + * Adjust relevant information from the master buffer. set nested info
> + * and clear callback info and softdep info.
> + */
> + memcpy(bp, mbp, sizeof(struct buf));
> + bp->b_flags &=3D ~B_CALL;
> + bp->b_flags |=3D B_MANAGED | B_NESTED;
> + bp->b_masterbuf =3D mbp;
> +
> + bp->b_blkno =3D blkno;
> + bp->b_data =3D base;
> + bp->b_bufsize =3D bp->b_bcount =3D bp->b_resid =3D size;
> +
> + /* avoid confusion for softdep? */
> + LIST_INIT(&bp->b_dep);
> +
> + return bp;
> +}
> +
> +/*
> * Determine if a block is in the cache.
> * Just look on what would be its hash chain. If it's there, return
> * a pointer to it, unless it's marked invalid. If it's marked invalid,
> @@ -1384,6 +1474,8 @@
> biodone(struct buf *bp)
> {
> int s =3D splbio();
> + int isnested;
> + struct buf *mbp;
> =20
> simple_lock(&bp->b_interlock);
> if (ISSET(bp->b_flags, B_DONE))
> @@ -1397,6 +1489,18 @@
> if (!ISSET(bp->b_flags, B_READ)) /* wake up reader */
> vwakeup(bp);
> =20
> + /* Propagate status to master buffer if its a nested buffer */
> + isnested =3D ISSET(bp->b_flags, B_NESTED);
> + mbp =3D bp->b_masterbuf;
> + if (isnested) {
> + KASSERT(mbp);
> + if (bp->b_flags & B_ERROR) {
> + mbp->b_flags |=3D B_ERROR;
> + mbp->b_error =3D bp->b_error;
> + };
> + mbp->b_resid -=3D bp->b_bcount;
> + };
> +
> /*
> * If necessary, call out. Unlock the buffer before calling
> * iodone() as the buffer isn't valid any more when it return.
> @@ -1417,6 +1521,12 @@
> }
> =20
> splx(s);
> +
> + /* call biodone on master buffer if its completed by this op */
> + if (isnested) {
> + if (mbp->b_resid =3D=3D 0)
> + biodone(mbp);
> + };
> }
> =20
> /*
> Index: sys/buf.h
> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
> RCS file: /cvsroot/src/sys/sys/buf.h,v
> retrieving revision 1.84
> diff -u -r1.84 buf.h
> --- sys/buf.h 11 Dec 2005 12:25:20 -0000 1.84
> +++ sys/buf.h 3 Jan 2006 16:05:47 -0000
> @@ -145,6 +145,7 @@
> struct vnode *b_vp; /* File vnode. */
> struct workhead b_dep; /* List of filesystem dependencies. */
> void *b_saveaddr; /* Original b_addr for physio. */
> + struct buf *b_masterbuf; /* If nested, points to masterbuf */
> =20
> /*
> * private data for owner.
> @@ -213,11 +214,14 @@
> #define B_WRITE 0x00000000 /* Write buffer (pseudo flag). */
> #define B_XXX 0x02000000 /* Debugging flag. */
> #define B_VFLUSH 0x04000000 /* Buffer is being synced. */
> +#define B_MANAGED 0x08000000 /* Buffer's memory is not its own */
> +#define B_NESTED 0x10000000 /* Buffer is not standalone */
> =20
> #define BUF_FLAGBITS \
> "\20\1AGE\3ASYNC\4BAD\5BUSY\6SCANNED\7CALL\10DELWRI" \
> "\11DIRTY\12DONE\13EINTR\14ERROR\15GATHERED\16INVAL\17LOCKED\20NOCAC=
HE" \
> - "\22CACHE\23PHYS\24RAW\25READ\26TAPE\30WANTED\32XXX\33VFLUSH"
> + "\22CACHE\23PHYS\24RAW\25READ\26TAPE\30WANTED\32XXX\33VFLUSH\34MANAG=
ED" \
> + "\35NESTED"
> =20
> =20
> /*
> @@ -289,6 +293,8 @@
> struct buf *geteblk(int);
> struct buf *getnewbuf(int, int, int);
> struct buf *incore(struct vnode *, daddr_t);
> +struct buf *getmanagedbuf(int, caddr_t, int);
> +struct buf *nestbuf(struct buf *, int, caddr_t, int);
> =20
> void minphys(struct buf *);
> int physio(void (*)(struct buf *), struct buf *, dev_t, int,
Take care,
Bill
--6WlEvdN9Dv0WHSBl
Content-Type: application/pgp-signature
Content-Disposition: inline
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.3 (NetBSD)
iD8DBQFDutj8Wz+3JHUci9cRAqd5AKCQ3t4FGoDOP2hus/ABDJmdL/Xp3QCcDMNP
Z0t9koOM1SMrxxfyiunr3Hk=
=8YfN
-----END PGP SIGNATURE-----
--6WlEvdN9Dv0WHSBl--