Subject: Re: RFC: addition of B_MANAGED and B_NESTED to buf.h and vfs_bio.c
To: None <tech-kern@NetBSD.org>
From: Bill Studenmund <wrstuden@netbsd.org>
List: tech-kern
Date: 01/03/2006 12:05:16
--6WlEvdN9Dv0WHSBl
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Tue, Jan 03, 2006 at 05:34:27PM +0100, Reinoud Zandijk wrote:
> Dear folks,
>=20
> i'd like to add two flags to buf.h. They add new functionality to the=20
> buffer structure without changing current behaviour.
>=20
> --------------------
>     B_MANAGED
> marks that its buffer contents memory is managed by the caller and should=
=20
> not be put on a list for reuse or freed. Normally useable by say a=20
> filingsystem that has a private memory buffer for say a descriptor or dis=
c=20
> structure thats not kept in a buffer and wants to write it out without th=
e=20
> need to manipulate the VM or to copy stuff around.

If I understand what you're describing, B_EXTERN may be a better term.

> Managed buffers are claimed and released directly from/to the bufpool to=
=20
> not destroy valueable data. Managed buffers can be brelse()'d.

How is this different from how LFS uses B_LOCKED to manage buffers itself?

>     B_NESTED
> marks the buffer as a nested buffer. Nested buffers are currently used in=
=20
> genfs and lfs (get/put pages) though implemented as two special case=20
> generated buffers with call-backs.
>=20
> A nested buffer is basicly describing the action of the origional buffer=
=20
> over a piece of its buffer space. A series of nested buffers created from=
=20
> the master buffer describe together the action. Nested buffers can have=
=20
> their own call-back and variables too extending the idea used in genfs an=
d=20
> lfs.
>=20
> Buffer nesting is transparant. A callee doesn't need to know anything of=
=20
> the mechanism and can easily nest it again if it wants too.=20
> biowait/biodone/brelse work as expected.
>=20
> When a nested buffer is biodone()'d its master buffer will be updated. Wh=
en=20
> the b_resid value of the master buffer then reaches zero, biodone() will =
be=20
> called on the master buffer.
>=20
> Nested buffers are claimed and released directly from/to the bufpool to n=
ot=20
> destroy valueable data. Nested buffers can be brelse()'d.

Why do we need a flag? Why not just continue using the callback method? We=
=20
already have B_CALL, which lets us do anything. Why do we need a special=20
flag?

If multiple call paths are doing the same buffer batching operations, then=
=20
I can certainly see adding a buffer cache callback that all these uses can=
=20
share.

> ----------------
>=20
> I've appended the patch. Note that this code has been tested for normal=
=20
> operations but that the managed and nested buffers haven't actually been=
=20
> used yet. I'd rather have feedback first before i change UDF to use the=
=20
> buffer types. Genfs and lfs could prolly best be done by their authors.

While it means more work, I think you should change both of them over to=20
the new scheme. As it is, you're adding code to handle a special case,=20
UDF. If, however, your patch shows us adding common code and then removing=
=20
duplicates in both genfs and lfs, then it is VERY CLEAR what the=20
advantages of the change are.

> With regards,
> Reinoud
>=20

> Index: kern/vfs_bio.c
> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
> RCS file: /cvsroot/src/sys/kern/vfs_bio.c,v
> retrieving revision 1.148
> diff -u -r1.148 vfs_bio.c
> --- kern/vfs_bio.c	24 Dec 2005 19:12:23 -0000	1.148
> +++ kern/vfs_bio.c	3 Jan 2006 16:05:47 -0000
> @@ -904,6 +904,16 @@
>  		wakeup(bp);
>  	}
> =20
> +	/*=20
> +	 * If it's a managed buffer, only recycle struct buf since it doesn't
> +	 * own its data block and should not be cached or reused in the normal
> +	 * way.
> +	 */
> +	if (ISSET(bp->b_flags, B_MANAGED)) {
> +		bp->b_bufsize =3D 0;
> +		goto clearup;
> +	};
> +

Yeah, something like B_EXTERN sounds right. Though there should be a=20
release callback. Just because it's not buffer cache data doesn't mean=20
there shouldn't be any cleanup. :-)

>  	/*
>  	 * Determine which queue the buffer should be on, then put it there.
>  	 */
> @@ -990,6 +1000,7 @@
>  	CLR(bp->b_flags, B_AGE|B_ASYNC|B_BUSY|B_NOCACHE);
>  	SET(bp->b_flags, B_CACHE);
> =20
> +clearup:
>  	/* Allow disk interrupts. */
>  	simple_unlock(&bp->b_interlock);
>  	simple_unlock(&bqueue_slock);
> @@ -1003,6 +1014,85 @@
>  }
> =20
>  /*
> + * Get a new managed buffer. Memory assigned to it is NOT owned by the b=
uffer
> + * cache but by the caller, typically a filingsystem.

file system.

> + */
> +struct buf *
> +getmanagedbuf(int blkno, caddr_t base, int size)
> +{
> +	int s;
> +	struct buf *bp;
> +
> +	KASSERT(base !=3D NULL);
> +	KASSERT(size > 0);
> +
> +	s =3D splbio();
> +	simple_lock(&bqueue_slock);
> +
> +	/* please don't destroy valueable data */
> +	bp =3D pool_get(&bufpool, PR_WAITOK);
> +
> +	/* Initialise buffer */
> +	memset(bp, 0, sizeof(struct buf));
> +	BUF_INIT(bp);
> +	bp->b_flags |=3D B_MANAGED;
> +
> +	bp->b_blkno =3D blkno;
> +	bp->b_data =3D base;
> +	bp->b_bufsize =3D bp->b_bcount =3D bp->b_resid =3D size;
> +
> +	simple_unlock(&bqueue_slock);
> +	splx(s);
> +
> +	return bp;
> +}
> +
> +/*
> + * Get a new buffer nested into the specified buffer at the given offset=
 and
> + * length. NO read/write actions ought to be caried out on the master bu=
ffer
> + * anymore only on the nested buffers as they effectively split up the m=
aster
> + * buffer's action.
> + *
> + * Bug alert: make sure all nested buffers cover the complete mbp->resid
> + * space.  If space is to be skipped, mbp->resid needs to be accounted f=
or or
> + * the biodone on mbp won't be called!
> + */
> +struct buf *
> +nestbuf(struct buf *mbp, int blkno, caddr_t base, int size)
> +{
> +	int s;
> +	struct buf *bp;
> +
> +	KASSERT(base !=3D NULL);
> +	KASSERT(size > 0);
> +	KASSERT((mpb=3D=3DNULL) || (mpb && mbp->b_count < offset+size));
> +
> +	s =3D splbio();
> +	simple_lock(&bqueue_slock);
> +
> +	/* please don't destroy valueable data */
> +	bp =3D pool_get(&bufpool, PR_WAITOK);
> +
> +	/*
> +	 * Adjust relevant information from the master buffer. set nested info
> +	 * and clear callback info and softdep info.
> +	 */
> +	memcpy(bp, mbp, sizeof(struct buf));
> +	bp->b_flags &=3D ~B_CALL;
> +	bp->b_flags |=3D B_MANAGED | B_NESTED;
> +	bp->b_masterbuf =3D mbp;
> +
> +	bp->b_blkno =3D blkno;
> +	bp->b_data =3D base;
> +	bp->b_bufsize =3D bp->b_bcount =3D bp->b_resid =3D size;
> +
> +	/* avoid confusion for softdep? */
> +	LIST_INIT(&bp->b_dep);
> +
> +	return bp;
> +}
> +
> +/*
>   * Determine if a block is in the cache.
>   * Just look on what would be its hash chain.  If it's there, return
>   * a pointer to it, unless it's marked invalid.  If it's marked invalid,
> @@ -1384,6 +1474,8 @@
>  biodone(struct buf *bp)
>  {
>  	int s =3D splbio();
> +	int  isnested;
> +	struct buf *mbp;
> =20
>  	simple_lock(&bp->b_interlock);
>  	if (ISSET(bp->b_flags, B_DONE))
> @@ -1397,6 +1489,18 @@
>  	if (!ISSET(bp->b_flags, B_READ))	/* wake up reader */
>  		vwakeup(bp);
> =20
> +	/* Propagate status to master buffer if its a nested buffer */
> +	isnested =3D ISSET(bp->b_flags, B_NESTED);
> +	mbp =3D bp->b_masterbuf;
> +	if (isnested) {
> +		KASSERT(mbp);
> +		if (bp->b_flags & B_ERROR) {
> +			mbp->b_flags |=3D B_ERROR;
> +			mbp->b_error =3D bp->b_error;
> +		};
> +		mbp->b_resid -=3D bp->b_bcount;
> +	};
> +
>  	/*
>  	 * If necessary, call out.  Unlock the buffer before calling
>  	 * iodone() as the buffer isn't valid any more when it return.
> @@ -1417,6 +1521,12 @@
>  	}
> =20
>  	splx(s);
> +
> +	/* call biodone on master buffer if its completed by this op */
> +	if (isnested) {
> +		if (mbp->b_resid =3D=3D 0)
> +			biodone(mbp);
> +	};
>  }
> =20
>  /*
> Index: sys/buf.h
> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
> RCS file: /cvsroot/src/sys/sys/buf.h,v
> retrieving revision 1.84
> diff -u -r1.84 buf.h
> --- sys/buf.h	11 Dec 2005 12:25:20 -0000	1.84
> +++ sys/buf.h	3 Jan 2006 16:05:47 -0000
> @@ -145,6 +145,7 @@
>  	struct	vnode *b_vp;		/* File vnode. */
>  	struct  workhead b_dep;		/* List of filesystem dependencies. */
>  	void	*b_saveaddr;		/* Original b_addr for physio. */
> +	struct buf *b_masterbuf;	/* If nested, points to masterbuf */
> =20
>  	/*
>  	 * private data for owner.
> @@ -213,11 +214,14 @@
>  #define	B_WRITE		0x00000000	/* Write buffer (pseudo flag). */
>  #define	B_XXX		0x02000000	/* Debugging flag. */
>  #define	B_VFLUSH	0x04000000	/* Buffer is being synced. */
> +#define B_MANAGED	0x08000000	/* Buffer's memory is not its own */
> +#define B_NESTED	0x10000000	/* Buffer is not standalone */
> =20
>  #define BUF_FLAGBITS \
>      "\20\1AGE\3ASYNC\4BAD\5BUSY\6SCANNED\7CALL\10DELWRI" \
>      "\11DIRTY\12DONE\13EINTR\14ERROR\15GATHERED\16INVAL\17LOCKED\20NOCAC=
HE" \
> -    "\22CACHE\23PHYS\24RAW\25READ\26TAPE\30WANTED\32XXX\33VFLUSH"
> +    "\22CACHE\23PHYS\24RAW\25READ\26TAPE\30WANTED\32XXX\33VFLUSH\34MANAG=
ED" \
> +    "\35NESTED"
> =20
> =20
>  /*
> @@ -289,6 +293,8 @@
>  struct buf *geteblk(int);
>  struct buf *getnewbuf(int, int, int);
>  struct buf *incore(struct vnode *, daddr_t);
> +struct buf *getmanagedbuf(int, caddr_t, int);
> +struct buf *nestbuf(struct buf *, int, caddr_t, int);
> =20
>  void	minphys(struct buf *);
>  int	physio(void (*)(struct buf *), struct buf *, dev_t, int,

Take care,

Bill

--6WlEvdN9Dv0WHSBl
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.3 (NetBSD)

iD8DBQFDutj8Wz+3JHUci9cRAqd5AKCQ3t4FGoDOP2hus/ABDJmdL/Xp3QCcDMNP
Z0t9koOM1SMrxxfyiunr3Hk=
=8YfN
-----END PGP SIGNATURE-----

--6WlEvdN9Dv0WHSBl--