Subject: suggested small bufcache and vfs tuning changes
To: None <tech-kern@netbsd.org>
From: Daniel Carosone <dan@geek.com.au>
List: tech-kern
Date: 03/19/2004 11:37:19

Attached are a couple of tweaks for wider consideration:

 - The first moves buf_drain() back to the end of the pagedaemon loop,
   after the page scan and pool_drain(), with matching changes to the
   sizing calculations in the bufcache code.  This gives better
   balance, and helps stop the metadata cache from being shrunk too
   fast.

 - The second changes the vfs code to make dirdelay a small rotor.
   This seems to greatly alleviate the problem of softdep creating a
   huge storm of directory updates all at once, with the bad
   downstream effects that follow.

They come from discussions and testing with tls@ around the recent
bufcache changes, and from tuning for the fallout.  I have been running
with (minor variants of) these changes for quite some time.

Between them, they have given me a generally better feel of
responsiveness under heavy disk load, and they have successfully
hidden known allocate-to-free problems in softdep and cgd (which are
about to be fixed).  Even without those problems, smoother I/O seems
a desirable result in its own right.

NB, the dirdelay patch is an experiment to prove a point, rather than
a finished solution.  In particular, the magic numbers are simply what
seems to work for my hardware and setup on cgd, and may be thoroughly
inappropriate elsewhere.  I'd be interested in other people's results.

I'd suggest the bufcache/pagedaemon changes should go in.  They might
go in now for the 2.0 branch, or (since they're so simple) they might
go in after the branch to allow wider testing, as they can be easily
pulled up.

Any thoughts, other than me being lazy about whitespace :)?

--
Dan.

Index: kern/vfs_bio.c
===================================================================
RCS file: /cvsroot/src/sys/kern/vfs_bio.c,v
retrieving revision 1.118
diff -u -r1.118 vfs_bio.c
--- kern/vfs_bio.c	22 Feb 2004 01:00:41 -0000	1.118
+++ kern/vfs_bio.c	18 Mar 2004 22:52:18 -0000
@@ -425,22 +425,30 @@
 static int
 buf_canrelease(void)
 {
-	int pagedemand, ninvalid = 0;
+	int pagedemand, ninvalid = 0, gotnow, giveback;
 	struct buf *bp;
 
 	LOCK_ASSERT(simple_lock_held(&bqueue_slock));
 
-	if (bufmem < bufmem_lowater)
+	gotnow = bufmem - bufmem_lowater;
+
+	if (gotnow < 0)
 		return 0;
 
 	TAILQ_FOREACH(bp, &bufqueues[BQ_AGE], b_freelist)
 		ninvalid += bp->b_bufsize;
 
 	pagedemand = uvmexp.freetarg - uvmexp.free;
-	if (pagedemand < 0)
-		return ninvalid;
-	return MAX(ninvalid, MIN(2 * MAXBSIZE,
-	    MIN((bufmem - bufmem_lowater) / 16, pagedemand * PAGE_SIZE)));
+
+	giveback = MAX(ninvalid,
+		       MAX(2 * MAXBSIZE,
+			   MAX(gotnow / 16, pagedemand * PAGE_SIZE)));
+
+	if (giveback > gotnow) {
+		giveback = gotnow;
+	}
+
+	return giveback;
 }
 
 /*
Index: kern/vfs_subr.c
===================================================================
RCS file: /cvsroot/src/sys/kern/vfs_subr.c,v
retrieving revision 1.216
diff -u -r1.216 vfs_subr.c
--- kern/vfs_subr.c	14 Feb 2004 00:00:56 -0000	1.216
+++ kern/vfs_subr.c	18 Mar 2004 22:52:19 -0000
@@ -1006,6 +1006,8 @@
 	struct buflists *listheadp;
 	int delay;
 
+	static unsigned int dlinc = 0;
+
 	/*
 	 * Delete from old vnode list, if on one.
 	 */
@@ -1028,7 +1030,8 @@
 		if ((newvp->v_flag & VONWORKLST) == 0) {
 			switch (newvp->v_type) {
 			case VDIR:
-				delay = dirdelay;
+			  /*				delay = dirdelay; */
+			  delay = 10 + ((dlinc++/8192)%16);
 				break;
 			case VBLK:
 				if (newvp->v_specmountpoint !=3D NULL) {
Index: uvm/uvm_pdaemon.c
===================================================================
RCS file: /cvsroot/src/sys/uvm/uvm_pdaemon.c,v
retrieving revision 1.58
diff -u -r1.58 uvm_pdaemon.c
--- uvm/uvm_pdaemon.c	30 Jan 2004 11:32:16 -0000	1.58
+++ uvm/uvm_pdaemon.c	18 Mar 2004 22:52:21 -0000
@@ -226,13 +226,6 @@
 		UVMHIST_LOG(pdhist,"  <<WOKE UP>>",0,0,0,0);
 
 		/*
-		 * The metadata cache drainer knows about uvmexp.free
-		 * and uvmexp.freetarg.  We call it _before_ scanning
-		 * so that it sees the amount we really want.
-		 */
-		buf_drain(0);
-
-		/*
 		 * now lock page queues and recompute inactive count
 		 */
 
@@ -283,6 +276,13 @@
 		pool_drain(0);
 
 		/*
+		 * The metadata cache drainer knows about uvmexp.free
+		 * and uvmexp.freetarg. It should give us back at least as much
+		 * as we need, if it can.
+		 */
+		buf_drain(0);
+
+		/*
 		 * free any cached u-areas we don't need
 		 */
 		uvm_uarea_drain(TRUE);
