Subject: Re: vm.bufmem_hiwater not honored (found trigger)
To: Arto Selonen <arto+dated+1100946799.b55473dbc43f5917@selonen.org>
From: Thor Lancelot Simon <tls@rek.tjls.com>
List: tech-kern
Date: 11/21/2004 17:01:19
On Sat, Nov 20, 2004 at 12:32:13PM +0200, Arto Selonen wrote:
> Hi!
> 
> On Mon, 15 Nov 2004, Thor Lancelot Simon wrote:
> 
> > size, I don't understand how your system got into the situation it is
> > in, at all.
> >
> > And I would very much like to.
> 
> The vm.bufmem growth is triggered by /etc/daily find_core routine.
> I have three directory structures (as separate file systems) that
> hold some amount of files/data:
> 
> 	/squid		# disk cache of squid
> 	/cvs		# anoncvs sources (src,xsrc,pkgsrc)
> 	/obj		# build destination

Okay, I think I understand what's going on.  Either one of your
filesystems has a larger blocksize than the others, or the average
directory in one of them occupies one or more full blocks while in
another it fits in a single frag.
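
For example (illustrative numbers, not necessarily your actual fs
parameters): on a filesystem built with 16K blocks and 2K frags, a
small directory's data occupies a 2K frag, while on one built with
64K blocks and 8K frags the same directory costs 8K of buffer memory.
Walk tens of thousands of directories, the way the find_core run
does, and that difference alone adds up to many megabytes of metadata
in the buffer cache.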

Either way, what happens is this: the first filesystem's worth of
metadata takes bufmem up to the high-water mark.  Since you're there
already, vfs_bio then resizes existing buffers (always growing them)
instead of allocating new ones -- so every buffer it touches gets
bigger.  And once you're above the high-water mark, the canrelease
call in allocbuf does the wrong thing, so bufmem never shrinks again.
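
To see why, here is roughly the shape of the canrelease logic as it
stands.  This is condensed from memory, not the literal source, so
treat anything not visible in the diff below as an approximation:

static int
buf_canrelease(void)
{
	int ninvalid = 0;
	struct buf *bp;

	/* Below the low-water mark, release nothing. */
	if (bufmem < bufmem_lowater)
		return 0;

	/* Otherwise, only invalid buffers on the AGE queue count... */
	TAILQ_FOREACH(bp, &bufqueues[BQ_AGE], b_freelist)
		ninvalid += bp->b_bufsize;

	/*
	 * ...possibly bumped by an estimate of pagedaemon demand
	 * (elided here).  bufmem_hiwater appears nowhere: with no
	 * invalid buffers and no page shortage, this returns (near)
	 * zero no matter how far past the high-water mark bufmem is.
	 */
	return ninvalid;
}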

Try this:

Index: vfs_bio.c
===================================================================
RCS file: /cvsroot/src/sys/kern/vfs_bio.c,v
retrieving revision 1.122.2.4
diff -c -r1.122.2.4 vfs_bio.c
*** vfs_bio.c	8 Oct 2004 03:25:15 -0000	1.122.2.4
--- vfs_bio.c	21 Nov 2004 21:56:37 -0000
***************
*** 462,467 ****
--- 462,470 ----
  	if (bufmem < bufmem_lowater)
  		return 0;
  
+ 	if (bufmem > bufmem_hiwater)
+ 		return bufmem - bufmem_hiwater;
+ 
  	TAILQ_FOREACH(bp, &bufqueues[BQ_AGE], b_freelist)
  		ninvalid += bp->b_bufsize;
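
In words: if bufmem is already above bufmem_hiwater, report at least
the full excess as releasable, before any of the AGE-queue arithmetic
runs.  That way allocbuf actually trims the pool back down toward the
high-water mark instead of letting it stabilize above it.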