Subject: Re: vm.bufmem_hiwater not honored (found trigger)
To: Arto Selonen <arto+dated+1100946799.b55473dbc43f5917@selonen.org>
From: Thor Lancelot Simon <tls@rek.tjls.com>
List: tech-kern
Date: 11/21/2004 17:01:19
On Sat, Nov 20, 2004 at 12:32:13PM +0200, Arto Selonen wrote:
> Hi!
>
> On Mon, 15 Nov 2004, Thor Lancelot Simon wrote:
>
> > size, I don't understand how your system got into the situation it is
> > in, at all.
> >
> > And I would very much like to.
>
> The vm.bufmem growth is triggered by the /etc/daily find_core routine.
> I have three directory structures (as separate file systems) that
> hold some amount of files/data:
>
> /squid # disk cache of squid
> /cvs # anoncvs sources (src,xsrc,pkgsrc)
> /obj # build destination

Okay, I think I understand what's going on. Either one of your
filesystems has a larger block size than the others, or the average
directory on one takes up one or more full blocks while on another it
takes up only a frag.
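
For scale (made-up numbers, not measurements from your machine): on an
FFS with 16K blocks and 2K frags, 50,000 directories that each spill
into a full block pin roughly eight times the metadata of 50,000 that
each fit in a frag:

	50000 * 16384 bytes = ~780 MB of directory blocks
	50000 *  2048 bytes =  ~98 MB of directory frags

A pkgsrc checkout alone is tens of thousands of directories, so the
same find pass can cost wildly different amounts of buffer memory on
two otherwise similar filesystems.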

Either way, what happens is that the first filesystem's worth of
metadata takes you up to the high-water mark, and then, since you're
already there, vfs_bio ends up resizing existing buffers (always
growing them) instead of allocating new ones -- so every buffer it
touches grows. Once you're above the high-water mark, the canrelease
call in allocbuf returns too small a number, so bufmem never shrinks
back below the mark.
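
To make that concrete, here is roughly what the release estimate looks
like before the patch -- a simplified sketch reconstructed from the
diff context below, not the actual source; the function name is my
placeholder, and I've dropped everything except the two checks that
matter:

/*
 * Sketch of the pre-patch decision: below the low-water mark it
 * releases nothing, and otherwise it only counts invalid buffers
 * sitting on the AGE queue.
 */
static int
buf_canrelease_sketch(void)	/* hypothetical name */
{
	int ninvalid = 0;
	struct buf *bp;

	/* Below the low-water mark: release nothing. */
	if (bufmem < bufmem_lowater)
		return 0;

	/* Otherwise, count reclaimable bytes on the AGE queue. */
	TAILQ_FOREACH(bp, &bufqueues[BQ_AGE], b_freelist)
		ninvalid += bp->b_bufsize;

	/*
	 * bufmem_hiwater is never consulted here, so an excess above
	 * the high-water mark is never explicitly targeted for release.
	 */
	return ninvalid;
}
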
Try this:
Index: vfs_bio.c
===================================================================
RCS file: /cvsroot/src/sys/kern/vfs_bio.c,v
retrieving revision 1.122.2.4
diff -c -r1.122.2.4 vfs_bio.c
*** vfs_bio.c 8 Oct 2004 03:25:15 -0000 1.122.2.4
--- vfs_bio.c 21 Nov 2004 21:56:37 -0000
***************
*** 462,467 ****
--- 462,470 ----
  	if (bufmem < bufmem_lowater)
  		return 0;
  
+ 	if (bufmem > bufmem_hiwater)
+ 		return bufmem - bufmem_hiwater;
+ 
  	TAILQ_FOREACH(bp, &bufqueues[BQ_AGE], b_freelist)
  		ninvalid += bp->b_bufsize;
  
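
With that change, once bufmem exceeds bufmem_hiwater the canrelease
path reports the whole excess as releasable, so the drain side can pull
the cache back under the limit instead of letting it ratchet upward.
If it works, you should see it in the sysctl counters across a daily
run, something like:

	sysctl vm.bufmem vm.bufmem_lowater vm.bufmem_hiwater

with vm.bufmem settling back below vm.bufmem_hiwater after the
find_core pass instead of staying pinned above it.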