Subject: Re: new memory allocation scheme and disk access
To: None <current-users@netbsd.org>
From: Thor Lancelot Simon <tls@rek.tjls.com>
List: tech-kern
Date: 01/03/2004 14:17:47
On Sat, Jan 03, 2004 at 01:20:48PM +0100, Lennart Augustsson wrote:
> So it's good that the problem can be fixed, but the fix makes me worry.
> 
> First, I assume increasing maxvnodes wastes some resources?  So the new
> scheme is more resource hungry than the old?

No, it doesn't "waste" resources.  What you're seeing is that there is,
essentially, 2-layer caching of directory data: in the namei (vnode)
cache, and in the metadata (buffer) cache itself.  When the namei cache
is too small -- which is common -- the buffer cache ends up capturing the
I/O that is issued to pull directories back into the cache.

The old allocation scheme, which never really freed memory, also had
the property that it tended to maintain a more static allocation of
metadata buffers to directories vs. other metadata.  If it could actually
provide enough entries to capture the working set of directories of the
filesystem, this could allow the namei cache to be radically undersized
with little obvious performance penalty.  Unfortunately, the massive
waste of resources inherent in the old scheme used *far* more memory --
and permanently-reserved kernel virtual address space, at that -- than
sizing the namei cache properly would.  Because the new scheme is closer
to true LRU than the old scheme, buffers tend to get recycled to cache
non-directory blocks, and then when the namei cache thrashes... you
get the idea.  The old scheme was pseudo-generational in a way that
evidently avoided this pathological case better, but so will sizing
the namei cache right!
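
To make the interaction above concrete, here is a toy model, not the
kernel's actual code.  It assumes both caches are pure LRU, that each
directory occupies one buffer, and that a steady stream of non-directory
blocks competes for the same buffers.  The names (LRU, simulate) and all
the sizes are made up for illustration.

```python
from collections import OrderedDict


class LRU:
    """Minimal LRU set: access() reports a hit, or inserts and evicts on a miss."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def access(self, key):
        if key in self.items:
            self.items.move_to_end(key)  # mark as most recently used
            return True
        self.items[key] = True
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict least recently used
        return False


def simulate(name_cache_size, buffer_cache_size,
             n_dirs=100, rounds=20, data_blocks_per_round=200):
    """Return how many directory reads reach the disk (hypothetical workload)."""
    names = LRU(name_cache_size)
    buffers = LRU(buffer_cache_size)
    dir_disk_reads = 0
    next_data_block = 0
    for _ in range(rounds):
        # Walk the whole working set of directories once per round.
        for d in range(n_dirs):
            if not names.access(("name", d)):
                # Name cache miss: the directory must come through the buffer cache.
                if not buffers.access(("dir", d)):
                    dir_disk_reads += 1
        # Non-directory I/O then competes for the same buffers.
        for _ in range(data_blocks_per_round):
            buffers.access(("data", next_data_block))
            next_data_block += 1
    return dir_disk_reads


if __name__ == "__main__":
    print("small name cache, small buffer cache:", simulate(10, 150))
    print("name cache sized to working set:     ", simulate(100, 150))
    print("small name cache, large buffer cache:", simulate(10, 400))
```

In this model the undersized name cache misses on every lookup, so the
directories are re-read from disk each round unless the buffer cache is
large enough to hold them alongside the other traffic.  Sizing the name
cache to the working set leaves only the initial reads, with a much
smaller buffer cache.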

Even with the new scheme, the smallest buffer is still one filesystem
fragment, at least 512 bytes, while a namei cache entry should generally
be quite a bit smaller than that.  So sizing the namei cache properly
should not "waste"
memory; in fact, capturing I/O in the namei cache rather than in the buffer
cache where possible should *save* memory.  With the enormous reduction
in KVA pressure on single-address-space 32-bit platforms, the new code
should allow the namei cache to easily be far, far larger than it used to
be.
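
As a rough back-of-the-envelope check of the size argument, the sketch
below compares how many objects fit in the same amount of memory.  The
512-byte figure is the floor mentioned above; the 128-byte entry size
and the 4 MiB budget are assumptions picked only for illustration, not
measured values.

```python
FRAGMENT_BYTES = 512      # smallest buffer considered above
NAME_ENTRY_BYTES = 128    # assumed size of one namei cache entry
BUDGET_BYTES = 4 * 1024 * 1024  # assumed memory budget (4 MiB)


def objects_per_budget(budget_bytes, object_bytes):
    """How many fixed-size objects fit in the budget."""
    return budget_bytes // object_bytes


buffers = objects_per_budget(BUDGET_BYTES, FRAGMENT_BYTES)
names = objects_per_budget(BUDGET_BYTES, NAME_ENTRY_BYTES)

if __name__ == "__main__":
    print("minimum-size buffers:", buffers)
    print("namei cache entries: ", names)
    print("ratio:               ", names // buffers)
```

Under these assumptions the same memory holds four times as many name
cache entries as minimum-size buffers; a larger fragment size only
widens the gap.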

> Second, the default value should be changed, I presume, to make up for
> the change.

Actually, I think the namei cache should resize itself according to some
simple heuristic.  Given the current scheme, that should also greatly
reduce buffer cache pressure.
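
One possible shape for such a heuristic is sketched below.  This is
purely hypothetical and not something NetBSD implements: the function
name, the thresholds, and the growth factors are all invented here.  The
idea is to look at the miss rate over the last sampling interval, grow
the cache when it is high, shrink it slowly when it is very low, and
always stay within fixed bounds.

```python
def next_cache_size(current, lookups, misses, lo, hi,
                    grow_above=0.10, shrink_below=0.01):
    """Pick the next target size from one interval's lookup statistics.

    All thresholds are illustrative assumptions, not tuned values.
    """
    if lookups == 0:
        return current  # no information this interval; leave the size alone
    miss_rate = misses / lookups
    if miss_rate > grow_above:
        proposed = current * 2        # thrashing: grow quickly
    elif miss_rate < shrink_below:
        proposed = current * 3 // 4   # mostly idle: give memory back slowly
    else:
        proposed = current
    return max(lo, min(hi, proposed))  # clamp to the allowed range


if __name__ == "__main__":
    size = 1000
    # Hypothetical (lookups, misses) samples for successive intervals.
    for lookups, misses in [(10000, 5000), (10000, 2000), (10000, 300), (10000, 20)]:
        size = next_cache_size(size, lookups, misses, lo=500, hi=65536)
        print(size)
```

Growing faster than shrinking is a deliberate choice in this sketch,
since an undersized cache is the expensive case described above.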

Thor