Subject: Re: CVS commit: syssrc/sys/miscfs/nullfs
To: enami tsugutomo <enami@sm.sony.co.jp>
From: Bill Studenmund <wrstuden@netbsd.org>
List: source-changes
Date: 03/06/2002 13:17:52
On 6 Mar 2002, enami tsugutomo wrote:

> Bill Studenmund <wrstuden@netbsd.org> writes:
>
> > I agree that something other than 16 is probably good, but I don't think
> > this is it. Do we have any performance data saying 16 isn't good enough?
>
> When I first noticed this, I saw the system sometimes spending 90% of
> its time in kernel mode, and there were 10000 nodes in the same slot
> (when desiredvnodes is 48000 and I was running cvs update on pkgsrc).

10000 on the same slot, when we have one slot per vnode in the system?!?!

> I did a simple test again today.  With 16, the system spends 2 or 3
> times as much time in kernel mode.

[snip] Ok. 16 is too small.
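For scale, here is a small userland sketch of the sizing arithmetic. It assumes a power-of-two table (so a bucket can be picked with a mask) and that at most half of desiredvnodes can be layer nodes, since each layer node holds a lower vnode. next_pow2, layer_hash_buckets and avg_chain are hypothetical helpers for illustration, not existing kernel functions.

```c
#include <assert.h>
#include <stddef.h>

/* Smallest power of two >= n, so a bucket can be picked with a mask. */
static size_t
next_pow2(size_t n)
{
	size_t p = 1;

	while (p < n)
		p <<= 1;
	return p;
}

/*
 * Hypothetical sizing rule: about desiredvnodes / divisor buckets, rounded
 * up to a power of two.  divisor = 2 reflects that every layer node holds a
 * lower vnode, so at most half of the vnodes can be layer nodes.
 */
static size_t
layer_hash_buckets(size_t desiredvnodes, size_t divisor)
{
	size_t want;

	assert(divisor > 0);
	want = desiredvnodes / divisor;
	return next_pow2(want > 0 ? want : 1);
}

/* Average chain length if nodes were spread perfectly evenly. */
static double
avg_chain(size_t nodes, size_t buckets)
{
	assert(buckets > 0);
	return (double)nodes / (double)buckets;
}
```

With desiredvnodes = 48000, the /2 rule gives 32768 buckets and /4 gives 16384. With only 16 buckets and 24000 layer nodes, even a perfectly even spread leaves chains of 1500 entries.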

> > First, since each layer vnode has a lower vnode, we will have at
> > most half the system vnodes in the hash table (we are hashing the
> > lower vnode).
>
> Ya, after I committed that I thought about changing it to /2 or /4,
> but recently I have been wondering how worthwhile the (current
> implementation of the) layer node cache is (there are also other side
> effects, like PR#15555).

15555 is a separate problem. It is the same problem as a previous PR
(I forget the number) where deleted files don't disappear under a umap fs
when the lower fs is nfs. Fixing it means changing the vnode interface (we
need to add a way for upper vnodes to register themselves with lower ones,
and for lower ones to send info to upper ones). vnodes in general are
expensive to build, so I think it's good for us to keep them around.

This problem is just a cache/hash problem. I think we can fix it (or break
it :-) independently of the other problem. The first thing that comes to
my mind is to make sure that LOG2_SIZEVNODE is still right; I'm not sure
if it's been adjusted since UBC (and the addition of the struct uvm_obj at
the front of struct vnode). If that value is wrong, the hash certainly
won't spread nodes across the table properly.
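To see why a stale shift matters, here is a userland simulation. It assumes the layer hash takes the lower vnode's address, shifts out LOG2_SIZEVNODE low bits and masks with a power-of-two table size; the real macro may differ. ptr_hash and max_chain are hypothetical helpers, and the simulated vnodes sit exactly stride bytes apart, which real pool allocation only approximates.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical pointer hash: dividing the address by the (assumed) vnode
 * size turns back-to-back vnodes into consecutive integers, then the mask
 * picks a bucket.
 */
static size_t
ptr_hash(uintptr_t addr, unsigned log2_size, size_t mask)
{
	return (size_t)((addr >> log2_size) & mask);
}

/*
 * Simulate n vnodes allocated back to back, stride bytes apart, and
 * return the length of the longest hash chain (0 if allocation fails).
 */
static size_t
max_chain(size_t n, size_t stride, unsigned log2_size, size_t nbuckets)
{
	uintptr_t base = 0x10000000;	/* arbitrary, well-aligned base address */
	size_t *count;
	size_t i, worst = 0;

	/* The mask trick only works for a power-of-two table. */
	assert(nbuckets > 0 && (nbuckets & (nbuckets - 1)) == 0);
	count = calloc(nbuckets, sizeof(*count));
	if (count == NULL)
		return 0;
	for (i = 0; i < n; i++) {
		size_t b = ptr_hash(base + i * stride, log2_size, nbuckets - 1);

		if (++count[b] > worst)
			worst = count[b];
	}
	free(count);
	return worst;
}
```

With 4096 vnodes in 4096 buckets, max_chain(4096, 256, 8, 4096) is 1. The same vnodes at a 512-byte stride (a struct that grew but kept its old shift) give 2, because only every other bucket is used, and a shift two bits too large gives 4.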

Oh, the problem with your proposed patch for 15555 is what happens when we
have multiple layers. Say we have a umap on top of an overlay (or more
likely, a umap on top of a dmfs layer). We have done an ls in the umap
layer, so we have an ffs node referenced by a dmfs node, which is in turn
referenced by a umap node. Now say we remove the file in the dmfs or
overlay layer. Both
now and with your patch, the file is still around. If we had an nfs layer
underneath, we'd see the .nfs#### file.

The problem is that the umap node is still just sitting there doing
nothing, yet hanging onto its reference to the dmfs node. With your
patch, the dmfs or overlay layer will vgone its node immediately on
inactivation, but that inactivation won't happen any sooner than it does
now, since the umap node still holds its reference.

With an upcall op, after the remove notices there are no more disk
references, it will tell the dmfs or overlay layer to let go (basically do
what you did, but I have a thought there). The dmfs layer would also tell
the umap layer to let go. When all was said and done, the umap node gets
taken apart (releasing the reference), then the dmfs node gets taken apart
(releasing its reference), then the underlying file is gone.
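The cascade above can be modelled in userland. This is a sketch under stated assumptions: struct lnode, lnode_reclaim and lnode_letgo are hypothetical, the upper pointer stands in for the registration interface the vnode layer does not have today, and each node has at most one upper node (real lower vnodes can sit under several layers).

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical model of a stack of layered nodes (ffs <- dmfs <- umap).
 * This is not struct vnode; "upper" stands in for the missing
 * registration interface.
 */
struct lnode {
	int usecount;		/* references held on this node */
	int gone;		/* set once the node has been taken apart */
	struct lnode *lower;	/* node we hold a reference on, or NULL */
	struct lnode *upper;	/* node registered on top of us, or NULL */
};

/* Take a node apart, releasing the reference it holds on its lower node. */
static void
lnode_reclaim(struct lnode *n)
{
	struct lnode *lower = n->lower;

	n->gone = 1;
	n->lower = NULL;
	if (lower != NULL) {
		lower->upper = NULL;	/* unregister from the lower node */
		lower->usecount--;
	}
}

/*
 * Hypothetical "let go" upcall: pass the request up the stack first so
 * the topmost node is taken apart first, then take this node apart if
 * nothing references it any more.  A node still in use stops the cascade.
 */
static void
lnode_letgo(struct lnode *n)
{
	if (n->upper != NULL)
		lnode_letgo(n->upper);
	if (n->usecount == 0 && !n->gone)
		lnode_reclaim(n);
}
```

With an ffs node held once by dmfs, a dmfs node held once by umap, and an idle umap node, calling lnode_letgo on the ffs node takes apart umap, then dmfs, then ffs. If the umap node is still in use, nothing is torn down.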

Oh, my thought on how to deal with "telling to let go" is to add a new
vnode flag: when vrele() and vput() see that flag on a node they are
putting on the free list, they put it on the front, not the back, so it
gets recycled first. The main part of the idea is that we centralize the
handling, so that we don't have to duplicate code in all of the layers.
Yes, doing it in layerfs would catch most of the layered file systems,
but we can just put the handling in the common vfs code.
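A userland sketch of the flag idea follows. It assumes, as in the BSD vnode free list, that allocation recycles vnodes from the front and a normal release appends to the back. VN_RECYCLE_SOON, struct fvnode, fvnode_rele and the freelist helpers are hypothetical names; the kernel's real list uses struct vnode and TAILQ macros.

```c
#include <assert.h>
#include <stddef.h>

#define VN_RECYCLE_SOON 0x1	/* hypothetical flag: "let go of me soon" */

/* Minimal stand-in for a vnode on the free list; not struct vnode. */
struct fvnode {
	int id;
	unsigned flags;
	int usecount;
	struct fvnode *next;
};

/* Free list: recycling takes from head, normal releases go to tail. */
struct freelist {
	struct fvnode *head;
	struct fvnode *tail;
};

static void
freelist_put(struct freelist *fl, struct fvnode *vp)
{
	if (vp->flags & VN_RECYCLE_SOON) {
		/* Flagged: front of the list, so it is reused first. */
		vp->next = fl->head;
		fl->head = vp;
		if (fl->tail == NULL)
			fl->tail = vp;
	} else {
		/* Normal LRU behaviour: back of the list. */
		vp->next = NULL;
		if (fl->tail != NULL)
			fl->tail->next = vp;
		else
			fl->head = vp;
		fl->tail = vp;
	}
}

/* The one central place the flag is checked, standing in for vrele()/vput(). */
static void
fvnode_rele(struct freelist *fl, struct fvnode *vp)
{
	assert(vp->usecount > 0);
	if (--vp->usecount == 0)
		freelist_put(fl, vp);
}

/* Remove and return the vnode that would be recycled next, or NULL. */
static struct fvnode *
freelist_take(struct freelist *fl)
{
	struct fvnode *vp = fl->head;

	if (vp != NULL) {
		fl->head = vp->next;
		if (fl->head == NULL)
			fl->tail = NULL;
		vp->next = NULL;
	}
	return vp;
}
```

Releasing unflagged vnodes a and b and then a flagged vnode c leaves the list as c, a, b, so c is recycled first. The layers only set the flag; the release path alone decides where the node goes.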

Take care,

Bill