Subject: Re: tmpfs memory leak?
To: None <tech-kern@NetBSD.org>
From: David Young <dyoung@pobox.com>
List: tech-kern
Date: 10/26/2007 19:15:04
On Tue, Oct 23, 2007 at 08:47:43PM -0700, Bill Stouder-Studenmund wrote:
> On Tue, Oct 23, 2007 at 03:16:43PM -0500, David Young wrote:
> > On Mon, Oct 22, 2007 at 02:56:36PM -0500, David Young wrote:
> > > I have an 8M tmpfs filesystem whose available blocks are exhausted over
> > > time.  There seems to be a memory leak, because I cannot account for more
> > > than 3M of the use with fstat(1) (for deleted files & dirs) and du(1),
> > > 
> > *snip snip*
> > 
> > Inactive nodes belonging to the nullfs held references to vnodes in
> > the underlying tmpfs.  As long as the nullfs held the references,
> > the kernel would never call VOP_INACTIVE() and VOP_RECLAIM() on the
> > tmpfs nodes to reclaim their resources.  It was possible to force the
> > kernel to VOP_RECLAIM() the inactive nullfs vnodes using 'sysctl -w
> > kern.maxvnodes=1'.  The tmpfs collected its nodes as the nullfs dropped
> > its references, and df(1) indicated that the resources held by tmpfs
> > dropped to reasonable levels.
> > 
> > I have attached a patch that solves the problem by calling vrecycle(9)
> > unconditionally from layer_inactive().  I ran the patch for several hours,
> > and I did not see my tmpfs balloon like it had previously.  The number
> > of nodes and blocks in use has reached the steady state that I expect
> > in my application.
> > 
> > Now the nullfs will not "cache" references to nodes in the underlying
> > filesystem, so nullfs may be a bit slower.  I think that if we can
> > plug a memory leak at the cost of some performance, we should. :-)
> > 
> > I would like to check this in, soon.
> 
> Please don't. Well, the part where you use an explicit struct mount * is 
> fine.

The explicit struct mount * looked essential to me.  Here is my
interpretation; tell me if I got it wrong.  The comments in layer_bypass()
say that the lower filesystem may vput() some of its vnode arguments,
so layer_bypass() needs to VREF() any vnode that it does not want to
go away when it calls the lower layer.  By the same token, once
layer_bypass() has vput() a vnode itself, it should not extract a
struct mount * from it.  I believe we only got away with that before
because nullfs was so lazy about ever releasing a lower vnode.  After
I patched layer_inactive(), my test machine crashed shortly after boot
when it dereferenced a NULL struct mount *.
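
To make that concrete, here is the shape of the layer_bypass() half as
I read it.  This is a sketch, not the attached patch: VREF()/vrele(),
VCALL(), and layer_node_create() are the real interfaces, but the
helper below and its argument list are my invention, to keep the
fragment short.

	/*
	 * Schematic only; not the real layer_bypass().  Capture the
	 * struct mount * while we still hold a reference, because the
	 * lower filesystem may vput() its vnode arguments out from
	 * under us.
	 */
	static int
	layer_call_and_map(struct vnode *lvp, int op_offset, void *ap,
	    struct vnode **vpp)
	{
		struct mount *mp;
		int error;

		mp = lvp->v_mount;	/* capture before any vput()/vrele() */

		VREF(lvp);		/* keep lvp alive across the call */
		error = VCALL(lvp, op_offset, ap);
		vrele(lvp);		/* may drop the last reference */

		/*
		 * Reading lvp->v_mount down here is the crash I saw;
		 * use the mp saved above instead.
		 */
		if (error == 0 && *vpp != NULL)
			error = layer_node_create(mp, *vpp, vpp);
		return error;
	}

The layer_inactive() half is just what I described above: an
unconditional vrecycle(9) at the end, so that the layer vnode drops its
reference to the lower vnode right away instead of caching it.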

> You have no quantification of the performance hit. Also, you are in a 
> situation where memory issues are more important for you. We have no 
> indication that all layer users are in this situation, and I believe a 
> number aren't.

You're right; I don't care too much about the performance hit in my
app, and I am not especially interested in measuring filesystem
performance at this time.  I don't know what benchmarks the user
community likes to see.

> Ok, I have half-done a patch to do this. I'll post what I come up w/ 
> tomorrow. I see how to fix this and get most of the way done. What I don't 
> see is how to handle locking.

Tomorrow? :-)

Dave

-- 
David Young             OJC Technologies
dyoung@ojctech.com      Urbana, IL * (217) 278-3933 ext 24