Subject: Re: tmpfs memory leak?
To: None <tech-kern@NetBSD.org>
From: Bill Stouder-Studenmund <wrstuden@netbsd.org>
List: tech-kern
Date: 10/23/2007 20:47:43

On Tue, Oct 23, 2007 at 03:16:43PM -0500, David Young wrote:
> On Mon, Oct 22, 2007 at 02:56:36PM -0500, David Young wrote:
> > I have an 8M tmpfs filesystem whose available blocks are exhausted over
> > time.  There seems to be a memory leak, because I cannot account for more
> > than 3M of the use with fstat(1) (for deleted files & dirs) and du(1),
> >
> *snip snip*
>
> Inactive nodes belonging to the nullfs held references to vnodes in
> the underlying tmpfs.  As long as the nullfs held the references,
> the kernel would never call VOP_INACTIVE() and VOP_RECLAIM() on the
> tmpfs nodes to reclaim their resources.  It was possible to force the
> kernel to VOP_RECLAIM() the inactive nullfs vnodes using 'sysctl -w
> kern.maxvnodes=1'.  The tmpfs collected its nodes as the nullfs dropped
> its references, and df(1) indicated that the resources held by tmpfs
> dropped to reasonable levels.
>
> I have attached a patch that solves the problem by calling vrecycle(9)
> unconditionally from layer_inactive().  I ran the patch for several hours,
> and I did not see my tmpfs balloon like it had previously.  The number
> of nodes and blocks in use has reached the steady state that I expect
> in my application.
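
(To sketch it for anyone who hasn't read the diff: unconditional recycling
from layer_inactive() looks roughly like the following. This is from memory,
not David's actual patch, and the vrecycle(9)/VOP_UNLOCK() signatures are
simplified, so don't take it as the tree's exact code.)

/*
 * Sketch only, not the actual patch.  The idea: whenever a layer
 * vnode goes inactive, recycle it right away, so the reference it
 * holds on the lower (e.g. tmpfs) vnode gets dropped in the layer's
 * VOP_RECLAIM() instead of lingering until vnode reuse.
 */
int
layer_inactive(void *v)
{
	struct vop_inactive_args /* {
		struct vnode *a_vp;
	} */ *ap = v;
	struct vnode *vp = ap->a_vp;

	VOP_UNLOCK(vp, 0);

	/* Unconditionally toss the layer vnode; reclaiming it will
	 * vrele() the lower vnode, letting tmpfs free its resources. */
	vrecycle(vp, NULL, curlwp);

	return (0);
}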
>
> Now the nullfs will not "cache" references to nodes in the underlying
> filesystem, so nullfs may slow a bit.  I think that if we can plug a
> memory leak at the cost of performance, we should. :-)
>
> I would like to check this in, soon.

Please don't. Well, the part where you use an explicit struct mount * is
fine.

You have no quantification of the performance hit. Also, you are in a
situation where memory use matters more to you than layer performance. We
have no indication that all layer users are in that situation, and I
believe a number aren't.

I also object because deleting a file is a rare event in terms of VFS
(since you do many operations on a given file, but remove it only once).
Yet you're addressing this rare event with code in the fast path. That
strikes me as really, really wrong. :-)

However, I have something else to suggest. We've talked about "the right"
answer, and I think it may be time to do it.

The "right" answer is for us to know when there's a removal event on the=20
underlying node. When that happens, we need to tell all upper nodes to=20
vgone in their inactive calls(*). Oh, why didn't you just vgone() all the=
=20
time?
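
To make that concrete: with a removal notification in place, the layer's
inactive routine can stay out of the fast path and only recycle when it has
to. Rough, untested sketch; the v_removed flag is invented here, it's what
the upcall described below would set.

/*
 * Hypothetical layer_inactive(): only recycle the layer vnode when
 * the node underneath was actually removed.  v_removed is an
 * invented flag for illustration; signatures simplified.
 */
int
layer_inactive(void *v)
{
	struct vop_inactive_args /* {
		struct vnode *a_vp;
	} */ *ap = v;
	struct vnode *vp = ap->a_vp;
	struct vnode *lowervp = LAYERVPTOLOWERVP(vp);

	VOP_UNLOCK(vp, 0);

	/* The common case (file still exists) pays nothing extra. */
	if (lowervp->v_removed)
		vrecycle(vp, NULL, curlwp);

	return (0);
}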

To do this, we need VOP_UPCALL(); its prototype is:

VOP_UPCALL(struct vnode *vp, struct vnode *lower_vp, int op, intptr_t cookie);

The idea is we lock the vnode stack, then call VOP_UPCALL() on all of the
vnodes above us. We do this in the remove path (in the leaf file system).
Well, we lock the stack and call an iterator in vfs_subr.c. :-)
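
Something like this, very roughly. The per-vnode list of upper vnodes and
its locking are exactly the part that doesn't exist yet, so the names here
(vn_upcall, v_upperlist, v_upperentry) are all invented for illustration:

/*
 * Sketch of the iterator in vfs_subr.c.  Maintaining and locking the
 * list of upper vnodes is the new work; only LIST_FOREACH() and
 * struct vnode are real here.
 */
void
vn_upcall(struct vnode *lowervp, int op, intptr_t cookie)
{
	struct vnode *vp;

	/* Walk every layer vnode stacked on lowervp and let it react,
	 * e.g. mark itself so its next VOP_INACTIVE() recycles it. */
	LIST_FOREACH(vp, &lowervp->v_upperlist, v_upperentry)
		VOP_UPCALL(vp, lowervp, op, cookie);
}

The leaf file system's remove path (e.g. tmpfs_remove()) would then call
vn_upcall() with something like an UPCALL_REMOVED op once the unlink has
succeeded; that op name is invented too.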


Besides satisfying my aesthetic idea of how it should be done, this also
helps w/ another issue we have, which is figuring out when we get the
"last" close on a file. The problem is that we detect the "last" close by
looking at the active count on a vnode. With a layered stack, though, what
we really want to count is the references on vnodes in the stack that come
from things outside the stack. So the references each layer holds on lower
layers don't count.
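
To illustrate what "references from things outside the stack" means, here's
a toy calculation. Only v_usecount is a real field; vn_lowervp() and the
function itself are invented for the example. Each layer vnode holds one
reference on the vnode below it, and those are what have to be subtracted
before deciding the file is really unused:

/*
 * Toy illustration only: count the references on a vnode stack that
 * come from outside the stack.
 */
int
vn_external_refs(struct vnode *upper)
{
	struct vnode *vp;
	int refs = 0;

	for (vp = upper; vp != NULL; vp = vn_lowervp(vp)) {
		refs += vp->v_usecount;
		if (vn_lowervp(vp) != NULL)
			refs--;		/* this layer's hold on the one below */
	}
	return refs;
}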

Ok, I have a half-done patch to do this. I'll post what I come up w/
tomorrow. I see how to fix this and am most of the way done. What I don't
see is how to handle locking.

Take care,

Bill
