Subject: Re: tmpfs memory leak?
To: None <tech-kern@NetBSD.org>
From: David Young <dyoung@pobox.com>
List: tech-kern
Date: 10/23/2007 15:16:43

On Mon, Oct 22, 2007 at 02:56:36PM -0500, David Young wrote:
> I have an 8M tmpfs filesystem whose available blocks are exhausted over
> time.  There seems to be a memory leak, because I cannot account for more
> than 3M of the use with fstat(1) (for deleted files & dirs) and du(1),
> 
*snip snip*
> 
> Does no one else see this?  My application may be a bit unusual, both
> in that I use null mounts, and in that I have no swap activated.
> 
> Could the cause of the leak be an interaction between my null mounts
> and tmpfs?  Also, I am dimly aware of some reference-counting bug in
> tmpfs; it was mentioned in one of ad@'s commits to the vmlocking branch.
> (I do not run the vmlocking branch.)

Inactive nodes belonging to the nullfs held references to vnodes in
the underlying tmpfs.  As long as the nullfs held the references,
the kernel would never call VOP_INACTIVE() and VOP_RECLAIM() on the
tmpfs nodes to reclaim their resources.  It was possible to force the
kernel to VOP_RECLAIM() the inactive nullfs vnodes using 'sysctl -w
kern.maxvnodes=1'.  As the nullfs dropped its references, tmpfs reclaimed
its nodes, and df(1) showed the space held by tmpfs falling back to
reasonable levels.
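
For anyone who does not have the layerfs code in their head: the
reference in question is the lower vnode that every layer (nullfs)
node keeps VREFed.  The sketch below paraphrases the declarations in
sys/miscfs/genfs/layer.h from memory, so treat the exact field layout
as illustrative rather than authoritative:

#include <sys/queue.h>
#include <sys/vnode.h>

/*
 * Each nullfs vnode's v_data points at one of these.  layer_lowervp
 * holds a reference on the vnode it shadows (a tmpfs vnode in my
 * setup), so an upper vnode that is never reclaimed pins the lower
 * node, and its blocks, for as long as it lives.
 */
struct layer_node {
	LIST_ENTRY(layer_node)	 layer_hash;	/* hash chain */
	struct vnode		*layer_lowervp;	/* VREFed once */
	struct vnode		*layer_vnode;	/* back pointer to upper vnode */
	unsigned int		 layer_flags;	/* e.g. LAYERFS_REMOVED */
};

#define	VTOLAYER(vp)	((struct layer_node *)(vp)->v_data)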

I have attached a patch that solves the problem by calling vrecycle(9)
unconditionally from layer_inactive().  I ran the patch for several hours,
and I did not see my tmpfs balloon as it had previously.  The number
of nodes and blocks in use has reached the steady state that I expect
in my application.
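
In case it is not obvious why calling vrecycle(9) unconditionally is
safe: the vnode is only destroyed when nothing holds a reference to
it any more; otherwise the call is a no-op.  Roughly (simplified from
kern/vfs_subr.c from memory, so the locking details are approximate):

/*
 * Approximation of vrecycle(9): reclaim vp now if and only if it is
 * no longer referenced.  Returns 1 if the vnode was recycled, 0 if
 * it was still in use and therefore left alone.
 */
int
vrecycle(struct vnode *vp, struct simplelock *inter_lkp, struct lwp *l)
{

	simple_lock(&vp->v_interlock);
	if (vp->v_usecount == 0) {
		if (inter_lkp != NULL)
			simple_unlock(inter_lkp);
		/*
		 * vgonel() runs VOP_RECLAIM(), which in layerfs
		 * releases the reference on the lower (tmpfs) vnode.
		 */
		vgonel(vp, l);
		return (1);
	}
	simple_unlock(&vp->v_interlock);
	return (0);
}

So a busy node is left alone, and an inactive nullfs node is torn
down right away instead of lingering and pinning the tmpfs node
underneath it.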

With this change, nullfs will no longer "cache" references to nodes in
the underlying filesystem, so it may be a bit slower.  I think that if
we can plug a memory leak at the cost of performance, we should. :-)

I would like to check this in soon.

Dave

-- 
David Young             OJC Technologies
dyoung@ojctech.com      Urbana, IL * (217) 278-3933 ext 24

Content-Disposition: attachment; filename="layer_vnops.c.patch"

Index: layer_vnops.c
===================================================================
RCS file: /cvsroot/src/sys/miscfs/genfs/layer_vnops.c,v
retrieving revision 1.31
diff -p -u -u -p -r1.31 layer_vnops.c
--- layer_vnops.c	16 Apr 2007 08:10:58 -0000	1.31
+++ layer_vnops.c	23 Oct 2007 19:56:05 -0000
@@ -296,6 +296,7 @@ layer_bypass(v)
 	struct vnode *old_vps[VDESC_MAX_VPS], *vp0;
 	struct vnode **vps_p[VDESC_MAX_VPS];
 	struct vnode ***vppp;
+	struct mount *mp;
 	struct vnodeop_desc *descp = ap->a_desc;
 	int reles, i, flags;
 
@@ -311,7 +312,8 @@ layer_bypass(v)
 	vps_p[0] =
 	    VOPARG_OFFSETTO(struct vnode**, descp->vdesc_vp_offsets[0], ap);
 	vp0 = *vps_p[0];
-	flags = MOUNTTOLAYERMOUNT(vp0->v_mount)->layerm_flags;
+	mp = vp0->v_mount;
+	flags = MOUNTTOLAYERMOUNT(mp)->layerm_flags;
 	our_vnodeop_p = vp0->v_op;
 
 	if (flags & LAYERFS_MBYPASSDEBUG)
@@ -401,7 +403,7 @@ layer_bypass(v)
 		 * as a lookup on "." would generate a locking error.
 		 * So all the calls which get us here have a locked vpp. :-)
 		 */
-		error = layer_node_create(old_vps[0]->v_mount, **vppp, *vppp);
+		error = layer_node_create(mp, **vppp, *vppp);
 		if (error) {
 			vput(**vppp);
 			**vppp = NULL;
@@ -741,6 +743,7 @@ layer_inactive(v)
 		struct lwp *a_l;
 	} */ *ap = v;
 	struct vnode *vp = ap->a_vp;
+	struct lwp *l = ap->a_l;
 
 	/*
 	 * Do nothing (and _don't_ bypass).
@@ -760,9 +763,7 @@ layer_inactive(v)
 	 * ..., but don't cache the device node. Also, if we did a
 	 * remove, don't cache the node.
 	 */
-	if (vp->v_type == VBLK || vp->v_type == VCHR
-	    || (VTOLAYER(vp)->layer_flags & LAYERFS_REMOVED))
-		vgone(vp);
+	vrecycle(vp, NULL, l);
 	return (0);
 }
 
