Subject: Re: tmpfs memory leak?
To: None <>
From: Bill Stouder-Studenmund <>
List: tech-kern
Date: 10/27/2007 17:20:04
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Fri, Oct 26, 2007 at 07:15:04PM -0500, David Young wrote:
> On Tue, Oct 23, 2007 at 08:47:43PM -0700, Bill Stouder-Studenmund wrote:
> > On Tue, Oct 23, 2007 at 03:16:43PM -0500, David Young wrote:
> > >=20
> > > Now the nullfs will not "cache" references to nodes in the underlying
> > > filesystem, so nullfs may slow a bit.  I think that if we can plug a
> > > memory leak at the cost of performance, we should. :-)
> > >=20
> > > I would like to check this in, soon.
> >=20
> > Please don't. Well, the part where you use an explicit struct mount * i=
> > fine.
> The explicit struct mount * looked essential to me.  Here is my
> interpretation; tell me if I got it wrong.  The comments in layer_bypass()
> say that the lower filesystem may vput() some of its vnode arguments,
> so layer_bypass() needs to VREF() any vnode that it does not want to go
> away when it calls the lower.  So, after layer_bypass() has vput() a vnode
> itself, it should not extract a struct mount * from it.  I believe we only
> ever got away with it, before, because nullfs was so lazy about releasing
> any lower vnode, ever.  After I patched layer_inactive(), my test machine
> crashed shortly after boot when it dereferenced a NULL struct mount *.

Ahhh... Point taken.

> > You have no quantification of the performance hit. Also, you are in a=
> > situation where memory issues are more important for you. We have no=20
> > indication that all layer users are in this situation, and I believe a=
> > number aren't.
> You're right, I don't care too much about it in my app.  I am not
> especially interested in measuring filesystem performance at this time.
> I don't know what benchmarks the user community likes to see.
> > Ok, I have half-done a patch to do this. I'll post what I come up w/=20
> > tomorrow. I see how to fix this and get most of the way done. What I do=
> > see is how to handle locking.
> Tomorrow? :-)

Yeah, sorry. Real life and the WoW Haloween events got in the way.

Also, I was about half-way through the patch and then realized what I had=
needed some work. The problem is getting a consistent snapshot of all of=20
the vnodes above a node. My first take was to walk the list of vnodes in a=
"safe" manner (it's a TAILQ, so take TAILQ_NEXT() before fussing with a=20
given node). The problem is that something else could happen and whatever=
"next" vnode we look at might not be valid by the time we get around to=20
looking at it. Ick!

So what I have to do is go back and vget() all of the vnodes into an array=
(either on-stack or malloc'd). Then start walking them for=20

I am a little concerned about calling vgone() or vreclaim() in the=20
inactive routine. Mainly as with this upcalling code, we vget() the vnodes=
then vrele() them. The vrele() could well be the one that puts them on the=
free list and triggers VOP_INACTIVE() and thus blowing them away. We need=
to keep the vnode stack locked through this, and that makes it all messy.

I think the problem is we never thought about doing a vrele() on a locked=
vnode stack that drops the last reference on an upper vnode. We always=20
assume that whenever we drop the last reference, we are happy with the=20
stack being unlocked at the end. That's been true until now.

I think what we want is something akin to the rapid age code MacOS has.=20
Basically when rapid aging is in effect (either for a thread or I think=20
you can set it on a vnode), when you put it on the free list, you put it=20
on the front. Not the back. The upshot here would be that we wouldn't have=
the recycling happening in the same context that's making the upcall.

Take care,


Content-Type: application/pgp-signature
Content-Disposition: inline

Version: GnuPG v1.4.7 (NetBSD)