Subject: Re: kern/32535: processes stuck on vnlock
To: None <kern-bug-people@netbsd.org, gnats-admin@netbsd.org,>
From: Bill Studenmund <wrstuden@netbsd.org>
List: netbsd-bugs
Date: 09/26/2006 01:35:02
The following reply was made to PR kern/32535; it has been noted by GNATS.

From: Bill Studenmund <wrstuden@netbsd.org>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/32535: processes stuck on vnlock
Date: Mon, 25 Sep 2006 18:32:15 -0700

 Ok, here is some more analysis on the problem.
 
 The main issue is that vrele() can lock the vnode before calling
 VOP_INACTIVE(). Since we are calling vrele() on the parent while we have
 the child locked, this locking violates the vnode locking hierarchy; you
 can't lock a vnode's parent while holding the vnode's lock.
 
 I see three ways to fix this:
 
 1) Do something along the lines of what I think Chuck was talking about,
 and create a work queue to handle destroying vnodes. The trick, though, is
 that VOP_INACTIVE() isn't necessarily about destroying a vnode, it's
 telling the file system that the vnode is going on the free list. The main
 user of this information (AFAIK) is NFS, which will zap a node if the
 silly-rename code has been triggered.
 
 We _could_ have a special worker thread to handle the VOP_INACTIVE
 calls; however, this will happen every time a vnode gets put on the free
 list! Even if we added a flag so that we only did this processing if
 requested (i.e. some vnodes skipped calling VOP_INACTIVE()), we still have
 this weird case where we have the free list we have now, and we have a
 "freeing" list.
 
 My main concern is a case where a file system uses VOP_INACTIVE as an
 indication that it can release resources. I expect such a file system will
 want VOP_INACTIVE calls, and this change will result in a performance hit.
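
To make the shape of option 1 concrete, here is a minimal user-space sketch of
the "freeing" list idea. Everything in it is hypothetical: struct toy_vnode,
the wants_inactive flag and the toy_* functions are illustration only, not
kernel interfaces, and a real version would need its own locking around the
queue and a worker thread to drain it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a vnode; NOT the kernel's struct vnode. */
struct toy_vnode {
	struct toy_vnode *next_inactive; /* link on the "freeing" list */
	int wants_inactive;              /* assumed flag: fs wants the call */
	int inactive_calls;              /* counts simulated VOP_INACTIVE()s */
};

/* Hypothetical FIFO "freeing" list. */
struct toy_inactive_queue {
	struct toy_vnode *head;
	struct toy_vnode **tailp;
};

static void
toy_queue_init(struct toy_inactive_queue *q)
{
	q->head = NULL;
	q->tailp = &q->head;
}

/*
 * Called at the point where vrele() would otherwise lock vp itself.
 * Returns 1 if the vnode was queued, 0 if it did not ask for the call.
 */
static int
toy_defer_inactive(struct toy_inactive_queue *q, struct toy_vnode *vp)
{
	assert(q != NULL && vp != NULL);
	if (!vp->wants_inactive)
		return 0;
	vp->next_inactive = NULL;
	*q->tailp = vp;
	q->tailp = &vp->next_inactive;
	return 1;
}

/*
 * Worker body. The worker holds no other vnode lock, so taking vp's
 * lock here would not be a parent-after-child acquisition.
 * Returns the number of vnodes processed.
 */
static int
toy_drain_inactive(struct toy_inactive_queue *q)
{
	struct toy_vnode *vp;
	int n = 0;

	while ((vp = q->head) != NULL) {
		q->head = vp->next_inactive;
		if (q->head == NULL)
			q->tailp = &q->head;
		vp->next_inactive = NULL;
		vp->inactive_calls++; /* stands in for lock + VOP_INACTIVE() */
		n++;
	}
	return n;
}
```

The sketch also shows the cost: every vnode that wants the call takes an extra
trip through a second list before it reaches the free list.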
 
 2) We could re-work lookup() so that we don't release the directory lock
 if we're locking the child and we instead call vput(). I think this is the
 best option as it gets rid of the real problem. I'll look into it, but
 if someone else wants to look into this, please do! I'm not sure how
 quickly I can look at it.
 
 3) We add a vrele2() call that takes the vnode on which we're calling
 vrele() and another vnode that we have locked. It would process just like
 vrele(), except instead of just locking the vnode, it tries to get the
 lock. If it can, it proceeds. If it can't, it releases the other vnode
 then blocks waiting for the first vnode's lock.
 
 Option 3 probably is the easiest solution. But I'm not sure what I think
 of it.
 
 Actually, if we did this as a locking primitive (vlock_with_held()), we
 could use it for ".." lookup too. It's the same issue as lookup on ".."...
 :-)
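
As a rough illustration of the try-then-back-off step, here is a user-space
sketch using pthread mutexes in place of vnode locks. The names
sketch_vnode and sketch_lock_with_held() are made up for this example, and
the return convention (telling the caller whether the held lock was dropped)
is an assumption, not something decided above.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for a vnode: only the lock matters here. */
struct sketch_vnode {
	pthread_mutex_t lock;
};

/*
 * Lock `want` while the caller already holds `held`.
 *
 * Fast path: try the lock without sleeping. If that works, both locks
 * are held on return and we report true.
 *
 * Slow path: drop `held` first, then sleep on `want`, so we never block
 * on `want` while holding `held`. On return only `want` is held and we
 * report false.
 */
static bool
sketch_lock_with_held(struct sketch_vnode *want, struct sketch_vnode *held)
{
	assert(want != held);

	if (pthread_mutex_trylock(&want->lock) == 0)
		return true;               /* got it without sleeping */

	pthread_mutex_unlock(&held->lock); /* back off */
	pthread_mutex_lock(&want->lock);   /* now safe to sleep */
	return false;                      /* `held` was released */
}
```

When this returns false, the caller no longer holds the second vnode and would
have to decide whether to re-lock and re-check it; that part is not covered by
the sketch.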
 
 Take care,
 
 Bill
 