Subject: Re: kern/32535: processes stuck on vnlock
To: None <kern-bug-people@netbsd.org, gnats-admin@netbsd.org,>
From: Manuel Bouyer <bouyer@antioche.eu.org>
List: netbsd-bugs
Date: 10/07/2006 10:15:04
The following reply was made to PR kern/32535; it has been noted by GNATS.

From: Manuel Bouyer <bouyer@antioche.eu.org>
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@NetBSD.org, gnats-admin@NetBSD.org,
	netbsd-bugs@NetBSD.org
Subject: Re: kern/32535: processes stuck on vnlock
Date: Sat, 7 Oct 2006 12:10:45 +0200

 On Tue, Sep 26, 2006 at 01:35:02AM +0000, Bill Studenmund wrote:
 >  Ok, here is some more analysis on the problem.
 >  
 >  The main issue is that vrele() can lock the vnode before calling
 >  VOP_INACTIVE(). Since we are calling vrele() on the parent and we have
 >  the child locked, this locking violates the vnode locking hierarchy; you
 >  can't lock a vnode's parent while holding the vnode's lock.
 >  
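 For illustration only, the two lock orders that can meet look roughly like
 this (dvp/vp are just example names for the parent and child, and the calls
 are a sketch of the era's interfaces, not actual lookup()/vrele() code):
 
     /*
      * Purely illustrative: the two lock orders that can collide.
      * dvp = parent directory vnode, vp = child vnode.
      */
     static void
     thread_a_lookup(struct vnode *dvp, struct vnode *vp)
     {
             vn_lock(dvp, LK_EXCLUSIVE | LK_RETRY);  /* parent first */
             vn_lock(vp, LK_EXCLUSIVE | LK_RETRY);   /* ...then child */
     }
 
     static void
     thread_b_vrele_path(struct vnode *dvp, struct vnode *vp)
     {
             /* vp (the child) is already locked by this thread. */
             vrele(dvp);     /* last ref: may lock dvp for VOP_INACTIVE() */
     }
 
 If A holds dvp and sleeps waiting for vp while B holds vp and sleeps inside
 vrele() waiting for dvp, both processes end up stuck on vnlock, which is the
 symptom in this PR.
 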
 >  I see three ways to fix this:
 >  
 >  1) Do something along the lines of what I think Chuck was talking about,
 >  and create a work queue to handle destroying vnodes. The trick, though,
 >  is that VOP_INACTIVE() isn't necessarily about destroying a vnode, it's
 >  telling the file system that the vnode is going on the free list. The
 >  main user of this information (AFAIK) is NFS, which will zap a node if
 >  the silly-rename code has been triggered.
 >  
 >  We _could_ have a special worker thread to handle the VOP_INACTIVE
 >  calling; however, this will happen every time a vnode gets put on the free
 >  list! Even if we added a flag so that we only did this processing if
 >  requested (i.e. some vnodes skipped calling VOP_INACTIVE()), we still have
 >  this weird case where we have the free list we have now, and we have a
 >  "freeing" list.
 >  
 >  My main concern is a case where a file system uses VOP_INACTIVE as an
 >  indication that it can release resources. I expect such a file system
 >  will want VOP_INACTIVE calls, and this change will result in a
 >  performance hit.
 >  
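 A very rough sketch of what such a "freeing" list and worker might look
 like, just to make the trade-off concrete (names are invented, the queue
 needs interlock protection that is omitted here, and the second argument of
 VOP_INACTIVE() depends on the branch):
 
     /* Hypothetical deferred-inactive queue: vrele() would enqueue the
      * vnode here and wakeup() the thread instead of locking the vnode
      * and calling VOP_INACTIVE() itself. */
     static TAILQ_HEAD(, vnode) inactive_q = TAILQ_HEAD_INITIALIZER(inactive_q);
 
     static void
     vinactive_thread(void *arg)
     {
             struct vnode *vp;
 
             for (;;) {
                     while ((vp = TAILQ_FIRST(&inactive_q)) != NULL) {
                             /* v_freelist entry reused only for this sketch */
                             TAILQ_REMOVE(&inactive_q, vp, v_freelist);
                             /* No other vnode lock is held here, so this
                              * acquisition cannot invert the hierarchy. */
                             vn_lock(vp, LK_EXCLUSIVE | LK_RETRY);
                             /* or curproc, per the branch; unlocks vp */
                             VOP_INACTIVE(vp, curlwp);
                     }
                     tsleep(&inactive_q, PVFS, "vinact", 0);
             }
     }
 
 As noted above, every vnode headed for the free list would take this detour,
 and a file system that relies on VOP_INACTIVE() to release resources would
 see the call happen later and from a different context.
 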
 >  2) We could re-work lookup() so that we don't release the directory lock
 >  if we're locking the child and we instead call vput(). I think this is
 >  the best option as it gets rid of the real problem. I'll look into it,
 >  but if someone else wants to look into this, please do! I'm not sure how
 >  quickly I can look at it.
 >  
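 Roughly, the idea (sketched, not actual lookup() code) is to go from:
 
     VOP_UNLOCK(dvp, 0);     /* directory lock dropped early */
     /* ... child (vp) locked, work done ... */
     vrele(dvp);             /* may have to re-lock dvp -> inversion */
 
 to something like:
 
     /* ... child (vp) locked, work done, dvp still locked ... */
     vput(dvp);              /* unlock and release in one step */
 
 Since the parent's lock was taken before the child's and is never
 re-acquired with the child held, the hierarchy is respected and
 VOP_INACTIVE() runs under a lock we already own.
 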
 >  3) We add a vrele2() call that takes the vnode on which we're calling
 >  vrele() and another vnode that we have locked. It would process just
 >  like vrele(), except instead of just locking the vnode, it tries to get
 >  the lock. If it can, it proceeds. If it can't, it releases the other
 >  vnode then blocks waiting for the first vnode's lock.
 >  
 >  Option 3 probably is the easiest solution. But I'm not sure what I think
 >  of it.
 >  
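 One possible shape for it, purely as an untested sketch (the reference
 counting and interlock handling of the real vrele() are elided, and the
 VOP_INACTIVE() argument depends on the branch):
 
     void
     vrele2(struct vnode *vp, struct vnode *lvp)
     {
             /* ... same reference counting as vrele(); return early
              *     unless this drops the last reference ... */
 
             if (vn_lock(vp, LK_EXCLUSIVE | LK_NOWAIT) != 0) {
                     /* Contended: back off.  Drop the lock we hold on lvp
                      * so the other side (e.g. a lookup holding vp and
                      * waiting for lvp) can finish, then block for vp's
                      * lock just as vrele() would. */
                     VOP_UNLOCK(lvp, 0);
                     vn_lock(vp, LK_EXCLUSIVE | LK_RETRY);
             }
             VOP_INACTIVE(vp, curlwp);  /* or curproc, per branch; unlocks vp */
     }
 
 The awkward part is what the caller is told about lvp: after the back-off it
 is unlocked, so the caller has to relock and possibly revalidate it.
 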
 >  Actually, if we did this as a locking primitive (vlock_with_held()), we
 >  could use it for ".." lookup too. It's the same issue as lookup on
 >  "..". :-)
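 
 That primitive might look something like this (hypothetical name and
 signature, simply factoring out the back-off step from the vrele2() sketch
 above; the return value tells the caller whether lvp had to be dropped):
 
     int
     vlock_with_held(struct vnode *vp, struct vnode *lvp, int flags)
     {
             if (vn_lock(vp, flags | LK_NOWAIT) == 0)
                     return 0;       /* got vp, lvp is still locked */
             VOP_UNLOCK(lvp, 0);     /* back off to respect the hierarchy */
             vn_lock(vp, flags | LK_RETRY);
             return EAGAIN;          /* caller must relock/revalidate lvp */
     }
 
 Both the vrele2() sketch and the ".." case in lookup() could then share this
 one back-off step.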
 
 Note that if someone writes a fix for this I'll be happy to try it.
 I have a box which has to be rebooted -n -q periodically because of
 processes stuck on vnlock. One day I'll change the disk system to get
 rid of the null mounts, which will fix the issue :)
 
 -- 
 Manuel Bouyer <bouyer@antioche.eu.org>
      NetBSD: 26 years of experience will always make the difference