Subject: PR 7954, one last nullfs problem.
To: None <tech-kern@netbsd.org>
From: Bill Studenmund <wrstuden@nas.nasa.gov>
List: tech-kern
Date: 07/12/1999 12:00:05
Alan reported a nullfs bug which, on inspection, has been in the layered
code forever.

Short description: layered vnodes can legitimately be on the free list and
locked. If a process locks the underlying vnode, attempts to lock the
upper one will deadlock even though the upper vnode is not in use.

Long description: here's the scenario.

nullfs mount at   /usr    from /b/usr
ffs mount at      /b/usr

Now start two ls -lR's, one in /usr and one in /b/usr.

The ls -lR in /usr will generate lots of vnodes, two per accessed file.
Because each nullfs vnode holds a reference to the ffs vnode beneath it,
none of the unused ffs vnodes show up on the free list; only the nullfs
ones do. Consider one pair which happens to be a directory: "A" is in the
upper layer (nullfs), and "a" is in the lower layer (ffs). "A" will show
up on the free list and will eventually reach the head of it.
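The free-list behaviour above follows from reference counting: the nullfs
vnode keeps a reference on the ffs vnode beneath it, so releasing the upper
vnode frees only the upper one. A minimal userland model of that, assuming a
simplified vnode with just a use count and a free-list flag. The names
toy_vref, toy_vrele and toy_layer_create are hypothetical stand-ins named
after the kernel's vref/vrele, not the real implementations:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified vnode: only what matters for free-list placement. */
struct toy_vnode {
    int usecount;               /* active references */
    bool on_freelist;           /* true while usecount is zero */
    struct toy_vnode *lower;    /* vnode below, for a layered vnode */
};

/* Hypothetical stand-in for vref: taking a reference removes it from the free list. */
static void
toy_vref(struct toy_vnode *vp)
{
    vp->usecount++;
    vp->on_freelist = false;
}

/* Hypothetical stand-in for vrele: last reference gone, vnode becomes reusable. */
static void
toy_vrele(struct toy_vnode *vp)
{
    if (--vp->usecount == 0)
        vp->on_freelist = true;
}

/* Build a layered vnode on top of lower; the layer keeps a reference on it. */
static void
toy_layer_create(struct toy_vnode *upper, struct toy_vnode *lower)
{
    upper->usecount = 1;        /* reference for the caller (e.g. ls) */
    upper->on_freelist = false;
    upper->lower = lower;
    toy_vref(lower);            /* held for as long as upper exists */
}
```

Releasing "A" puts it on the free list while "a" keeps a use count of 1,
which is why only the nullfs vnode ever reaches the head of the list.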

The scenario for this PR is where the ls -lR on /b/usr looks up something
in directory "a" which hasn't been looked up yet, so it needs a new vnode.
"A" is now at the head of the free list, so getnewvnode wants to clean it:
it grabs "A" off the free list and feeds it to vclean, which wants to drain
the lock. As the lookup which called vclean already holds that lock (it
locked "a"), we obviously won't drain. :-) Boom!
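To make the deadlock concrete, here is a toy model. It assumes, as the
scenario implies, that the nullfs vnode "A" effectively shares the lock of
the ffs vnode "a", and that draining a lock means waiting until nobody holds
it. The names toy_lock, naive_pick and drain_would_hang are hypothetical,
for illustration only:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_lock {
    int holder;                 /* 0 if free, else id of the holding process */
};

struct toy_vnode {
    struct toy_lock *lk;        /* assumed: a layer vnode uses its lower vnode's lock */
    struct toy_vnode *next;     /* free-list link */
};

/* Current behaviour: take whatever is at the head of the free list. */
static struct toy_vnode *
naive_pick(struct toy_vnode *freelist)
{
    return freelist;
}

/*
 * vclean must drain the vnode lock, i.e. wait until nobody holds it.
 * If the process doing the drain is itself the holder, the wait never ends.
 */
static bool
drain_would_hang(const struct toy_vnode *vp, int self)
{
    return vp->lk->holder == self;
}
```

With the lookup process holding the lock of "a", the naive pick returns "A"
and the drain that vclean would start can never finish.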


My thought on how to deal with this is to have getnewvnode do a
VOP_ISLOCKED to see if the vnode is locked before choosing it, skipping
it if it is.

So that we don't hurt performance, I suggest we add a new vnode flag,
VLAYER, which indicates that the vnode belongs to a layered filesystem.
getnewvnode would then do the VOP_ISLOCKED test only if VLAYER is set.
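A rough userland sketch of the proposed selection rule. It assumes a singly
linked free list, a hypothetical toy_islocked standing in for VOP_ISLOCKED
(reporting the lower vnode's lock state for a layered vnode), and an
arbitrary value for the proposed VLAYER flag; none of this reflects the real
kernel definitions:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical value for the proposed flag; not the kernel's definition. */
#define VLAYER 0x0001

struct toy_vnode {
    int v_flag;                 /* VLAYER set for nullfs (layered) vnodes */
    bool locked;                /* lock state of a non-layered vnode */
    struct toy_vnode *lower;    /* vnode below, NULL if not layered */
    struct toy_vnode *next;     /* free-list link */
};

/*
 * Stand-in for VOP_ISLOCKED.  Assumes a layered vnode is effectively
 * locked whenever the bottom vnode of its stack is locked.
 */
static bool
toy_islocked(const struct toy_vnode *vp)
{
    while (vp->lower != NULL)
        vp = vp->lower;
    return vp->locked;
}

/*
 * Walk the free list and return the first vnode that is safe to hand to
 * vclean.  Only VLAYER vnodes pay for the lock check; ordinary vnodes are
 * taken exactly as before.
 */
static struct toy_vnode *
toy_getnewvnode_pick(struct toy_vnode *freelist)
{
    struct toy_vnode *vp;

    for (vp = freelist; vp != NULL; vp = vp->next) {
        if ((vp->v_flag & VLAYER) && toy_islocked(vp))
            continue;           /* locked layer vnode: skip it */
        return vp;
    }
    return NULL;                /* nothing reusable right now */
}
```

With "a" locked by the lookup, the pick skips "A" and takes the next free
vnode; once "a" is unlocked, "A" is eligible again. Non-layered vnodes never
reach the lock check, so the common ffs-only case costs a single bit test.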

Thoughts?

Take care,

Bill