tech-kern: kern/8491

Subject: kern/8491
To: None <gnats-bugs@netbsd.org, tech-kern@netbsd.org,>
From: Ethan Solomita <ethan@geocast.com>
List: tech-kern
Date: 11/12/1999 18:22:23
	We've been seeing a similar problem to kern/8491 here, although
admittedly the usecount < 0 could stem from multiple bugs. Using the
arpwatch suggestion, I was also able to reliably crash NetBSD, and I did
some probing. I've got a lot of information, but I think it'll need
someone more familiar with the namei/lookup/relookup set of interfaces
to decide on the exact cause.

	When two arpwatch processes are killed, they both simultaneously try to
rename the same file to the same new filename. The final problem seems
to be that one thread, in stage 3 of ufs_rename, calls relookup(), which
doesn't find the original file anymore, and returns an error.
Unfortunately, the vput(dp); at the end of relookup() frees a usecount
reference that isn't allocated.

	But back to basics, here's some trace information on what the two
threads were actually doing, in terms of modifications to usecount. I
added a trap before all pieces of code which modify usecount, and did a
trace each time. Here's the results, starting from when I send the kill
signal to both processes:

vref  1 ++ 3 open namei/vref
vref  1 ++ 4 open namei/vref
vput  1 -- 3 open/create makeinode/vput
vput  1 -- 2 open vput

The above is pretty normal. Below is where the rename calls start. I've
arbitrarily labeled one process "0" and the other "1". Next comes the
routine (eg. vget) which is modifying the usecount, then ++ (increment)
or -- (decrement), then the new usecount. Finally is the summary of the
backtrace, starting from the descendant of rename_files().

1: vref  ++ 3 rename namei/vref
1: vref  ++ 4 rename namei/lookup/vref
1: vref  ++ 5 rename namei/vref
1: vref  ++ 6 rename namei/lookup/vref
1: vrele -- 5 rename ufs_rename/vrele ufs_rename+0x454
0: vref  ++ 6 rename namei/vref
1: vput  -- 5 rename ufs_rename/vput ufs_rename+0xa35
0: vref  ++ 6 rename namei/lookup/vref
0: vref  ++ 7 rename namei/vref
0: vref  ++ 8 rename namei/lookup/vref
0: vput  -- 7 rename ufs_rename/vput ufs_rename+0x228
0: vrele -- 6 rename ufs_rename/vrele ufs_rename+0x23f
0: vref  ++ 7 rename ufs_rename/relookup/vref ufs_rename+0x28c
0: vput  -- 6 rename ufs_rename/ufs_remove/vput ufs_rename+0x2d8
0: vrele -- 5 rename vrele
0: vrele -- 4 rename vrele
1: vput  -- 3 rename ufs_rename/relookup/vput ufs_rename+0xa7e
1: vrele -- 2 rename vrele
1: vrele -- 1 rename vrele

There's another set of rename requests below, but the damage is already
done above, and the below renames seems to retain usecount balance.
Finally there's the cwdfree failure at the end:

vref  1 ++ 2 rename
vref  1 ++ 3 rename
vref  1 ++ 4 rename
vref  1 ++ 5 rename
vrele 1 -- 4 rename
vref  1 ++ 5 rename
vput  1 -- 4 rename
vref  1 ++ 5 rename
vput  1 -- 4 rename
vrele 1 -- 3 rename
vrele 1 -- 2 rename
vput  1 -- 1 rename
vrele 1 -- 0 exit/cwdfree
vrele 1 -- -1 exit/cwdfree

In both cases, the processes have chdir'd to its private subdirectory
and appear to be renaming files in that subdirectory, ie. lookup isn't
traversing any subdirectories. The vnode being traced above, and causing
the panic, is this directory.

In my trace summaries I break out ufs_rename in detail. The rest looked
straightforward. Since the vnode being referenced is both the parentdir
and the dvp (direct parent) to the file being renamed, it gets four
references initially by each thread, since both the from and to files in
the rename are in the same subdirectory.

It seems that, normally, ufs_rename frees two of the references, and
rename_files() directly frees the other two. When thread 0 goes through
ufs_rename() in my traces, when it calls relookup at stage three, it
called VOP_LOOKUP(), got an error, and has called VREF() from within
that if-statement. That extra reference is freed by ufs_remove
immediately after:

0: vref  ++ 7 rename ufs_rename/relookup/vref ufs_rename+0x28c
0: vput  -- 6 rename ufs_rename/ufs_remove/vput ufs_rename+0x2d8

However thread 1 ends up at the bad: label at the end of relookup()
(called from stage 3 of ufs_rename). Here it seems that the bug is. It
calls vput(dp); (right after the bad: label in relookup), which drops
usecount on the vnode. This is the extra decrement on the use-count,
since ufs_rename has already dropped its reference before calling
relookup.

So here's where I'm confused: ufs_rename, fairly early in the code,
calls vrele(fdvp), thus dropping its reference to the
from-vnode's-parent's usecount. Then at "3) Unlink the source." it calls
relookup() to regain access to the from-vnode. And relookup() proceeds
to call vput() on the from-vnode's-parent in the bad: case, even though
the parent doesn't have a reference anymore.

So does relookup() expect you to have a hold on the vnode-parent's
usecount? If so, then ufs_rename() has the bug. If not, then relookup()
is bad. I'm looking at relookup(), and I honestly can't figure out what
it expects. If VOP_LOOKUP returns an error, then it doesn't change the
usecount (ignoring SAVESTART). If VOP_LOOKUP returns success, and we
don't return an error, then we decrement the usecount unless
"wantparent" is specified. And in the case where we return an error at
bad: we always decrement the usecount, no matter what.

OK, this is certainly long enough now! 8)  I certainly hope this is
enough information to get this bug fixed quickly by someone who is
familiar with the relevant code. If you've got any questions about my
ramblings here, just ask.

	-- Ethan