Subject: Re: bug kern/5026
To: Frank van der Linden <frank@wins.uva.nl>
From: Greg Wohletz <greg@lonnie.egr.unlv.edu>
List: current-users
Date: 05/07/1998 12:17:20
OK, I've made some progress in tracking down what is going on with this
panic. Here is the code segment that triggers the panic (from nfs_serv.c,
near the end of the nfsrv_rename routine):
	vrele(tond.ni_startdir);
	FREE(tond.ni_cnd.cn_pnbuf, M_NAMEI);
out1:
	if (fdirp) {
		fdiraft_ret = VOP_GETATTR(fdirp, &fdiraft, cred, procp);
		vrele(fdirp);
	}
	if (tdirp) {
		tdiraft_ret = VOP_GETATTR(tdirp, &tdiraft, cred, procp);
		vrele(tdirp);
	}
	vrele(fromnd.ni_startdir);   <--------- this call triggers the panic
	FREE(fromnd.ni_cnd.cn_pnbuf, M_NAMEI);
	nfsm_reply(2 * NFSX_WCCDATA(v3));
Now I noted from the crash dumps that tond.ni_startdir was always equal
to fromnd.ni_startdir when the crash occurred, so I placed the following
debugging code right in front of the 1st vrele call:
	if (tond.ni_startdir == fromnd.ni_startdir) {
		Error_refcnt2 = fromnd.ni_startdir->v_usecount;
	} else {
		Error_refcnt2 = -999;
	}
Then when the system panicked the next time I inspected the value of
Error_refcnt2, and sure enough it was 1. Clearly if the kernel gets
to the 1st vrele with those two pointers equal and the ref count
set to 1, a panic is inevitable, since vrele is about to be called
twice on that vnode.
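To make the arithmetic concrete, here is a small user-space sketch of
the situation. It is not the kernel's vrele; mock_vnode, mock_vrele and
release_both are made-up names, and the only thing modelled is a
reference count that must not be released more often than it was taken.

```c
/* Hypothetical stand-in for a vnode: only the use count matters here. */
struct mock_vnode {
	int v_usecount;
};

/*
 * Hypothetical model of a release. Drops one reference and returns 0,
 * or returns -1 if the count is already zero, which is the point where
 * a real kernel would be in trouble.
 */
static int
mock_vrele(struct mock_vnode *vp)
{
	if (vp->v_usecount <= 0)
		return -1;	/* released more times than referenced */
	vp->v_usecount--;
	return 0;
}

/*
 * Mimics the tail of the rename path: one release through the "to"
 * pointer, then one through the "from" pointer. Returns the number of
 * releases that found the count already at zero.
 */
static int
release_both(struct mock_vnode *to_startdir, struct mock_vnode *from_startdir)
{
	int bad = 0;

	if (mock_vrele(to_startdir) != 0)
		bad++;
	if (mock_vrele(from_startdir) != 0)
		bad++;
	return bad;
}
```

With two distinct vnodes, each holding one reference, release_both
reports no bad release. With the same vnode passed for both arguments
and a count of 1, the second release finds the count already at zero,
which corresponds to the state seen in the crash dumps.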
Now the question is how the kernel gets into this state. Hopefully
what I have discovered will help someone find the cause; meanwhile
I will continue to stumble forward as best I can with my limited
understanding of the inner workings of the vnode code.
For anyone who is interested, I have placed the latest crash dump in
http://www.cs.unlv.edu/~greg/netbsd/
That directory contains several crash dumps and kernels. Dump #9 is the
one that was generated after I inserted the debugging code. nfs_serv.c is
a copy of that code with my debugs in it, so that gdb line numbers will
make sense to anyone who wants to look at them.
--Greg