Subject: Re: vnode refcount panic, perhaps due to kern/vfs_lookup.c:lookup()
To: Greg Troxel <gdt@ir.bbn.com>
From: Jaromir Dolecek <jdolecek@netbsd.org>
List: tech-kern
Date: 03/16/2003 21:50:45
The locking rules for the symlink vnode op changed some time ago (rev. 1.26
of coda/coda_vnops.c); perhaps that change triggered some
problem in coda?
I'd probably also check that the lookup() call in coda_symlink()
succeeds, and that nd.ni_vp is indeed NULL in that case, since
that appears to be what the code assumes.
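
A minimal user-space sketch of the kind of check meant here. It is not the
coda code: the struct definitions are cut-down stand-ins for the kernel
types, check_symlink_lookup() is a hypothetical helper, and returning
EEXIST for an unexpected vnode is only an assumption made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Cut-down stand-ins for the kernel types; only the fields used here. */
enum vtype { VNON, VREG, VDIR, VLNK, VBAD };
struct vnode { enum vtype v_type; };
struct nameidata { struct vnode *ni_vp; };

/*
 * Hypothetical helper: validate the outcome of a lookup() call.
 * 'error' is the value lookup() returned, 'nd' the nameidata it filled in.
 * Returns 0 only when lookup() succeeded and left nd->ni_vp NULL, which is
 * the state the caller is described as assuming.
 */
int
check_symlink_lookup(int error, const struct nameidata *nd)
{
	assert(nd != NULL);

	if (error != 0)
		return error;		/* lookup() itself failed */
	if (nd->ni_vp != NULL)
		return EEXIST;		/* assumption violated (errno is illustrative) */
	return 0;
}
```

In the real function such a check would sit right after the lookup() call,
before anything touches nd.ni_vp.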

Jaromir

Greg Troxel wrote:
> I found that the double-vput problem in vfs_lookup was due to a vnode
> with type VBAD.  This is passed to vfs_lookup from coda_symlink.
> Most of the time, the coda_call to symlink in coda_symlink works, and
> occasionally the call returns without error but the vnode is marked
> VBAD.
> 
> I checked for VBAD, and returned -1, but promptly got a panic in
> nfs_symlink, I think because an mbuf that had been free()'d was
> trashed, or the pointer was simply bad.
> 
> So, I'm guessing that the coda kernel code occasionally messes up, or
> there is some locking problem where the vnode gets modified/marked bad
> by something else.  This is all on a 192 MB i386 running
> cfsd/rpcbind/mountd, venus, bash, emacs, sshd/ntpd/etc.  and 3 more
> gettys.  There is basically nothing else going on, and the machine was
> freshly booted.
> 
> I am just beginning to grasp the locking rules, and I'd appreciate
> being set straight if I am confused (and thanks to those who already
> responded):
> 
>   the interlock in the vnode protects the vnode ref counts and a few
>   other fields in the struct vnode.  It is held for short periods only
>   and is not about locking the vnode itself.
> 
>   Having a reference, expressed via the ref count field, protects you
>   against the vnode going away or turning into something completely
>   different.  But it does not guarantee anything about operations on
>   the vnode; to serialize those, the vn_lock is used.
> 
>   struct lock v_lock in the vnode protects the vnode in the larger
>   context in terms of fs operations.
> 
>   When the comments say 'the locked vnode', they always mean the
>   struct lock in the vnode (or rather v->v_vnlock, which in the coda
>   case always points to v->v_lock since there is no stackable fs stuff
>   going on).
> 
>   Little mention is made of the interlock in terms of locking
>   discussions, other than in vnode(9), because that's too obvious.
> 
>   vput, for example, expects that the interlock is not held.  It
>   unlocks *v->v_vnlock, and then decrements usecount.  To do the
>   latter, it has to acquire the interlock, but that's not mentioned.
> 
>   One should in general not hold the interlock when calling VOP_LOCK
>   and VOP_UNLOCK or other vnops.  But some operations take the
>   LK_INTERLOCK flag to indicate that the interlock is already held.
> 
> So, is it reasonable for an unlocked vnode to change to VBAD?
> 
> Does holding the vn_lock mean that vgone should not be called?
> 
> Is there any place else I should suspect that is changing the type to
> VBAD?
> 
>         Greg Troxel <gdt@ir.bbn.com>
> 
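
The vput ordering described in the quoted text can be modelled in a few
lines of user-space C. This is only a toy under stated assumptions: struct
toy_vnode and toy_vput() are hypothetical names, the two locks are plain
flags rather than real kernel locks, and nothing here is taken from the
actual vput() source.

```c
#include <assert.h>

/* Toy vnode: both locks are modelled as simple held/not-held flags. */
struct toy_vnode {
	int interlock_held;	/* short-term lock guarding usecount */
	int vnlock_held;	/* long-term lock serializing fs operations */
	int usecount;		/* number of references */
};

/*
 * Toy model of the described vput behaviour:
 *   1. the caller must NOT hold the interlock,
 *   2. the caller must hold the vnode lock and a reference,
 *   3. the vnode lock is released first,
 *   4. the interlock is taken only around the usecount decrement.
 */
void
toy_vput(struct toy_vnode *vp)
{
	assert(!vp->interlock_held);
	assert(vp->vnlock_held);
	assert(vp->usecount > 0);

	vp->vnlock_held = 0;		/* unlock the vnode lock */

	vp->interlock_held = 1;		/* take the interlock ... */
	vp->usecount--;			/* ... drop one reference ... */
	vp->interlock_held = 0;		/* ... and release it again */
}
```

In this model a second toy_vput() on the same vnode trips the vnlock_held
assertion, which is the toy analogue of a double-vput being caught.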


-- 
Jaromir Dolecek <jdolecek@NetBSD.org>            http://www.NetBSD.org/
-=- We should be mindful of the potential goal, but as the tantric    -=-
-=- Buddhist masters say, ``You may notice during meditation that you -=-
-=- sometimes levitate or glow.   Do not let this distract you.''     -=-