Subject: Re: vnode refcount panic, perhaps due to kern/vfs_lookup.c:lookup()
To: Jaromir Dolecek <jdolecek@netbsd.org>
From: Jaromir Dolecek <jdolecek@netbsd.org>
List: tech-kern
Date: 03/16/2003 21:52:45
Jaromir Dolecek wrote:
> The locking rules for the symlink vnode op changed some time ago (rev. 1.26
> of coda/coda_vnops.c); perhaps that change triggered some
> problem in coda?
> I'd probably also check that the lookup() call in coda_symlink()
> succeeds, and that nd.ni_vp is indeed NULL in that case, since

Err, 'is NULL if lookup() fails' was what I meant.
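
To make the suggested check concrete, here is a minimal userland sketch of
validating what lookup() hands back.  The struct definitions are simplified
stand-ins for the kernel ones, and check_lookup_result() and its choice of
errno values are hypothetical, not taken from coda_vnops.c.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Simplified userland stand-ins for the kernel types (assumption). */
enum vtype { VNON, VREG, VDIR, VLNK, VBAD };

struct vnode {
	enum vtype v_type;
};

struct nameidata {
	struct vnode *ni_vp;	/* result vnode, if any */
};

/*
 * Hypothetical helper: given the return value of lookup() and the
 * nameidata it filled in, decide whether the caller may go on.
 * The errno values chosen here are illustrative only.
 */
static int
check_lookup_result(int error, const struct nameidata *ndp)
{
	assert(ndp != NULL);

	if (error != 0) {
		/* Assumption being checked: a failed lookup leaves ni_vp NULL. */
		return (ndp->ni_vp == NULL) ? error : EINVAL;
	}
	if (ndp->ni_vp == NULL)
		return ENOENT;	/* success reported, but no vnode returned */
	if (ndp->ni_vp->v_type == VBAD)
		return EIO;	/* vnode came back already marked bad */
	return 0;
}
```

A caller would run this right after lookup() and bail out on any non-zero
result before touching nd.ni_vp.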

Jaromir

> that appears to be what the code assumes.
> 
> Jaromir
> 
> Greg Troxel wrote:
> > I found that the double-vput problem in vfs_lookup was due to a vnode
> > with type VBAD.  This is passed to vfs_lookup from coda_symlink.
> > Most of the time, the coda_call to symlink in coda_symlink works, but
> > occasionally the call returns without error and the vnode is marked
> > VBAD.
> > 
> > I checked for VBAD, and returned -1, but promptly got a panic in
> > nfs_symlink, I think because an mbuf that had been free()'d was
> > trashed, or because of a plain bad pointer.
> > 
> > So, I'm guessing that the coda kernel code occasionally messes up, or
> > there is some locking problem where the vnode gets modified/marked bad
> > by something else.  This is all on a 192 MB i386 running
> > cfsd/rpcbind/mountd, venus, bash, emacs, sshd/ntpd/etc.  and 3 more
> > gettys.  There is basically nothing else going on, and the machine was
> > freshly booted.
> > 
> > I am just beginning to grasp the locking rules, and I'd appreciate
> > being set straight if I am confused (and thanks to those who already
> > responded):
> > 
> >   the interlock in the vnode protects the vnode ref counts and a few
> >   other fields in the struct vnode.  It is held for short periods only
> >   and is not about locking the vnode itself.
> > 
> >   Having a reference, expressed via the ref count field, protects you
> >   against the vnode going away or turning into something completely
> >   different.  But it does not guarantee anything about operations on
> >   the vnode; to serialize those, the vn_lock is used.
> > 
> >   struct lock v_lock in the vnode protects the vnode in the larger
> >   context in terms of fs operations.
> > 
> >   When the comments say 'the locked vnode', they always mean the
> >   struct lock in the vnode (or rather v->v_vnlock, which in the coda
> >   case always points to v->v_lock since there is no stackable fs stuff
> >   going on).
> > 
> >   The interlock gets little mention in locking discussions, other
> >   than in vnode(9), because it is considered too obvious.
> > 
> >   vput, for example, expects that the interlock is not held.  It
> >   unlocks *v->v_vnlock, and then decrements usecount.  To do the
> >   latter, it has to acquire the interlock, but that's not mentioned.
> > 
> >   One should in general not hold the interlock when calling VOP_LOCK
> >   and VOP_UNLOCK or other vnops.  But some operations take the
> >   LK_INTERLOCK flag to indicate that the interlock is already held.
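
The rules listed above can be modelled in a few lines of userland C.  This
is only a toy, single-threaded sketch: struct toy_vnode and toy_vput() are
hypothetical names, the booleans stand in for real locks, and it is not the
kernel's vput().  It does show why a second vput on the same vnode trips
over the lock state.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the fields discussed above (all names are illustrative). */
struct toy_vnode {
	bool interlock_held;	/* stands in for the vnode interlock */
	bool vnlock_held;	/* stands in for *v_vnlock */
	int  usecount;		/* stands in for the reference count */
};

/*
 * Toy vput: the caller must hold the vnode lock and a reference, and
 * must NOT hold the interlock.  Returns 0 on success, -1 if one of
 * those preconditions is violated (the real kernel would panic or
 * corrupt state instead).
 */
static int
toy_vput(struct toy_vnode *vp)
{
	if (vp->interlock_held || !vp->vnlock_held || vp->usecount <= 0)
		return -1;

	vp->vnlock_held = false;	/* first drop the vnode lock */

	vp->interlock_held = true;	/* the interlock guards usecount */
	vp->usecount--;
	assert(vp->usecount >= 0);
	vp->interlock_held = false;

	return 0;
}
```

Calling toy_vput() twice on a vnode with one reference succeeds the first
time and fails the second, since neither the lock nor the reference is
still held.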
> > 
> > So, is it reasonable for an unlocked vnode to change to VBAD?
> > 
> > Does holding the vn_lock mean that vgone should not be called?
> > 
> > Is there any place else I should suspect that is changing the type to
> > VBAD?
> > 
> >         Greg Troxel <gdt@ir.bbn.com>
> > 
> 
> 
> -- 
> Jaromir Dolecek <jdolecek@NetBSD.org>            http://www.NetBSD.org/
> -=- We should be mindful of the potential goal, but as the tantric    -=-
> -=- Buddhist masters say, ``You may notice during meditation that you -=-
> -=- sometimes levitate or glow.   Do not let this distract you.''     -=-
> 

