Subject: Re: vnode locking in coda_lookup
To: None <tech-kern@netbsd.org>
From: Greg Troxel <gdt@ir.bbn.com>
List: tech-kern
Date: 04/05/2007 13:27:12
  > Thanks - I do follow that we need to have an order to avoid deadlocks,
  > and what our order is.  What's not 100% clear in my head is the exact
  > relationship between the interlock and the vnode lock, which has to be
  > held for each possible operation, and mostly what can happen after vref
  > but without holding the vnode lock.

  Interlock is fine. You don't take other locks while holding the interlock. 
  You at most trade the interlock on a vnode for the lock on said vnode. And 
  the loking code handles this correctly.

So is this right?

  interlock protects
    access to vnode lock
    reference counts

  vnode lock protects
    all other changes to vnode contents

So, when holding vnode locks, one can be assured of no changes, and do
multi-step operations which might temporarily violate invariants.

When holding a reference, one can be sure that the vnode will continue
to exist, but not that it won't be revoked.

  > Without LK_RETRY, and maybe even with, the caller has to check the
  > vn_lock return value and do something.  In vfs_lookup, if we fail to
  > lock the child, we can return an error.  But, since we have a vref at
  > this point, it seems this should never happen, so maybe it should be a
  > panic.

  It's unlikely at the moment. If you have a ref, the one thing that could 
  happen is a VOP_REVOKE(), but I don't think you can do that on a 
  directory.

  Oh, the one other thing that might happen some day is that we add code to 
  handle forcible file system unmount. In such a case, we would probably 
  want to revoke all the vnodes. So you could have a reference on something 
  then have it die on you.

OK, I will try printf of case (since odd enough to be interesting) and
error return and see if it happens.

Reading vfs_lookup, I see LK_RETRY all over the place, along with
failure to check vn_lock return values.  Not saying that argues for it
being correct in coda_lookup, but it seems there's a lot of LK_RETRY
trouble about.

  > In the case of relocking the parent, we also have a vref (or rather our
  > caller must), so can that ever reasonably fail?  Should I panic if
  > vn_lock (w/o LK_RETRY) fails?

  Don't panic. Return an error. This case (".." lookup) is why I added 
  PDIRUNLOCK. I am not sure if chuq removed it.

PDIRUNLOCK appears to have been removed in a commit
kere/vfs_lookup.c:1.73 by chs on 2006/12/09.

I don't understand how coda_lookup can handle not being able to relock
the parent (without panicing, which I realize isn't exactly "handle").

-- 
    Greg Troxel <gdt@ir.bbn.com>