Subject: Re: locking bug in coda_lookup?
To: Greg Troxel <gdt@ir.bbn.com>
From: Jaromir Dolecek <jdolecek@NetBSD.org>
List: tech-kern
Date: 09/05/2004 12:00:51
Greg Troxel wrote:
> I got a panic in coda_lookup:
>
> unlocked parent but couldn't lock child
>
> while doing
>
> find . -type f -print0 | xargs -0 cat > /dev/null
>
> in coda. I'm not certain which of two very similarfragments the code
> was in (don't have netbsd.gdb any more), but it was very much like
> this:
>
> if (*ap->a_vpp) {
> if ((error = vn_lock(*ap->a_vpp, LK_EXCLUSIVE))) {
> printf("coda_lookup: ");
> panic("unlocked parent but couldn't lock child");
> }
> }
>
> It seems that this will fail if someone else has the vnode locked, and
> there will be no retry. From reading vn_lock, it seems that if one
> passes LK_NOWAIT, that on encountering a locked vnode, vn_lock returns
> immediately. If LK_NOWAIT is not set, and LK_RETRY is also not set,
> it seems that vn_lock will tsleep on the vnode's v_interlock, and then
> return ENOENT instead of retrying. The ufs code uses LK_RETRY in what
> I think is the analogous case.
>
> I don't understand why it makes sense to sleep and not retry the lock
> - if one isn't going to retry, what's the point of sleeping?
>
> So, I think that coda_lookup should pass LK_RETRY. But I either don't
> quite or just barely understand vnode locking, so I'd appreciate
> advice here.
Your analysis might be right. Does it work fine if you try
changing the source to use LK_EXCLUSIVE|LK_RETRY in
the vn_lock() call?
> Also, it seems that the coda lookup operation doesn't properly handle
> IS_DOTDOT where the locking rules are different. That could be
> related to my crash instead.
Right. If that is the case, the find will deadlock instead :)
Jaromir
--
Jaromir Dolecek <jdolecek@NetBSD.org> http://www.NetBSD.cz/
-=- We should be mindful of the potential goal, but as the Buddhist -=-
-=- masters say, ``You may notice during meditation that you -=-
-=- sometimes levitate or glow. Do not let this distract you.'' -=-