Subject: kern/35143: vnode locking deadlock with layered file systems & vclean().
To: kern-bug-people@netbsd.org, gnats-admin@netbsd.org
From: wrstuden@NetBSD.org
List: netbsd-bugs
Date: 11/27/2006 18:50:00
>Number:         35143
>Category:       kern
>Synopsis:       vnode locking deadlock with layered file systems & vclean().
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Mon Nov 27 18:50:00 +0000 2006
>Originator:     Bill Studenmund
>Release:        NetBSD 4.99.4
>Organization:
	
>Environment:
	
	
>Description:
Here's a description from Darrin (dbj@).

Andrew Doran and I did some debugger grovelling on nbftp while it was
getting processes stuck in vnlock on Saturday.  It appears to be a
layered filesystem lock issue possibly stemming from the recursive
vget shenanigans in layer_node_find.  This email contains some of the
specifics we gathered from the system:

Most processes were in wait state "vnlock" on address ffff8000522b98c8,
which is the address of a lockmanager lock, presumably pointed to from a vnode.

It is believed that this lock was held by an rsync process (pid 8322),
which was stuck waiting in vget() with the following backtrace:

    $ ps -ax -o pid,paddr,laddr,wchan,nwchan,comm |grep ffff80004b54fc58
    8322 ffff80005712ec90 ffff80004b54fc58 vget     ffff80004d096e10 rsyncd

    #0  0xffffffff80259aee in mi_switch ()
    #1  0xffffffff8025a3ed in ltsleep ()
    #2  0xffffffff802a1a4c in vget ()
    #3  0xffffffff802b3196 in layer_node_find ()
    #4  0xffffffff802b320d in layer_node_create ()
    #5  0xffffffff802b4482 in layer_lookup ()
    #6  0xffffffff802ae2fa in VOP_LOOKUP ()
    #7  0xffffffff8029dc06 in lookup ()
    #8  0xffffffff8029e232 in namei ()
    #9  0xffffffff802ac243 in vn_open ()
    #10 0xffffffff802a8e40 in sys_open ()
    #11 0xffffffff802e4e45 in syscall_plain ()

Process 8322 was stuck in vget() on the vnode at address ffff80004d096e10
(the nwchan column above), so it is presumably the process that set the
VXWANT flag.

    db> show vnode ffff80004d096e10
    OBJECT 0xffff80004d096e10: locked=0, pgops=0xffffffff804e9b00, npages=0, refs=0
    VNODE flags 2300<XLOCK,XWANT,LAYER>
    mp 0xffff80001cfeb000 numoutput 0 size 0x0
    data 0xffff80001eac0540 usecount 0 writecount 0 holdcnt 0 numoutput 0
    tag VT_NULL(9) type VDIR(2) mount 0xffff80001cfeb000 typedata 0x0
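The ddb flags field is a hex bitmask followed by the names of the bits
that are set. The following C sketch decodes it. The bit values are
assumptions inferred from the line above (0x2300 = 0x2000 | 0x0200 |
0x0100, with names printed in ascending bit order), not copied from
<sys/vnode.h>, and vflags_format() is a hypothetical helper.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Assumed bit values, inferred from the ddb line
 * "flags 2300<XLOCK,XWANT,LAYER>".
 */
#define VXLOCK 0x0100u /* vnode is being cleaned (vclean in progress) */
#define VXWANT 0x0200u /* someone sleeps waiting for the clean to end */
#define VLAYER 0x2000u /* vnode belongs to a layered file system */

struct flagname {
	unsigned int bit;
	const char *name;
};

/* Listed in ascending bit order, matching how ddb prints them. */
static const struct flagname vflag_names[] = {
	{ VXLOCK, "XLOCK" },
	{ VXWANT, "XWANT" },
	{ VLAYER, "LAYER" },
};

/* Render flags in ddb style, e.g. "2300<XLOCK,XWANT,LAYER>", into buf. */
static void
vflags_format(unsigned int flags, char *buf, size_t len)
{
	size_t i, off;
	int first = 1;

	assert(buf != NULL && len > 0);
	off = (size_t)snprintf(buf, len, "%x<", flags);
	for (i = 0; i < sizeof(vflag_names) / sizeof(vflag_names[0]); i++) {
		if ((flags & vflag_names[i].bit) == 0)
			continue;
		/* Append ",NAME" (no comma before the first name). */
		if (off < len)
			off += (size_t)snprintf(buf + off, len - off, "%s%s",
			    first ? "" : ",", vflag_names[i].name);
		first = 0;
	}
	if (off < len)
		snprintf(buf + off, len - off, ">");
}
```

Formatting 0x2300 reproduces the string ddb printed, which is how one
reads off that both VXLOCK and VXWANT were set on this vnode.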

There was also a cvs process (pid 8386) that was cleaning this vnode and
waiting for it to drain, presumably after having set the VXLOCK flag.

    #1  0xffffffff8025a3ed in ltsleep ()
    #2  0xffffffff80242dd8 in acquire ()
    #3  0xffffffff80243f29 in _lockmgr ()
    #4  0xffffffff802ae8f8 in VOP_LOCK ()
    #5  0xffffffff802a2176 in vclean ()
    #6  0xffffffff802a2742 in vgonel ()
    #7  0xffffffff802a2bbc in getcleanvnode ()
    #8  0xffffffff802a2d09 in getnewvnode ()
    #9  0xffffffff802b32d1 in layer_node_alloc ()
    #10 0xffffffff802b3250 in layer_node_create ()
    #11 0xffffffff802b4482 in layer_lookup ()
    #12 0xffffffff802ae2fa in VOP_LOOKUP ()
    #13 0xffffffff8029dc06 in lookup ()
    #14 0xffffffff8029e232 in namei ()
    #15 0xffffffff802a6afc in sys___stat30 ()
    #16 0xffffffff802e4e45 in syscall_plain ()

I would have liked to get slightly more data on the specific vnodes,
but several of the processes I wanted to backtrace were swapped out,
and the system panicked shortly after I disabled swap.

I was also surprised, looking at this data, that vfs_vnode_print
doesn't call VOP_PRINT on the node, and I didn't get to call
vprint from ddb before the system panicked.

In any case, this hopefully can point layered fs locking experts in
the right direction.

Thanks,
Darrin

>How-To-Repeat:
Run a layered file system on a busy server and wait.
>Fix:
Change how vget() works for layer_node_find(). We need to change both
layer_node_find() and lockmgr().

The problem is that our recursive locking isn't really what we want
to do.

What we really want is to perform all of the vget() processing on
the upper vnode (remove it from the free list, etc.) but to skip
all of the lockmgr() processing (since the vnode stack locks as a
whole and we already have it locked). We can't do that with the
current lockmgr() call, and recursive locking was an attempt to
get around it.
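To make the distinction concrete, here is a user-space C model (not
kernel code) that splits vget() into bookkeeping and locking. The
struct model_vnode, model_vget(), and the VGET_SKIPLOCK flag are all
hypothetical names invented for this sketch. The point is only that
the free-list removal and reference count happen either way, while the
lock step can be skipped when the caller already holds the stack lock.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical flag (not a real NetBSD name): the caller already holds
 * the lock shared by the whole vnode stack, so skip the lock step.
 */
#define VGET_SKIPLOCK 0x1u

/* A toy vnode with only the fields this sketch needs. */
struct model_vnode {
	int usecount;     /* active references */
	bool on_freelist; /* idle, so getnewvnode() may recycle it */
	int lock_depth;   /* models the stack-wide lock; >1 means recursion */
};

/* Stands in for the lockmgr() work that vget() normally does. */
static void
model_lock(struct model_vnode *vp)
{
	vp->lock_depth++;
}

static void
model_vget(struct model_vnode *vp, unsigned int flags)
{
	/* Skipping the lock is only legal if the stack is already locked. */
	assert((flags & VGET_SKIPLOCK) == 0 || vp->lock_depth > 0);

	/* Bookkeeping that must always happen: claim the vnode. */
	if (vp->usecount == 0 && vp->on_freelist)
		vp->on_freelist = false;
	vp->usecount++;

	/* Only the lockmgr() step is optional. */
	if ((flags & VGET_SKIPLOCK) == 0)
		model_lock(vp);
}
```

Called with VGET_SKIPLOCK on a stack that is already locked, the lock
depth stays at 1. Called without it, the model ends up at depth 2,
which is the recursive acquisition the report wants to avoid.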

The issue seen here arises when something decides to vclean() a layer
vnode at the same time as an operation wants to pull that vnode off
the free list. The vclean() operation has flagged the vnode as being
cleaned (it has set VXLOCK) and is waiting for the LK_DRAIN to
complete. Meanwhile, an operation has completed on the lower layer and
is looking for the corresponding upper vnode (the operation has to be
a lookup to get into this case), and is trying to vget() it. vget()
has seen VXLOCK and is sleeping, waiting for the clean to finish. But
the lookup already holds the lock shared by the whole vnode stack, so
the LK_DRAIN can never complete. Thus, deadlock.
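The cycle can be checked mechanically with a wait-for graph. In this
hedged C sketch each process waits on at most one other, so a chain
that takes more hops than there are processes must loop. The roles
(LOOKUP for the process sleeping in vget(), CLEANER for the process in
vclean(), BYSTANDER for any process sleeping in vnlock on the stack
lock) and blocked_forever() are illustrative names, not kernel
identifiers. In the report these roles presumably correspond to rsyncd
(pid 8322), cvs (pid 8386), and the many processes in vnlock.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative roles from the scenario above. */
enum { LOOKUP, CLEANER, BYSTANDER, NPROC };

/*
 * waits_for[i] == j means process i sleeps until process j acts;
 * -1 means process i is runnable. Returns true if 'start' can never
 * wake up because its chain of waits loops back on itself.
 */
static bool
blocked_forever(const int waits_for[], int n, int start)
{
	int p = start;

	assert(start >= 0 && start < n);
	for (int hops = 0; hops <= n; hops++) {
		if (waits_for[p] < 0)
			return false; /* chain ends at a runnable process */
		p = waits_for[p];
	}
	return true; /* more than n hops: some process repeats */
}
```

Before the fix, LOOKUP waits for CLEANER (vget() sleeps until the clean
finishes) and CLEANER waits for LOOKUP (LK_DRAIN needs the lock LOOKUP
holds), so every process is stuck. Removing LOOKUP's edge, which is
what calling vget() with LK_NOWAIT achieves, makes every chain end at a
runnable process.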

The main part of the fix is to come up with a way for vget() to perform
its manipulations without lockmgr() doing anything, and to call vget()
with LK_NOWAIT. Then we also have to adjust layer_node_find() to handle
this case and do the right thing in the face of reallocating an upper
vnode.
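One way layer_node_find() might use LK_NOWAIT is sketched below as a
user-space C model. It assumes vget() returns EBUSY without sleeping
when it sees VXLOCK and LK_NOWAIT is set, and it assumes the right
response is to skip the dying node so the caller allocates a fresh
upper vnode; the report leaves the exact reallocation handling open.
struct model_vnode, model_vget(), and model_node_find() are
hypothetical, and the LK_NOWAIT and VXLOCK values are placeholders.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define VXLOCK    0x0100u /* placeholder value for "being cleaned" */
#define LK_NOWAIT 0x0010u /* lockmgr flag name; value assumed here */

struct model_vnode {
	unsigned int flags;
	int usecount;
	const void *lowervp;      /* lower vnode this layer node stacks on */
	struct model_vnode *next; /* hash chain link */
};

static int
model_vget(struct model_vnode *vp, unsigned int flags)
{
	assert(vp != NULL);
	if (vp->flags & VXLOCK) {
		/* vclean() in progress: never sleep when LK_NOWAIT is given. */
		if (flags & LK_NOWAIT)
			return EBUSY;
		/* The real vget() would sleep here; the model just fails. */
		return ENOENT;
	}
	vp->usecount++;
	return 0;
}

/* Find a live upper vnode for lowervp without ever sleeping in vget(). */
static struct model_vnode *
model_node_find(struct model_vnode *chain, const void *lowervp)
{
	struct model_vnode *vp;

	for (vp = chain; vp != NULL; vp = vp->next) {
		if (vp->lowervp != lowervp)
			continue;
		if (model_vget(vp, LK_NOWAIT) == 0)
			return vp;
		/* Being cleaned: ignore it and keep scanning. */
	}
	return NULL; /* caller allocates a fresh upper vnode */
}
```

If the only match is being cleaned, the lookup gets NULL and goes on to
allocate a new upper vnode instead of sleeping while it holds the stack
lock.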

>Unformatted: