Re: kern/50375: layerfs (nullfs) locking problem leading to livelock

To: hannken%NetBSD.org@localhost, gnats-admin%netbsd.org@localhost, netbsd-bugs%netbsd.org@localhost, riz%NetBSD.org@localhost
Subject: Re: kern/50375: layerfs (nullfs) locking problem leading to livelock
From: Konrad Schroder <perseant%hhhh.org@localhost>
Date: Fri, 30 Oct 2015 00:20:01 +0000 (UTC)

The following reply was made to PR kern/50375; it has been noted by GNATS.

From: Konrad Schroder <perseant%hhhh.org@localhost>
To: gnats-bugs%NetBSD.org@localhost, kern-bug-people%netbsd.org@localhost,
 gnats-admin%netbsd.org@localhost, netbsd-bugs%netbsd.org@localhost, riz%NetBSD.org@localhost
Cc: 
Subject: Re: kern/50375: layerfs (nullfs) locking problem leading to livelock
Date: Thu, 29 Oct 2015 17:19:08 -0700

 Forgive me for a possibly impertinent question, but what is the output 
 of "mount" on this system?
 
 A quick read of hannken's analysis makes me think that the basic problem 
 here is that the two imposed locking orders (parent directory before 
 subdirectory, and upper before lower) are in direct conflict when a 
 directory is null-mounted onto a subdirectory of itself.  In this case, 
 based on which vnodes are "..", it seems to me that we have something like
 
    /dev/foo on /e0 type ffs
    /e0/f8 on /e0/f8/20 type null
    /e0/f8 on /somewhere/50 type null
 
 where f8 has vnode 0x92314df8, 20 has vnode 0x9436ef20 and 50 has vnode 
 0x9246d850; and e0 has vnode 0x92b811e0. There might be more directory 
 layers between e0 and f8, and between f8 and 20.
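
 The ordering conflict described above can be illustrated with a toy
 lock-order graph. This is only a sketch under the guessed mount layout:
 the node names F8_FFS and DIR20_NULL and the helpers lock_before(),
 order_conflicts() and check_null_over_subdir() are hypothetical, not
 kernel interfaces, and the mount point and the null vnode mounted on it
 are collapsed into a single node for simplicity.

```c
#include <assert.h>

/* Hypothetical node names for the mount layout guessed above. */
enum { F8_FFS, DIR20_NULL, NNODES };

/* before[a][b] != 0 records the rule "lock a before b". */
static int before[NNODES][NNODES];

static void lock_before(int a, int b)
{
    assert(a >= 0 && a < NNODES && b >= 0 && b < NNODES);
    before[a][b] = 1;
}

/* Depth-first search; returns 1 if a cycle is reachable from node n. */
static int dfs(int n, int *state)
{
    state[n] = 1;                       /* on the current DFS path */
    for (int m = 0; m < NNODES; m++) {
        if (!before[n][m])
            continue;
        if (state[m] == 1)
            return 1;                   /* back edge: the rules conflict */
        if (state[m] == 0 && dfs(m, state))
            return 1;
    }
    state[n] = 2;                       /* fully explored, no cycle found */
    return 0;
}

/* Returns 1 if the recorded ordering rules contain a cycle. */
static int order_conflicts(void)
{
    int state[NNODES] = { 0 };

    for (int n = 0; n < NNODES; n++)
        if (state[n] == 0 && dfs(n, state))
            return 1;
    return 0;
}

/* Record both rules for "/e0/f8 on /e0/f8/20 type null" and check them. */
static int check_null_over_subdir(void)
{
    lock_before(F8_FFS, DIR20_NULL);    /* parent before subdirectory */
    lock_before(DIR20_NULL, F8_FFS);    /* upper (null) before lower (ffs) */
    return order_conflicts();
}
```

 check_null_over_subdir() returns 1 here: each rule is fine on its own,
 but together they form a cycle between the two vnodes.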
 
 If that does match the structure of the mount points, there could be a
 very similar deadlock involving only one null mount: someone holds
 /e0/f8 and tries to lock /e0/f8/20 as a subdirectory; someone else holds
 /e0/f8/20 and tries to lock /e0/f8/20/f8 as a subdirectory. But that
 vnode is layered over /e0/f8, so each waits on the other: deadlock.
 
 Thanks,
 
          Konrad Schroder
          perseant%hhhh.org@localhost
 
 On 10/29/15 7:45 AM, J. Hannken-Illjes wrote:
 > The following reply was made to PR kern/50375; it has been noted by GNATS.
 >
 > From: "J. Hannken-Illjes" <hannken%eis.cs.tu-bs.de@localhost>
 > To: gnats-bugs%NetBSD.org@localhost
 > Cc: Jeff Rizzo <riz%tastylime.net@localhost>
 > Subject: Re: kern/50375: layerfs (nullfs) locking problem leading to livelock
 > Date: Thu, 29 Oct 2015 15:40:24 +0100
 >
 >   First analysis is:
 >   
 >   Thread 0x91596840 (0.9 vdrain) tries to clean vnode 0x9436ef20.
 >   
 >   Vnode 0x9436ef20 is VT_NULL, VDIR with lower vnode 0x92314df8.
 >   Lower vnode is VT_UFS, VDIR currently held by thread 0x95768060 (25124.1 make).
 >   
 >   Thread 0x95768060 (25124.1 make) holds vnode 0x9246d850.
 >   
 >   Vnode 0x9246d850 is VT_NULL, VDIR with lower vnode 0x92314df8.
 >   Lower vnode is VT_UFS, VDIR.
 >   
 >   Thread 0x95768060 (25124.1 make) tries to lock vnode 0x948159a0.
 >   
 >   Vnode 0x948159a0 is VT_NULL, VDIR with lower vnode 0x94c176e0.
 >   Lower vnode is VT_UFS, VDIR currently held by thread 0x95863c00.
 >   
 >   Thread 0x95863c00 tries to vget 0x9436ef20.
 >   
 >   Deadlock.
 >   
 >   
 >   Thread 0x95768060 (25124.1 make) tries to lock here:
 >   
 >           if (searchdir != foundobj) {
 >                   if (cnp->cn_flags & ISDOTDOT)
 >                           VOP_UNLOCK(searchdir);
 >                   error = vn_lock(foundobj, LK_EXCLUSIVE);
 >                   if (cnp->cn_flags & ISDOTDOT)
 >   ===>                    vn_lock(searchdir, LK_EXCLUSIVE | LK_RETRY);
 >                   if (error != 0) {
 >                           vrele(foundobj);
 >                           goto done;
 >                   }
 >           }
 >   
 >   Thread 0x95863c00 calls VOP_LOOKUP() with locked vnode 0x92b811e0 here:
 >   
 >           cn.cn_nameiop = LOOKUP;
 >           cn.cn_flags = ISLASTCN | ISDOTDOT | RDONLY;
 >           cn.cn_cred = cred;
 >           cn.cn_nameptr = "..";
 >           cn.cn_namelen = 2;
 >           cn.cn_consume = 0;
 >   
 >           /* At this point, lvp is locked  */
 >   ===>    error = VOP_LOOKUP(lvp, uvpp, &cn);
 >           vput(lvp);
 >   
 >   
 >   So we have two layerfs vnodes with the same lower vnode:
 >   1) (upper 0x9436ef20 lower 0x92314df8)
 >   2) (upper 0x9246d850 lower 0x92314df8).
 >   
 >   The first node gets cleaned from vdrain_thread -> cleanvnode -> vclean and
 >   here vclean wants to lock it.
 >   
 >   The second node is the "foundobj" from thread 0x95768060 (25124.1 make),
 >   currently referenced and locked.
 >   
 >   --
 >   J. Hannken-Illjes - hannken%eis.cs.tu-bs.de@localhost - TU Braunschweig (Germany)
 >
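
 The chain in the quoted analysis can be read as a wait-for graph among
 the three threads. A minimal sketch, assuming that the vget of
 0x9436ef20 blocks because vdrain is already cleaning that vnode (the
 analysis does not say so explicitly); struct waiter and in_wait_cycle()
 are hypothetical names for illustration, not kernel code.

```c
#include <assert.h>

/* One entry per thread named in the quoted analysis. */
struct waiter {
    const char *name;
    int waits_for;      /* index of the thread blocking this one, or -1 */
};

enum { VDRAIN, MAKE, LOOKUP_THREAD, NTHREADS };

static const struct waiter threads[NTHREADS] = {
    /* vdrain wants 0x9436ef20; its lower vnode 0x92314df8 is held by make. */
    [VDRAIN] = { "0x91596840 (vdrain)", MAKE },
    /* make wants 0x948159a0; its lower vnode 0x94c176e0 is held by 0x95863c00. */
    [MAKE] = { "0x95768060 (make)", LOOKUP_THREAD },
    /* Assumption: vget of 0x9436ef20 waits for vdrain to finish cleaning it. */
    [LOOKUP_THREAD] = { "0x95863c00", VDRAIN },
};

/* Follow waits_for links from start; return 1 if start is reached again. */
static int in_wait_cycle(const struct waiter *t, int n, int start)
{
    assert(start >= 0 && start < n);

    int cur = t[start].waits_for;
    for (int steps = 0; steps < n && cur != -1; steps++) {
        if (cur == start)
            return 1;                   /* came back to start: deadlock */
        cur = t[cur].waits_for;
    }
    return 0;                           /* chain ended or never returned */
}
```

 in_wait_cycle(threads, NTHREADS, VDRAIN) returns 1, and the same holds
 when starting from either of the other two threads, since all three sit
 on the same cycle.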
