NetBSD-Bugs archive
Re: kern/50375: layerfs (nullfs) locking problem leading to livelock
The following reply was made to PR kern/50375; it has been noted by GNATS.
From: Konrad Schroder <perseant%hhhh.org@localhost>
To: gnats-bugs%NetBSD.org@localhost, kern-bug-people%netbsd.org@localhost,
gnats-admin%netbsd.org@localhost, netbsd-bugs%netbsd.org@localhost, riz%NetBSD.org@localhost
Cc:
Subject: Re: kern/50375: layerfs (nullfs) locking problem leading to livelock
Date: Thu, 29 Oct 2015 17:19:08 -0700
Forgive me for a possibly impertinent question, but what is the output
of "mount" on this system?
A quick read of hannken's analysis makes me think that the basic problem
here is that the two imposed locking orders (parent directory before
subdirectory, and upper before lower) are in direct conflict when a
directory is null-mounted onto a subdirectory of itself. In this case,
based on which vnodes are "..", it seems to me that we have something like:
/dev/foo on /e0 type ffs
/e0/f8 on /e0/f8/20 type null
/e0/f8 on /somewhere/50 type null
where f8 has vnode 0x92314df8, 20 has vnode 0x9436ef20 and 50 has vnode
0x9246d850; and e0 has vnode 0x92b811e0. There might be more directory
layers between e0 and f8, and between f8 and 20.
If that does match the structure of the mount points, there could be a
very similar deadlock involving only one null mount: someone holds
/e0/f8 and tries to lock /e0/f8/20 as a subdirectory; someone else holds
/e0/f8/20 and tries to lock /e0/f8/20/f8 as a subdirectory, but that
is layered over /e0/f8, so the two deadlock.
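
To make the conflict between the two ordering rules concrete, here is a
minimal sketch in C. It is not kernel code: the names (lock_order,
order_add, order_has_conflict, VN_F8, VN_20) are hypothetical, and it
assumes the mount layout guessed above is correct. It records each rule
as a "lock A before B" edge and checks whether any vnode would have to be
locked before itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical labels for the two vnodes in the single-null-mount case. */
enum { VN_F8, VN_20, VN_COUNT };

/* before[a][b] is true when some rule says "lock a before b". */
struct lock_order {
	bool before[VN_COUNT][VN_COUNT];
};

void
order_add(struct lock_order *lo, int first, int second)
{
	assert(first >= 0 && first < VN_COUNT);
	assert(second >= 0 && second < VN_COUNT);
	lo->before[first][second] = true;
}

/* Depth-first search: is 'target' reachable from 'from' along the edges? */
bool
order_reaches(const struct lock_order *lo, int from, int target, bool *seen)
{
	if (from == target)
		return true;
	if (seen[from])
		return false;
	seen[from] = true;
	for (int next = 0; next < VN_COUNT; next++) {
		if (lo->before[from][next] &&
		    order_reaches(lo, next, target, seen))
			return true;
	}
	return false;
}

/* The rules conflict if some edge a -> b can be followed back to a. */
bool
order_has_conflict(const struct lock_order *lo)
{
	for (int a = 0; a < VN_COUNT; a++) {
		for (int b = 0; b < VN_COUNT; b++) {
			bool seen[VN_COUNT];

			memset(seen, 0, sizeof(seen));
			if (lo->before[a][b] && order_reaches(lo, b, a, seen))
				return true;
		}
	}
	return false;
}

/* Assumed layout: /e0/f8 null-mounted on its own subdirectory /e0/f8/20. */
void
order_build_self_mount(struct lock_order *lo)
{
	memset(lo, 0, sizeof(*lo));
	order_add(lo, VN_F8, VN_20);	/* parent directory before subdirectory */
	order_add(lo, VN_20, VN_F8);	/* upper (null) vnode before lower vnode */
}
```

With only the parent-before-subdirectory edge the check finds no
conflict; adding the upper-before-lower edge for the same pair of vnodes
closes the loop.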
Thanks,
Konrad Schroder
perseant%hhhh.org@localhost
On 10/29/15 7:45 AM, J. Hannken-Illjes wrote:
> The following reply was made to PR kern/50375; it has been noted by GNATS.
>
> From: "J. Hannken-Illjes" <hannken%eis.cs.tu-bs.de@localhost>
> To: gnats-bugs%NetBSD.org@localhost
> Cc: Jeff Rizzo <riz%tastylime.net@localhost>
> Subject: Re: kern/50375: layerfs (nullfs) locking problem leading to livelock
> Date: Thu, 29 Oct 2015 15:40:24 +0100
>
> First analysis is:
>
> Thread 0x91596840 (0.9 vdrain) tries to clean vnode 0x9436ef20.
>
> Vnode 0x9436ef20 is VT_NULL, VDIR with lower vnode 0x92314df8.
> Lower vnode is VT_UFS, VDIR currently held by thread 0x95768060 (25124.1 make).
>
> Thread 0x95768060 (25124.1 make) holds vnode 0x9246d850.
>
> Vnode 0x9246d850 is VT_NULL, VDIR with lower vnode 0x92314df8.
> Lower vnode is VT_UFS, VDIR.
>
> Thread 0x95768060 (25124.1 make) tries to lock vnode 0x948159a0.
>
> Vnode 0x948159a0 is VT_NULL, VDIR with lower vnode 0x94c176e0.
> Lower vnode is VT_UFS, VDIR currently held by thread 0x95863c00.
>
> Thread 0x95863c00 tries to vget 0x9436ef20.
>
> Deadlock.
>
>
> Thread 0x95768060 (25124.1 make) tries to lock here:
>
>         if (searchdir != foundobj) {
>                 if (cnp->cn_flags & ISDOTDOT)
>                         VOP_UNLOCK(searchdir);
>                 error = vn_lock(foundobj, LK_EXCLUSIVE);
>                 if (cnp->cn_flags & ISDOTDOT)
> ===>                    vn_lock(searchdir, LK_EXCLUSIVE | LK_RETRY);
>                 if (error != 0) {
>                         vrele(foundobj);
>                         goto done;
>                 }
>         }
>
> Thread 0x95863c00 calls VOP_LOOKUP() with locked vnode 0x92b811e0 here:
>
>         cn.cn_nameiop = LOOKUP;
>         cn.cn_flags = ISLASTCN | ISDOTDOT | RDONLY;
>         cn.cn_cred = cred;
>         cn.cn_nameptr = "..";
>         cn.cn_namelen = 2;
>         cn.cn_consume = 0;
>
>         /* At this point, lvp is locked */
> ===>    error = VOP_LOOKUP(lvp, uvpp, &cn);
>         vput(lvp);
>
>
> So we have two layerfs vnodes with the same lower vnode:
> 1) (upper 0x9436ef20 lower 0x92314df8)
> 2) (upper 0x9246d850 lower 0x92314df8).
>
> The first node gets cleaned from vdrain_thread -> cleanvnode -> vclean and
> here vclean wants to lock it.
>
> The second node is the "foundobj" from thread 0x95768060 (25124.1 make),
> currently referenced and locked.
>
> --
> J. Hannken-Illjes - hannken%eis.cs.tu-bs.de@localhost - TU Braunschweig (Germany)
>
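
The wait chain in the quoted analysis can be written down as a small
table. The sketch below is illustrative only: the thread labels and the
helper waits_in_cycle are hypothetical, and the third entry assumes that
the vget of 0x9436ef20 is blocked because vdrain is in the middle of
cleaning that vnode, which the analysis implies but does not state
outright.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical indices for the three threads named in the analysis. */
enum { T_VDRAIN, T_MAKE, T_LOOKUP, T_COUNT, T_NONE = -1 };

/*
 * waits_on[t] is the thread holding what t needs, or T_NONE if t can run.
 * Returns true if following the chain from 'start' leads back to 'start'.
 */
bool
waits_in_cycle(const int *waits_on, int nthreads, int start)
{
	int cur;

	assert(start >= 0 && start < nthreads);
	cur = waits_on[start];
	for (int steps = 0; steps < nthreads && cur != T_NONE; steps++) {
		if (cur == start)
			return true;
		assert(cur >= 0 && cur < nthreads);
		cur = waits_on[cur];
	}
	return false;
}

const int pr50375_waits_on[T_COUNT] = {
	/* vdrain cleans 0x9436ef20; make holds its lower vnode 0x92314df8 */
	[T_VDRAIN] = T_MAKE,
	/* make wants 0x948159a0; thread 0x95863c00 holds its lower vnode */
	[T_MAKE] = T_LOOKUP,
	/* assumption: vget of 0x9436ef20 waits for vdrain to finish cleaning */
	[T_LOOKUP] = T_VDRAIN,
};
```

Starting from any of the three threads, the chain returns to its
starting point, which matches the "Deadlock." conclusion in the analysis;
setting any one entry to T_NONE breaks the loop.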