The following reply was made to PR kern/50375; it has been noted by GNATS.
From: "J. Hannken-Illjes" <hannken%eis.cs.tu-bs.de@localhost>
To: gnats-bugs%NetBSD.org@localhost
Cc: Jeff Rizzo <riz%tastylime.net@localhost>
Subject: Re: kern/50375: layerfs (nullfs) locking problem leading to livelock
Date: Thu, 29 Oct 2015 15:40:24 +0100
First analysis is:
Thread 0x91596840 (0.9 vdrain) tries to clean vnode 0x9436ef20.
Vnode 0x9436ef20 is VT_NULL, VDIR with lower vnode 0x92314df8.
Lower vnode is VT_UFS, VDIR currently held by thread 0x95768060 (25124.1 make).
Thread 0x95768060 (25124.1 make) holds vnode 0x9246d850.
Vnode 0x9246d850 is VT_NULL, VDIR with lower vnode 0x92314df8.
Lower vnode is VT_UFS, VDIR.
Thread 0x95768060 (25124.1 make) tries to lock vnode 0x948159a0.
Vnode 0x948159a0 is VT_NULL, VDIR with lower vnode 0x94c176e0.
Lower vnode is VT_UFS, VDIR currently held by thread 0x95863c00.
Thread 0x95863c00 tries to vget 0x9436ef20.
Deadlock.
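
The chain above can be written down as a small wait-for graph. The following is only an illustrative sketch, not kernel code: the enum names, the `waits_for` table and `in_cycle()` are hypothetical. The first two edges follow the lock holders named above. The third edge (the vget blocking while vdrain is cleaning the same vnode) is an assumption about why the vget does not return.

```c
#include <assert.h>

/* Hypothetical labels for the three threads in the analysis. */
enum { VDRAIN, MAKE, LOOKUP_THREAD, NTHREADS };

/* waits_for[t] is the thread that t is blocked on. */
const int waits_for[NTHREADS] = {
	[VDRAIN]        = MAKE,          /* lower 0x92314df8 is held by make */
	[MAKE]          = LOOKUP_THREAD, /* lower 0x94c176e0 is held by 0x95863c00 */
	[LOOKUP_THREAD] = VDRAIN,        /* assumed: vget of 0x9436ef20 waits for the clean */
};

/* Return 1 if following the wait-for edges from start leads back to start. */
int
in_cycle(int start)
{
	int t = start;

	assert(start >= 0 && start < NTHREADS);
	for (int steps = 0; steps < NTHREADS; steps++) {
		t = waits_for[t];
		if (t == start)
			return 1;
	}
	return 0;
}
```

Under these assumptions `in_cycle()` returns 1 for each of the three threads, so none of them can make progress.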
Thread 0x95768060 (25124.1 make) tries to lock here:
        if (searchdir != foundobj) {
                if (cnp->cn_flags & ISDOTDOT)
                        VOP_UNLOCK(searchdir);
                error = vn_lock(foundobj, LK_EXCLUSIVE);
                if (cnp->cn_flags & ISDOTDOT)
===>                    vn_lock(searchdir, LK_EXCLUSIVE | LK_RETRY);
                if (error != 0) {
                        vrele(foundobj);
                        goto done;
                }
        }
Thread 0x95863c00 calls VOP_LOOKUP() with locked vnode 0x92b811e0 here:
        cn.cn_nameiop = LOOKUP;
        cn.cn_flags = ISLASTCN | ISDOTDOT | RDONLY;
        cn.cn_cred = cred;
        cn.cn_nameptr = "..";
        cn.cn_namelen = 2;
        cn.cn_consume = 0;
        /* At this point, lvp is locked */
===>    error = VOP_LOOKUP(lvp, uvpp, &cn);
        vput(lvp);
So we have two layerfs vnodes with the same lower vnode:
1) (upper 0x9436ef20 lower 0x92314df8)
2) (upper 0x9246d850 lower 0x92314df8).
The first node gets cleaned from vdrain_thread -> cleanvnode -> vclean, and
here vclean wants to lock it.
The second node is the "foundobj" from thread 0x95768060 (25124.1 make),
currently referenced and locked.
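
To illustrate why two upper nodes over one lower vnode interfere with each other, here is a minimal sketch. It assumes that locking a layerfs node amounts to locking its lower vnode. The structures and `upper_trylock()` are hypothetical stand-ins, not the kernel's types or API. Only the addresses are taken from the analysis above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: the lock lives in the lower vnode only. */
struct lower_vnode {
	unsigned long addr;
	unsigned long owner;	/* thread holding the lock, 0 if unlocked */
};

struct upper_vnode {
	unsigned long addr;
	struct lower_vnode *lower;
};

/* Return 1 if thread got the lock, 0 if it would have to wait. */
int
upper_trylock(struct upper_vnode *up, unsigned long thread)
{
	assert(up != NULL && up->lower != NULL);
	if (up->lower->owner != 0)
		return 0;
	up->lower->owner = thread;
	return 1;
}

/* Return 1 if make gets node 2 and vdrain is then blocked on node 1. */
int
demo_shared_lower(void)
{
	struct lower_vnode lower = { 0x92314df8UL, 0 };
	struct upper_vnode node1 = { 0x9436ef20UL, &lower };
	struct upper_vnode node2 = { 0x9246d850UL, &lower };
	int make_ok = upper_trylock(&node2, 0x95768060UL);
	int vdrain_ok = upper_trylock(&node1, 0x91596840UL);

	return make_ok && !vdrain_ok;
}
```

In this model make locking node 2 is enough to keep vdrain from locking node 1, even though the two threads never touch the same upper vnode.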
--
J. Hannken-Illjes - hannken%eis.cs.tu-bs.de@localhost - TU Braunschweig (Germany)