NetBSD-Bugs archive


Re: kern/50375: layerfs (nullfs) locking problem leading to livelock



Forgive me for a possibly impertinent question, but what is the output of "mount" on this system?

A quick read of hannken's analysis makes me think that the basic problem here is that the two imposed locking orders (parent directory before subdirectory, and upper before lower) are in direct conflict when a directory is null-mounted onto a subdirectory of itself. In this case, based on which vnodes are "..", it seems to me that we have something like

  /dev/foo on /e0 type ffs
  /e0/f8 on /e0/f8/20 type null
  /e0/f8 on /somewhere/50 type null

where f8 has vnode 0x92314df8, 20 has vnode 0x9436ef20 and 50 has vnode 0x9246d850; and e0 has vnode 0x92b811e0. There might be more directory layers between e0 and f8, and between f8 and 20.

If that does match the structure of the mount points, there could be a very similar deadlock involving only one null mount: someone holds /e0/f8 and tries to lock /e0/f8/20 as a subdirectory; someone else holds /e0/f8/20 and tries to lock /e0/f8/20/f8 as a subdirectory, but that vnode is layered over /e0/f8, so the two threads deadlock.
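
To make the ordering conflict concrete, here is a minimal sketch in C. It is not kernel code: the `order_*` functions and the two layout helpers are hypothetical names invented for illustration, and the layout itself is only the guess described above. It records "lock X before Y" constraints by name and reports whether two of them are directly reversed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_RULES 16

/* One ordering constraint: "first" must be locked before "second". */
struct lock_order {
	const char *first;
	const char *second;
};

struct lock_order rules[MAX_RULES];
size_t nrules;

/* Forget all recorded constraints. */
void
order_reset(void)
{
	nrules = 0;
}

/* Record that "first" must be locked before "second". */
void
order_add(const char *first, const char *second)
{
	assert(nrules < MAX_RULES);
	rules[nrules].first = first;
	rules[nrules].second = second;
	nrules++;
}

/* Return true if two recorded constraints are the reverse of each other. */
bool
order_conflict(void)
{
	for (size_t i = 0; i < nrules; i++)
		for (size_t j = i + 1; j < nrules; j++)
			if (strcmp(rules[i].first, rules[j].second) == 0 &&
			    strcmp(rules[i].second, rules[j].first) == 0)
				return true;
	return false;
}

/* Guessed layout: f8 is null-mounted onto its own subdirectory 20. */
bool
guessed_layout_conflicts(void)
{
	order_reset();
	order_add("f8", "20");	/* parent directory before subdirectory */
	order_add("20", "f8");	/* upper (null) vnode before its lower vnode */
	return order_conflict();
}

/* Same directories without the null mount: only the parent/child rule. */
bool
plain_layout_conflicts(void)
{
	order_reset();
	order_add("e0", "f8");
	order_add("f8", "20");
	return order_conflict();
}
```

Under these assumptions, guessed_layout_conflicts() returns true and plain_layout_conflicts() returns false: the reversal only appears once the null mount puts f8 both above and below 20.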

Thanks,

        Konrad Schroder
        perseant%hhhh.org@localhost

On 10/29/15 7:45 AM, J. Hannken-Illjes wrote:
The following reply was made to PR kern/50375; it has been noted by GNATS.

From: "J. Hannken-Illjes" <hannken%eis.cs.tu-bs.de@localhost>
To: gnats-bugs%NetBSD.org@localhost
Cc: Jeff Rizzo <riz%tastylime.net@localhost>
Subject: Re: kern/50375: layerfs (nullfs) locking problem leading to livelock
Date: Thu, 29 Oct 2015 15:40:24 +0100

First analysis is:

Thread 0x91596840 (0.9 vdrain) tries to clean vnode 0x9436ef20.
Vnode 0x9436ef20 is VT_NULL, VDIR with lower vnode 0x92314df8.
Lower vnode is VT_UFS, VDIR currently held by thread 0x95768060 (25124.1 make).

Thread 0x95768060 (25124.1 make) holds vnode 0x9246d850.
Vnode 0x9246d850 is VT_NULL, VDIR with lower vnode 0x92314df8.
Lower vnode is VT_UFS, VDIR.

Thread 0x95768060 (25124.1 make) tries to lock vnode 0x948159a0.
Vnode 0x948159a0 is VT_NULL, VDIR with lower vnode 0x94c176e0.
Lower vnode is VT_UFS, VDIR currently held by thread 0x95863c00.

Thread 0x95863c00 tries to vget 0x9436ef20.

Deadlock.

Thread 0x95768060 (25124.1 make) tries to lock here:

          if (searchdir != foundobj) {
                  if (cnp->cn_flags & ISDOTDOT)
                          VOP_UNLOCK(searchdir);
                  error = vn_lock(foundobj, LK_EXCLUSIVE);
                  if (cnp->cn_flags & ISDOTDOT)
  ===>                    vn_lock(searchdir, LK_EXCLUSIVE | LK_RETRY);
                  if (error != 0) {
                          vrele(foundobj);
                          goto done;
                  }
          }

Thread 0x95863c00 calls VOP_LOOKUP() with locked vnode 0x92b811e0 here:

          cn.cn_nameiop = LOOKUP;
          cn.cn_flags = ISLASTCN | ISDOTDOT | RDONLY;
          cn.cn_cred = cred;
          cn.cn_nameptr = "..";
          cn.cn_namelen = 2;
          cn.cn_consume = 0;

          /* At this point, lvp is locked */
  ===>    error = VOP_LOOKUP(lvp, uvpp, &cn);
          vput(lvp);

So we have two layerfs vnodes with the same lower vnode:
  1) (upper 0x9436ef20 lower 0x92314df8)
  2) (upper 0x9246d850 lower 0x92314df8).
The first node gets cleaned from vdrain_thread -> cleanvnode -> vclean and
  here vclean wants to lock it.
The second node is the "foundobj" from thread 0x95768060 (25124.1 make),
  currently referenced and locked.
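
The chain of waits reported above can be sketched as a small wait-for table in C. This is an illustration only, not kernel code: the function and enum names are hypothetical, and the third edge (the vget of 0x9436ef20 waiting for vdrain to finish cleaning it) is an assumption read into the analysis, not something the trace states outright.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* The three threads from the analysis; T_LOOKUP is thread 0x95863c00. */
enum { T_VDRAIN, T_MAKE, T_LOOKUP, NTHREADS };

#define NOBODY (-1)

/*
 * waits_for[t] names the thread that t is blocked on, or NOBODY.
 * Follow the chain from "start" and report whether it leads back to
 * "start", i.e. whether that thread sits on a wait cycle.
 */
bool
thread_on_wait_cycle(const int waits_for[], int n, int start)
{
	int t = start;

	for (int steps = 0; steps < n; steps++) {
		assert(t >= 0 && t < n);
		t = waits_for[t];
		if (t == NOBODY)
			return false;	/* chain ends, someone can run */
		if (t == start)
			return true;	/* back where we began */
	}
	return false;			/* start only leads into a cycle */
}

/* Fill in the waits described in the analysis. */
void
reported_waits(int waits_for[NTHREADS])
{
	/* vdrain needs lower vnode 0x92314df8, held by make. */
	waits_for[T_VDRAIN] = T_MAKE;
	/* make needs lower vnode 0x94c176e0, held by 0x95863c00. */
	waits_for[T_MAKE] = T_LOOKUP;
	/* Assumption: vget(0x9436ef20) waits for vdrain's cleaning. */
	waits_for[T_LOOKUP] = T_VDRAIN;
}
```

With the table filled in by reported_waits(), every one of the three threads is on the cycle; clearing any single edge (setting it to NOBODY) breaks it, which is one way to reason about where the locking could be relaxed.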
--
  J. Hannken-Illjes - hannken%eis.cs.tu-bs.de@localhost - TU Braunschweig (Germany)


