Subject: Re: FS layering bug?
To: Marton Fabo <email@example.com>
From: Bill Studenmund <firstname.lastname@example.org>
Date: 01/26/2004 10:14:44
Content-Type: text/plain; charset=us-ascii
On Mon, Jan 26, 2004 at 12:59:52AM +0100, Marton Fabo wrote:
> I have a 1.6ZC i386 system from some last october -current source, with
> the default FS layout of / and /usr as two FSes.
> I have tried the following: I copied all the contents of the /
> filesystem to /var/tmp/chroot, then I null-mounted /usr on
> /var/tmp/chroot/usr read-only, and then I union-mounted an empty
> /var/tmp/chroot/chroot-usr directory over /var/tmp/chroot/usr. This is
> so that I have a full writeable replica of my system in /var/tmp/chroot,
> with only having to duplicate the contents of the root FS.
> Now I chrooted to the above dir, started to play around, everything
> worked like a charm, I could modify it without endangering anything in
> my real system. But after some time, the kernel panicked with the
> message "locking against myself".
> The question is whether 1) this is an inherent, predictable crash in the
> above sketched scenario, or a bug; 2) is it fixed in current -current;
> and 3) if the answer is "no" to both of the previous questions, what may
> possibly be done to fix it.
Probably a bug. I doubt it's been fixed lately. The most important thing
to get is the stack trace of the "locking against myself" panic.
Wait, where is the empty directory coming from?
Also, why not just use mount -t union -o -b /usr /var/chroot/usr ?
> PS.: Formerly I tried null-mounting / on /var/tmp/chroot also, using a
> modified /sbin/mount_null with the check for distinct directories
> disabled. It resulted in the same locking panic; but I accounted that
> crash to the assumption that mount_null had a valid reason to not allow
> mounting non-distinct directories over each other by default. Now this
> also became an open question whether null-mounting /a/b over /a/b/c/d is
> expected to cause a locking error and kernel panic.
That null mounting (/a/b over /a/b/c/d) will lead to a kernel panic.
That's why the test is there. Directories are locked from root outward. So
you lock /, then /a, then /a/b, then /a/b/c, then /a/b/c/d. The problem
with that null mount is that /a/b and /a/b/c/d will end up having the same
lock. So when you lock /a/b, you also lock /a/b/c/d. Consider two
processes looking up the path name "/a/b/c/d". One of them (call it #1)
has gotten to /a/b/c and is looking up "d". It has /a/b/c locked at that
point. The other one (#2) comes along, and gets to /a/b looking for "c".
It ends up waiting for #1 to release its lock on /a/b/c, and it has /a/b
locked in the meantime. Due to layering, it also has /a/b/c/d locked. #1
now waits for the lock on "d", but it will never get released. The
kernel's now deadlocked.
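
To make the cycle concrete, here is a toy sketch in Python. It is not
kernel code and does not take real locks; it only models who holds and
who waits for which lock, then looks for a loop in the wait-for chain.
The names lock_id, find_deadlock and scenario are made up for this
illustration, and the alias table is an assumption standing in for the
null mount making /a/b/c/d share its lock with /a/b.

```python
# Toy model of the lookup deadlock described above; not kernel code.

def lock_id(path, aliases):
    """Return the lock guarding `path`, following the (assumed) aliasing."""
    return aliases.get(path, path)


def find_deadlock(holds, waits):
    """Return the processes forming a wait-for cycle, or [] if there is none.

    holds: process -> set of lock ids it currently holds
    waits: process -> lock id it is blocked on
    """
    owner = {lock: proc for proc, locks in holds.items() for lock in locks}
    for start in waits:
        seen = []
        proc = start
        # Follow "waits for the owner of" links until they end or repeat.
        while proc in waits and proc not in seen:
            seen.append(proc)
            proc = owner.get(waits[proc])
        if proc in seen:
            return seen[seen.index(proc):]
    return []


def scenario(aliases):
    """Two lookups of /a/b/c/d, frozen at the moment described in the text."""
    holds = {
        1: {lock_id("/a/b/c", aliases)},  # #1 holds /a/b/c, wants "d"
        2: {lock_id("/a/b", aliases)},    # #2 holds /a/b, wants "c"
    }
    waits = {
        1: lock_id("/a/b/c/d", aliases),
        2: lock_id("/a/b/c", aliases),
    }
    return find_deadlock(holds, waits)


if __name__ == "__main__":
    # With the null mount, /a/b/c/d shares the lock of /a/b.
    print(scenario({"/a/b/c/d": "/a/b"}))
    # Without it, the two locks are distinct.
    print(scenario({}))
```

With the alias in place, #1 waits on a lock that #2 holds while #2 waits
on a lock that #1 holds, so the sketch reports both processes as a cycle.
With an empty alias table, nobody owns the lock on "d", #1 can proceed,
and no cycle is found.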