Subject: Re: FS layering bug?
To: Marton Fabo <morton@eik.bme.hu>
From: Bill Studenmund <wrstuden@netbsd.org>
List: tech-kern
Date: 01/26/2004 10:14:44

On Mon, Jan 26, 2004 at 12:59:52AM +0100, Marton Fabo wrote:
>
> Hello!
>
> I have a 1.6ZC i386 system from some last october -current source, with
> the default FS layout of / and /usr as two FSes.
>
> I have tried the following: I copied all the contents of the /
> filesystem to /var/tmp/chroot, then I null-mounted /usr on
> /var/tmp/chroot/usr read-only, and then I union-mounted an empty
> /var/tmp/chroot/chroot-usr directory over /var/tmp/chroot/usr. This is
> so that I have a full writeable replica of my system in /var/tmp/chroot,
> with only having to duplicate the contents of the root FS.
>
> Now I chrooted to the above dir, started to play around, everything
> worked like a charm, I could modify it without endangering anything in
> my real system. But after some time, the kernel panicked with the
> message "locking against myself".

Oops!

> The question is whether 1) this is an inherent, predictable crash in the
> above sketched scenario, or a bug; 2) is it fixed in current -current;
> and 3) if the answer is "no" to both of the previous questions, what may
> possibly be done to fix it.

Probably a bug. I doubt it's been fixed lately. The most important thing
to get is the stack trace of the "locking against myself" panic.

Wait, where is the empty directory coming from?

Also, why not just use mount -t union -o -b /usr /var/chroot/usr ?

> thx
> mortee
>
> PS.: Formerly I tried null-mounting / on /var/tmp/chroot also, using a
> modified /sbin/mount_null with the check for distinct directories
> disabled. It resulted in the same locking panic; but I accounted that
> crash to the assumption that mount_null had a valid reason to not allow
> mounting non-distinct directories over each other by default. Now this
> also became an open question whether null-mounting /a/b over /a/b/c/d is
> expected to cause a locking error and kernel panic.

That null mount (/a/b over /a/b/c/d) will lead to a kernel panic.
That's why the test is there. Directories are locked from the root outward,
so you lock /, then /a, then /a/b, then /a/b/c, then /a/b/c/d. The problem
with that null mount is that /a/b and /a/b/c/d end up sharing the same
lock, so when you lock /a/b, you also lock /a/b/c/d.

Consider two processes looking up the path name "/a/b/c/d". One of them
(call it #1) has gotten to /a/b/c and is looking up "d". It has /a/b/c
locked at that point. The other one (#2) comes along and gets to /a/b
looking for "c". It ends up waiting for #1 to release its lock on /a/b/c,
and it holds /a/b locked in the meantime. Due to layering, it also has
/a/b/c/d locked. #1 now waits for the lock on "d", which will never be
released. The kernel is now deadlocked.
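
To make the cycle concrete, here is a rough toy model of the scenario
above in Python. It is only a sketch, not kernel code: the LockTable
class, the aliases mapping and the simulate() helper are all made up for
illustration, and the only thing taken from the description is that
/a/b and /a/b/c/d share one lock once the null mount is in place.

```python
class LockTable:
    """Toy lock table: one exclusive lock per path, with optional aliasing."""

    def __init__(self, aliases):
        self.aliases = aliases  # path -> name of the lock it shares (hypothetical)
        self.owner = {}         # lock name -> process id holding it
        self.waiting = {}       # process id -> lock name it is blocked on

    def lock(self, pid, path):
        """Return True if pid gets the lock, False if it has to wait."""
        name = self.aliases.get(path, path)
        holder = self.owner.get(name)
        if holder is None:
            self.owner[name] = pid
            return True
        if holder == pid:
            # Simplification of this toy model: re-locking is a no-op.
            return True
        self.waiting[pid] = name
        return False

    def deadlock_cycle(self):
        """Follow wait-for edges; return the process ids in a cycle, or []."""
        for start in self.waiting:
            seen = []
            pid = start
            while pid in self.waiting:
                if pid in seen:
                    return seen[seen.index(pid):]
                seen.append(pid)
                pid = self.owner[self.waiting[pid]]
        return []


def simulate(aliases):
    """Replay the two lookups of "/a/b/c/d" described above."""
    table = LockTable(aliases)
    table.lock(1, "/a/b/c")    # #1 has reached /a/b/c and holds it
    table.lock(2, "/a/b")      # #2 has reached /a/b and holds it
    table.lock(2, "/a/b/c")    # #2 wants "c": blocked behind #1
    table.lock(1, "/a/b/c/d")  # #1 wants "d": blocked only if it aliases /a/b
    return table.deadlock_cycle()


if __name__ == "__main__":
    print("with null mount:   ", simulate({"/a/b/c/d": "/a/b"}))
    print("without null mount:", simulate({}))
```

With the alias in place, each process ends up waiting on a lock the
other one holds, so the wait-for chain loops back on itself. Without the
alias, #1 gets its lock on "d" right away and #2 only has to wait until
#1 is done.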

Take care,

Bill
