tech-kern: Re: FS layering bug?

Subject: Re: FS layering bug?
To: Marton Fabo <morton@eik.bme.hu>
From: Bill Studenmund <wrstuden@netbsd.org>
List: tech-kern
Date: 01/28/2004 16:42:30

On Mon, Jan 26, 2004 at 08:11:01PM +0100, Marton Fabo wrote:
> Bill Studenmund wrote:
>
> >Probably a bug. I doubt it's been fixed lately. The most important thing
> >to get is the stack trace of the "locking against myself" panic.
>
> How do I do it? I'm not really into kernel debugging. A pointer will be
> enough... Anyway, will I have to write the stack trace down on paper, or
> is there a way to save the ddb (or however the kernel debugger is
> called) session's log?

I think one of the other threads mentioned that t/l at the ddb prompt will
give you a back trace in the dmesg buffer.

> >Wait, where is the empty directory coming from?
>
> As I have already written, the unioned empty directory was
> /var/tmp/chroot/chroot-usr, so it was a regular directory in the
> chrooted environment's root dir.

That's fine. I'd missed that detail. Most importantly, you weren't creating
a loop.

> >Also, why not just use mount -t union -o -b /usr /var/chroot/usr ?
>
> Yes, now that I checked mount_union, that would have been an option. In
> fact, I'll check if that is good for me. I just overlooked it because my
> initial intent was to "replicate the filesystem read-only, and then
> overlay an empty writable layer"...
>
> Anyway, it's still interesting why the former solution confused and
> crashed the kernel.

Yes it is.

[snip]

> >That null mounting (/a/b over /a/b/c/d) will lead to a kernel panic.
> >That's why the test is there. Directories are locked from root outward. So
> >you lock /, then /a, then /a/b, then /a/b/c, then /a/b/c/d. The problem
> >with that null mount is that /a/b and /a/b/c/d will end up having the same
> >lock. So when you lock /a/b, you also lock /a/b/c/d. Consider two
> >processes looking up the path name "/a/b/c/d". One of them (call it #1)
> >has gotten to /a/b/c and is looking up "d". It has /a/b/c locked at that
> >point. The other one (#2) comes along, and gets to /a/b looking for "c".
> >It ends up waiting for #1 to release its lock on /a/b/c, and it has /a/b/
> >locked in the mean time. Due to layering, it also has /a/b/c/d locked. #1
> >now waits for the lock on "d", but it will never get released. The
> >kernel's now deadlocked.
>
> That sounds quite logical. Can't this be fixed by some clever mechanism
> however, other than simply disallowing the scenario?

Not easily. Deadlock detection and prevention are hard, and it turns out
to be much easier to just not permit the scenario.
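
To make the /a/b over /a/b/c/d scenario concrete, here is a rough
user-space sketch of it. This is not kernel code: the alias table standing
in for "the upper directory shares the lower one's lock", and the names
lock_id, wait_for_graph and has_deadlock, are all made up for illustration.
It just builds the "who waits for whom" graph for the two lookups and
checks it for a cycle.

```python
def lock_id(path, aliases):
    """Return the identity of the lock protecting `path`.

    Assumption of this sketch: a null mount is modeled as an alias
    table mapping the mount point to the directory whose lock it shares.
    """
    return aliases.get(path, path)


def wait_for_graph(holding, wanting, aliases):
    """Map each waiting process to the set of processes it waits on."""
    owners = {}
    for proc, paths in holding.items():
        for path in paths:
            owners.setdefault(lock_id(path, aliases), set()).add(proc)
    graph = {}
    for proc, path in wanting.items():
        graph[proc] = owners.get(lock_id(path, aliases), set()) - {proc}
    return graph


def has_deadlock(graph):
    """Detect a cycle in the wait-for graph with a depth-first search."""
    visiting, done = set(), set()

    def visit(node):
        if node in done:
            return False
        if node in visiting:
            return True  # came back to a node on the current path
        visiting.add(node)
        for nxt in graph.get(node, ()):
            if visit(nxt):
                return True
        visiting.discard(node)
        done.add(node)
        return False

    return any(visit(node) for node in list(graph))


# #1 holds /a/b/c and wants "d"; #2 holds /a/b and wants "c".
holding = {"#1": {"/a/b/c"}, "#2": {"/a/b"}}
wanting = {"#1": "/a/b/c/d", "#2": "/a/b/c"}

null_mount = {"/a/b/c/d": "/a/b"}  # /a/b/c/d shares /a/b's lock
print(has_deadlock(wait_for_graph(holding, wanting, null_mount)))
print(has_deadlock(wait_for_graph(holding, wanting, {})))
```

With the alias in place, #1 waits on #2 and #2 waits on #1, so the check
reports a cycle; without it, #1 gets "d" right away and there is none.
Doing this check cheaply inside the kernel on every lock request is the
hard part, which is why refusing the mount is the simpler option.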

> <15 minutes pass>
>
> Actually, it looks like it really should be addressed somehow. Look at
> the following session transcript:

[snip]

> >/dev/wd0a on / type ffs (local)
> >/dev/wd0e on /usr type ffs (NFS exported, local)
> >/usr/home/morton/tmp on /usr/home/morton/tmp1/tmp2 type null (local)
> >[morton@gerzson:/usr/home/morton | 01/26 19:50:05]
> ># touch tmp/a
> >[morton@gerzson:/usr/home/morton | 01/26 19:50:14]
> ># ll tmp
> >total 0
> >-rw-r--r--  1 root  users  0 Jan 26 19:50 a
> >[morton@gerzson:/usr/home/morton | 01/26 19:50:16]
> ># ll tmp3/tmp2/
> >total 0
> >-rw-r--r--  1 root  users  0 Jan 26 19:50 a
> >[morton@gerzson:/usr/home/morton | 01/26 19:50:24]
> ># umount tmp3/tmp2
> >umount: /usr/home/morton/tmp3/tmp2: not currently mounted
> >[morton@gerzson:/usr/home/morton | 01/26 19:50:32]
> ># umount tmp
> >umount: /usr/home/morton/tmp: not currently mounted
> >[morton@gerzson:/usr/home/morton | 01/26 19:50:40]
> >#
>
> Here I basically null-mounted a directory over another (distinct, in
> this case) subdirectory, and then renamed a parent of the mount point.
> The mount remained active, but I had no way to unmount it anymore. (To
> be exact, after re-renaming the parent to the original name, I could,
> but this still seems to be quite risky.)
>
> So, I guess the full path to all mount points should be prevented from
> any modification (at least renaming). This alone still wouldn't solve my
> problem.

Why? The only account that should be able to do this renaming is root (in
the common case), so you're looking at a case where the administrator shot
him or herself in the foot. Yes, we _could_ do something, but it doesn't
really seem worth it. Also, we don't at present have a mechanism for
preventing the rename, so we'd need to add that too.

> But to allow access despite preventing modification, processes should be
> able to lock for reading, and of course any number of *readers* should
> be able to access the directories concurrently. And this would solve my
> problem too, since the concurrent read-only accessor processes wouldn't
> block each other, thus not cause a deadlock.

But that won't really help. It will make the issue less likely, but the
problem still remains. There are a number of operations that will need
exclusive locks ("writer" locks), and they still face the deadlock issue
above.
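
As a rough illustration of why shared locks only narrow the window, here
is the same two-process scenario with lock modes added. Again, this is a
user-space sketch and not kernel code; the alias table, the function names
and the choice of which operation takes exclusive locks (a hypothetical
removal of "d" by #1) are assumptions made for the example.

```python
def conflicts(held_mode, wanted_mode):
    """Two lock modes conflict unless both are shared."""
    return "exclusive" in (held_mode, wanted_mode)


def blocked_by(holding, wanting, aliases):
    """Return {process: set of processes it must wait for}."""
    graph = {}
    for proc, (path, mode) in wanting.items():
        target = aliases.get(path, path)
        graph[proc] = {
            other
            for other, held in holding.items()
            if other != proc
            for held_path, held_mode in held
            if aliases.get(held_path, held_path) == target
            and conflicts(held_mode, mode)
        }
    return graph


def deadlocked(graph):
    """True if some process waits, directly or indirectly, on itself."""
    def reaches(start, node, seen):
        for nxt in graph.get(node, ()):
            if nxt == start:
                return True
            if nxt not in seen:
                seen.add(nxt)
                if reaches(start, nxt, seen):
                    return True
        return False

    return any(reaches(proc, proc, set()) for proc in graph)


aliases = {"/a/b/c/d": "/a/b"}  # the null mount: shared lock identity

# Two plain lookups, everything locked shared: nobody blocks anybody.
readers_holding = {"#1": [("/a/b/c", "shared")], "#2": [("/a/b", "shared")]}
readers_wanting = {"#1": ("/a/b/c/d", "shared"), "#2": ("/a/b/c", "shared")}
print(deadlocked(blocked_by(readers_holding, readers_wanting, aliases)))

# #1 now modifies /a/b/c (hypothetically removing "d"), so it holds the
# parent exclusively and wants "d" exclusively; #2 is still a lookup.
writer_holding = {"#1": [("/a/b/c", "exclusive")], "#2": [("/a/b", "shared")]}
writer_wanting = {"#1": ("/a/b/c/d", "exclusive"), "#2": ("/a/b/c", "shared")}
print(deadlocked(blocked_by(writer_holding, writer_wanting, aliases)))
```

The all-readers case comes out clean, but as soon as one side needs an
exclusive lock the cycle between #1 and #2 is back.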

Take care,

Bill
