Subject: Re: reboot problems unmounting root
To: Antti Kantee <pooka@cs.hut.fi>
From: Bill Stouder-Studenmund <wrstuden@netbsd.org>
List: tech-kern
Date: 07/05/2007 10:43:56

On Thu, Jul 05, 2007 at 08:09:12PM +0300, Antti Kantee wrote:
> [this is probably better suited for tech-kern]
>
> On Thu Jul 05 2007 at 09:49:35 -0700, Bill Stouder-Studenmund wrote:
> >
> > Why is the lock exploding? I think that's the real problem. As long as the
> > revoked vnode still has references, it needs to have a working lock.
>
> The problem is that layer_bypass revokes the lower vnode and it gets
> recycled.  The lower vnode now has a reference, but is generally a
> deadfs vnode.  However, the upper layer isn't revoked, or neither does
> it think it is revoked/reclaimed.  When it tries to use the now-nuked
> lower layer's exported lock, boooom like that.

Being a deadfs node is fine. Making users explode isn't.

> > Blowing away the upper node is not necessary, and it doesn't lead to
> > correct functioning. The issue is that you can have layer stacks that are
> > more complicated than just one layer above a leaf file system. You can
> > have more than one layer node above the same leaf node. As such, the node
> > underneath you can ALWAYS get revoked out from under you. Given that, we
> > have to be able to handle the lower node getting revoked, and once we do
> > that, we don't need to zap the layer node the revoke goes through.
>
> I see, you're worried about the hamburger effect: the beef is revoked
> but the upper bun does not see it.
>
> Here's actually another way to repeat the same problem (which is not
> fixed by the proposed patch):
>
> touch /upper/foo
> sleep 10 < /upper/foo &
> revoke /lower/foo
> *wait for an earth-shattering kaboom*

It's actually worse, from what I gather of the discussion.

We have VOP_REVOKE() so that when someone logs out of a tty, any existing
process w/ that device (that instance of the vnode) open gets connected to
a black hole. That way your cat > special_secrets & process can't read my
passwords, and I can't accidentally see the output spew of the employee
performance analysis tool you have running the "layoffs" criteria set.

So VOP_REVOKE() is supposed to blow away the node in that it's now a
deadfs vnode. However the vnode is still supposed to be usable, to the
extent that all the callers get errors then eventually release references,
and the vnode dies.
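
As a rough illustration of that intended life cycle, here is a small
user-space toy model. It is a sketch only: every name in it (toy_vnode,
toy_revoke, toy_rele and so on) is hypothetical and is not the kernel's
API, and returning EIO from the dead operations is an assumption made
for the example. The point it shows is that revoke swaps the operations
vector while the lock stays valid until the last reference is released.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy user-space model; every name here is hypothetical, not kernel API. */
struct toy_vnode;

struct toy_vnodeops {
    int (*read)(struct toy_vnode *vp);  /* returns 0 or an errno value */
};

struct toy_vnode {
    const struct toy_vnodeops *ops;  /* swapped to dead_ops on revoke */
    int usecount;                    /* references still held by callers */
    bool lock_valid;                 /* must stay true while usecount > 0 */
};

static int live_read(struct toy_vnode *vp) { (void)vp; return 0; }
static int dead_read(struct toy_vnode *vp) { (void)vp; return EIO; }

static const struct toy_vnodeops live_ops = { live_read };
static const struct toy_vnodeops dead_ops = { dead_read };

/* Set up a live vnode that already has `refs` users. */
void toy_init(struct toy_vnode *vp, int refs)
{
    vp->ops = &live_ops;
    vp->usecount = refs;
    vp->lock_valid = true;
}

/* Revoke: detach from the file system, but leave the lock alone. */
void toy_revoke(struct toy_vnode *vp)
{
    vp->ops = &dead_ops;
}

/* Drop one reference; only the last release tears the lock down. */
void toy_rele(struct toy_vnode *vp)
{
    assert(vp->usecount > 0);
    if (--vp->usecount == 0)
        vp->lock_valid = false;
}
```

With two users, a toy_revoke() makes later reads fail, but the lock is
only invalidated by the second toy_rele().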

From what we're saying, that won't work. That's bad.

> > So change how revoke happens. Rip the inode off of the vnode, but don't
> > kill the lock.
>
> How do I not kill the lock?  The vnode is reclaimed.  It won't be
> re-reclaimed after this.  If it's the lower layer, it doesn't even know
> about the upper one, right?  If vnode locks were separate and had separate
> reference counts, then maybe, but ...

I think we need to look at our reclaim processing.

The lower layer DOES know something's going on. It knows the vnode has a
use count that's not zero.

So there are two cases where we go through reclaim processing. In one, we
have a vnode that's on the "free" list, meaning it's got no active
references. As an aside, it's not really free as it's still a live vnode
for whatever it initially was, it's just up for grabs. In this case, we
need to scrub the old usage of the vnode away, and make it ready for a new
user. Implicitly we know that it was at the head of the free list (well,
the first one up for grabs).

The other case is where we're revoking a node. For a leaf file system, if
the use count is greater than one, there are other users and so we can't
just wipe the vnode now. Unhook it from the fs, yes. Turn it into a deadfs
node, yes. Make it so future uses will explode, no. Likewise, if the
revoke came to us via a layered file system, we shouldn't do the
super-destructive reclaim.

I haven't looked at the code in a while, but assuming the code will
correctly handle the "someone still has the tty open" case, we just need
to get the layer case to trigger the same processing. Adding an explicit
parameter to VOP_REVOKE() which sys_revoke() sets to 0 and layer_revoke
sets to 1 would do it.
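
To make the shape of that suggestion concrete, here is a user-space
sketch. It is one possible reading of the idea, not existing code: the
demo_* names, the from_layer parameter name, and the struct layout are
all hypothetical. The revoke routine takes the non-destructive path
either when other users hold the vnode or when the caller says the
revoke arrived through a layer.

```c
#include <stdbool.h>

/* Hypothetical toy types; none of this is the real kernel interface. */
struct demo_vnode {
    int usecount;     /* active references on this vnode */
    bool lock_valid;  /* is the lock still safe to use? */
    bool dead;        /* has it been turned into a dead node? */
};

struct demo_layer_node {
    struct demo_vnode *lower;  /* the vnode this layer node sits on */
};

/*
 * Stand-in for a revoke operation with the extra parameter.
 * from_layer == 0: called directly; from_layer == 1: came via a layer.
 */
void demo_vop_revoke(struct demo_vnode *vp, int from_layer)
{
    vp->dead = true;  /* always unhook from the fs */

    if (from_layer || vp->usecount > 1) {
        /* Other users may still touch the lock, so keep it working. */
        return;
    }
    /* Sole user and a direct revoke: the destructive path is acceptable. */
    vp->lock_valid = false;
}

/* Direct caller, modelled on the role sys_revoke() plays: passes 0. */
void demo_sys_revoke(struct demo_vnode *vp)
{
    demo_vop_revoke(vp, 0);
}

/* Layer caller, modelled on the role layer_revoke plays: passes 1. */
void demo_layer_revoke(struct demo_layer_node *ln)
{
    demo_vop_revoke(ln->lower, 1);
}
```

The design choice in this sketch is that the layer never has to guess
at the lower use count: it states where the revoke came from, and the
lower node keeps its lock usable for whoever still holds the upper node.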

If we won't correctly handle the case of a device being open when someone
else does the revoke, well, we need to fix that issue first. :-)

Take care,

Bill
