Subject: Re: reboot problems unmounting root
To: Antti Kantee , Juan RP <juan@xtrarom.org>
From: Bill Stouder-Studenmund <wrstuden@netbsd.org>
List: tech-userlevel
Date: 07/05/2007 09:49:35

On Thu, Jul 05, 2007 at 01:37:23AM +0300, Antti Kantee wrote:
> On Wed Jul 04 2007 at 20:43:30 +0200, Juan RP wrote:
> > On Wed, 4 Jul 2007 19:34:07 +0200
> > "Zafer Aydogan" <zafer@aydogan.de> wrote:
> >
> > > I did. Now I can't boot. System panics right before going into multiuser.
> > > Screenshots are available at http://aydogan.org/reboot/
> >
> > Build it with LOCKDEBUG, take pictures of the panic message/backtrace and
> > send a PR.
> >
> > There seems to be a locking error in layerfs or something.
>
> The problem is that nullfs passes the VOP_REVOKE operation to the
> lower vnode.  However, the upper nullfs vnode remains entirely intact.
> Then when vrele() is called from sys_revoke(), the upper layer vnode tries
> to use the lock of the now-revoked lower layer vnode and goes kabloom.
> I think the correct fix is to supply a revoke operation for nullfs &
> layerfs, but I'm not intimate enough with them to be entirely sure that's
> the correct fix.  At least the problem goes away using the attached patch.

The problems you're running into are why we don't really have revoke
processing in layerfs.

Why is the lock exploding? I think that's the real problem. As long as the
revoked vnode still has references, it needs to have a working lock.

Blowing away the upper node is not necessary, and it doesn't lead to
correct functioning. The issue is that you can have layer stacks that are
more complicated than just one layer above a leaf file system. You can
have more than one layer node above the same leaf node. As such, the node
underneath you can ALWAYS get revoked out from under you. Given that, we
have to be able to handle the lower node getting revoked, and once we do
that, we don't need to zap the layer node the revoke goes through.

So change how revoke happens. Rip the inode off of the vnode, but don't
kill the lock.
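To make the proposal concrete, here is a minimal toy model in C. It is not
NetBSD code: struct toy_vnode, toy_lock(), toy_revoke_keep_lock() and
toy_revoke_kill_lock() are hypothetical names, and the model assumes, as
Antti describes, that a layer node uses the lock of the leaf vnode beneath
it. Two layer nodes can sit above one leaf. Revoking the leaf frees its
private data (standing in for the inode) but leaves the lock initialized, so
every layer node above it can still lock through it. For contrast,
toy_revoke_kill_lock() also tears the lock down, which is one way to model
the vrele() failure reported above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical toy lock, not lockmgr(9) or NetBSD's vnode lock. */
struct toy_lock {
    bool initialized;           /* false once the lock is torn down */
    bool held;
};

/* Hypothetical toy vnode, not NetBSD's struct vnode. */
struct toy_vnode {
    struct toy_lock lock;       /* only the leaf's lock is actually used */
    void *data;                 /* fs-private data, e.g. the inode */
    struct toy_vnode *lower;    /* NULL for a leaf (non-layer) node */
    bool revoked;
};

/* Assumption: a layer node shares the lock of the leaf at the bottom. */
static struct toy_lock *toy_lockp(struct toy_vnode *vp)
{
    while (vp->lower != NULL)
        vp = vp->lower;
    return &vp->lock;
}

/* Returns false, instead of crashing, if the shared lock was destroyed. */
static bool toy_lock(struct toy_vnode *vp)
{
    struct toy_lock *lk = toy_lockp(vp);

    if (!lk->initialized)
        return false;           /* the "goes kabloom" case */
    assert(!lk->held);
    lk->held = true;
    return true;
}

static void toy_unlock(struct toy_vnode *vp)
{
    struct toy_lock *lk = toy_lockp(vp);

    assert(lk->initialized && lk->held);
    lk->held = false;
}

/* Proposed revoke: rip the private data off, keep the lock working. */
static void toy_revoke_keep_lock(struct toy_vnode *leaf)
{
    free(leaf->data);
    leaf->data = NULL;
    leaf->revoked = true;
    /* leaf->lock is deliberately left initialized. */
}

/* Contrast: a revoke that also destroys the lock other nodes depend on. */
static void toy_revoke_kill_lock(struct toy_vnode *leaf)
{
    toy_revoke_keep_lock(leaf);
    leaf->lock.initialized = false;
}
```

For example, after toy_revoke_keep_lock(&leaf), toy_lock() still succeeds
for every layer node whose lower pointer leads to leaf, so no layer node has
to be zapped. After toy_revoke_kill_lock(&leaf), toy_lock() fails for all of
them, including ones the revoke never went through.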

Take care,

Bill
