Subject: Re: LK_SHARED for VFS_VGET/FHTOVP
To: YAMAMOTO Takashi <yamt@mwd.biglobe.ne.jp>
From: Bill Studenmund <wrstuden@netbsd.org>
List: tech-kern
Date: 05/25/2004 23:15:32
--dDRMvlgZJXvWKvBx
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Wed, May 26, 2004 at 10:05:05AM +0900, YAMAMOTO Takashi wrote:
> hi,
>=20
> > While not the most concurrent, our VOP interface was pretty MP-safe. Si=
nce
> > all the locks were exclusive, you could have as many threads wandering
> > around in the vnode system, on as many CPUs as you wanted. So once all =
the
> > systems below the VOP layer (the buffer cache and the vm cache and the
> > disk drivers) were MP-safe, it would be reasonable to start letting some
> > file operations past the big-lock. Since the buffer cache and vm cache=
=20
> > have been MP-ified, we are actually close to that. The bottom-half of t=
he=20
> > kernel is the only remaining issue.
>=20
> i don't think that my proposed change affects mp-safeness much.

I agree it doesn't affect MP-readiness much. However it's a second step
down a path I really really think we should not have started down AT ALL.

Specifically with using exclusive locks, I think that the vnode system was=
=20
fine-grained. Perhaps not the best locking (we can get better parallelism,=
=20
perhaps), but correct. By changing things so we need big-lock, we take a=20
step back.

As an aside, I admit that the part we still need to really take advantage=
=20
of this is for the bottom-half of the kernel to be fine-grained, which is=
=20
a ways off.

> > However as we start using LK_SHARED, the only thing that is keeping the
> > VOP interface safe is our use of big-lock in the kernel. That's a step
> > backwards for SMP, in my opinion.
>=20
> can you please be more specific?
> i know atime-update in VOP_READ is protected only by biglock.
> (i don't think it's a major problem, though.)
> do you know examples other than it?

For now that's the only specific one. However I do think it's a major=20
problem.

I agree it wouldn't be a hard problem to fix, and in terms of code it's=20
probably minor.

My major objection (which I fully admit is to something you didn't do) is
that by making read(2) use shared locks, we started consciously violating
our own vnode interface. You're suggeting adding another path which will
use shared locks, with the same issues. So while this change won't
introduce a new problem, it will extend what I see as an existing one. :-)=
=20
Hmm... Maybe we should do what you propose to at least be consistent in=20
our error; I'm not sure.

Over the years the vnode interface has gotten a reputation for being
complicated and troublesome. When I was at NASA, I worked on cleaning lots
of it up, and among other things we got production-ready layered file
systems. Part of getting things to work was being an ass about locking,
and making sure we did locking right in all places. "Right" meant=20
documenting what should happen and auditing all the code to match. To go=20
back to a comment you made about me believing documentation, I actually=20
tried to make sure the code and documentation matched (what I think=20
"should" happen). :-)

By encouraging us to break what documentation we have, we open the door to
getting back into an untrusted, confusing mess. For instance, new=20
developers have to just "know" that it's "ok" to use shared locks for=20
reads, but not anything else.

So my objections are:

1) By using LK_SHARED locks for VOP_READ() and VOP_READDIR() we violate=20
our locking protocol. Not only for these calls, but all the calls they=20
make (like to VOP_UPDATE()). We shouldn't be doing it now, and I don't=20
think we should do it for more. Ok, this side of the objection is a bit=20
pedantic, but once we start, where do we stop? How do we guage the next=20
locking protocol botch?

2) We don't know how bad things are. Sure, ffs_read() (and lfs_read()) =20
only call ufs_update(), which is "ok" as long as we have big-lock. But we
don't _know_ what the other file systems do, and what problems they have; =
=20
we haven't audited the other code. This uncertainty is one of my big
concerns about things.

If we want to use shared locks for VOP_READ(), we should just document=20
that we're doing it, and figure out what needs to be changed to adjust.=20
Like how do we fix VOP_UPDATE(). Then we update the code and change how we=
=20
document the locking interface.

I agree it'd be nice to use shared locks in the vnode system. Reads can=20
benefit. I also think parts of lookup could benefit; I think it would be=20
fair to use shared locks for all lookups other than the last component.=20
However if we're going to start changing the locking protocol, we should=20
do it deliberately and with research. Not ad-hoc.

Oh, as an aside, I think in your patch you unlock at one point and then=20
re-lock with LK_SHARED. I think all you need to do is call VOP_LOCK() with=
=20
LK_DOWNGRADE or something like it - you can downgrade the exclusive lock=20
into a shared one. Also, since you have the exclusive lock, you can just=20
use VOP_LOCK() as no one will have initiated reclaiming in the mean time.

Take care,

Bill

--dDRMvlgZJXvWKvBx
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.3 (NetBSD)

iD8DBQFAtDYEWz+3JHUci9cRAhreAJ966noXEabRPEFIQDnpLTnB+u/adwCfYKoG
SZd7BYqej2GDhRcODDiF4xo=
=mckx
-----END PGP SIGNATURE-----

--dDRMvlgZJXvWKvBx--