Subject: Re: CVS commit: src
To: YAMAMOTO Takashi <yamt@mwd.biglobe.ne.jp>
From: Bill Studenmund <wrstuden@netbsd.org>
List: tech-kern
Date: 06/24/2004 20:11:02

On Fri, Jun 25, 2004 at 11:17:28AM +0900, YAMAMOTO Takashi wrote:
> > I'm not hearing, "I want to do X, but can't because of Y so I
> > want to change Y to Z." Those comments don't sound like the right way to
> > start out changing things.
>
> lfs sometimes needs to wait for cleaning before starting
> an actual operation.  it can't be done cleanly,
> because vnode lock is already held by upper layer.
>
> it's better to allow several operations for a directory concurrently.
> esp. for remote filesystems like nfs.  it's not possible with mandatory
> vnode locks.
>
> so i want to change vnode locks (back) to advisory.
> or migrate to filesystem internal locks completely.
> (actually i don't want to do such changes for now.
> i'm just talking about the ideal world.)
>
> i thought that i said all of the above somewhere.
> sorry if it was not clear to you.

You had said what you want. The why had eluded me (and its absence
concerned me). Thank you.

> > If we didn't do the locking externally, we pretty much lock as soon as the
> > write starts, and unlock before exiting. Seems about the same to me. Yes,
> > we spend a few cycles fewer w/o the lock, but I don't think that, in and
> > of itself, is much of a difference.
>
> consider synchronous writes.
> you can implement a simple VOP_WRITE as the following.
> 	1. copy data to page cache and update some incore data.
> 	2. start disk i/o.
> 	3. wait for the i/o completion.
> for this filesystem, you don't likely want to prevent VOP_READ
> during 2 and 3.

You are probably right. I'm not sure there are ever circumstances
where you'd want the read to wait. There may be, but I bet they'd be the
rare case.
Note also that you'd probably be permitting other writes while you're
waiting.
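
To make the three steps above concrete, here is a rough userland sketch
of a synchronous write that holds a file-system-internal lock only for
step 1. Everything in it is an assumption made for illustration: the
toy_* names, the pthread mutexes, and the fixed-size buffers standing in
for the page cache and the disk are hypothetical and are not NetBSD
interfaces. The generation counter is one simple way to keep an older
write from clobbering a newer one once writers overlap.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

#define TOY_BUFSZ 64

/* Hypothetical in-core node; none of these names are NetBSD structures. */
struct toy_node {
	pthread_mutex_t lock;     /* file-system-internal lock */
	char cache[TOY_BUFSZ];    /* stands in for the page cache */
	size_t size;              /* in-core file size */
	unsigned long gen;        /* bumped on every in-core update */

	pthread_mutex_t io_lock;  /* serializes the pretend device */
	char disk[TOY_BUFSZ];     /* stands in for the on-disk copy */
	size_t disk_size;
	unsigned long disk_gen;   /* generation last written to "disk" */
};

void
toy_init(struct toy_node *np)
{
	memset(np, 0, sizeof(*np));
	pthread_mutex_init(&np->lock, NULL);
	pthread_mutex_init(&np->io_lock, NULL);
}

/* Synchronous write: the internal lock covers step 1 only. */
int
toy_write(struct toy_node *np, const char *buf, size_t len)
{
	char snap[TOY_BUFSZ];
	unsigned long gen;

	assert(buf != NULL || len == 0);
	if (len > TOY_BUFSZ)
		return -1;

	/* 1. copy data to the cache and update in-core state. */
	pthread_mutex_lock(&np->lock);
	memcpy(np->cache, buf, len);
	np->size = len;
	gen = ++np->gen;
	memcpy(snap, np->cache, len);
	pthread_mutex_unlock(&np->lock);

	/*
	 * 2 and 3. "start" the i/o and wait for it. np->lock is not
	 * held here, so toy_read() and other writers can run meanwhile.
	 */
	pthread_mutex_lock(&np->io_lock);
	if (gen > np->disk_gen) {	/* don't overwrite newer data */
		memcpy(np->disk, snap, len);
		np->disk_size = len;
		np->disk_gen = gen;
	}
	pthread_mutex_unlock(&np->io_lock);
	return 0;
}

/* Read from the cache; blocks only while a writer is in step 1. */
size_t
toy_read(struct toy_node *np, char *buf, size_t len)
{
	size_t n;

	pthread_mutex_lock(&np->lock);
	n = len < np->size ? len : np->size;
	memcpy(buf, np->cache, n);
	pthread_mutex_unlock(&np->lock);
	return n;
}
```

With an externally held vnode lock, the reader would instead be blocked
for the whole of toy_write(), including the i/o wait.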

In terms of getting to this, I think I'd much prefer moving to internal
locking. One of the biggest problems I encountered when getting layered
file systems working was ambiguity about what was happening with
locking. So I pushed everything to use exclusive real locks, to be
consistent (and a bit fascist). If we go to making locks advisory, I'm
concerned that things would get ambiguous again, and we'd be back in a mess.

I'd actually much rather see massive rototillage, in the expectation that
we'd be changing so much that we'd have to pay attention to what we're
doing, and get it right.

Ironically, the next few months are the best time to do it. Something like
this would be a major change, and could delay the release that follows 2.0.
So if it's going to happen, we're approaching one of the best windows to do
it in. ;-)

Does anyone know how System V kernels handle this?

As a transition step, file systems could be changed to just add lock calls
at the start of routines that now take unlocked vnodes, and unlocks at the
bottom. So we change the interface, the callers, and the layered file
systems, but not the internals of the leaf file systems, i.e. we're more
likely to get things right. :-)
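
A minimal sketch of that transition step, assuming a userland stand-in
for a vnode. The struct, the v_locked debugging flag, and the function
names are hypothetical and only illustrate the shape of the change: the
entry point now takes an unlocked vnode, locks at the top, calls the
unchanged leaf code, and unlocks at the bottom.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for a vnode; not the real struct vnode. */
struct toy_vnode {
	pthread_mutex_t v_lock;
	int v_locked;	/* debugging aid: 1 while v_lock is held */
	int v_nwrites;	/* state the leaf routine modifies */
};

void
toy_vnode_init(struct toy_vnode *vp)
{
	pthread_mutex_init(&vp->v_lock, NULL);
	vp->v_locked = 0;
	vp->v_nwrites = 0;
}

/* Unchanged leaf routine: still assumes its caller holds the lock. */
static int
leaf_write_locked(struct toy_vnode *vp)
{
	assert(vp->v_locked);
	vp->v_nwrites++;
	return 0;
}

/* New-style entry point: the caller passes an unlocked vnode. */
int
toy_vop_write(struct toy_vnode *vp)
{
	int error;

	pthread_mutex_lock(&vp->v_lock);	/* lock at the start */
	vp->v_locked = 1;

	error = leaf_write_locked(vp);

	vp->v_locked = 0;
	pthread_mutex_unlock(&vp->v_lock);	/* unlock at the bottom */
	return error;
}
```

The leaf code keeps its old locking assumptions, so narrowing the locked
region inside each routine can happen later, one file system at a time.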

Take care,

Bill
