Subject: Re: direct I/O
To: Chuck Silvers <chuq@chuq.com>
From: Bill Studenmund <wrstuden@netbsd.org>
List: tech-kern
Date: 03/03/2005 13:27:43

On Mon, Feb 28, 2005 at 12:10:36AM -0800, Chuck Silvers wrote:
>
> of course, with direct I/O being intrinsically synchronous, that immediately
> leads to the question of how to allow concurrent reads and writes to a file.
> our current locking scheme (enforced above the VOP layer) doesn't have any
> concept of this.  I'm thinking about adding a range-locking implementation
> and having vn_{read,write}() use that for O_DIRECT requests where
> FOF_UPDATE_OFFSET is not set (ie. pread() and pwrite()).  this would involve
> some new VOPs (VOP_LOCK_RANGE(), VOP_UNLOCK_RANGE()) and those would need to
> interact appropriately with the existing non-ranged vnode locks.  file systems
> that do not implement these VOPs would just return an error and the calling
> layer would fall back on the current locking automatically.  there would
> be no syscall interface change for this.  I'm thinking it probably won't be
> much harder to do the range-locks than it was to do the direct I/O stuff.
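The try-then-fall-back flow Chuck describes can be sketched in user space as below. This is not NetBSD code: `struct fake_vnode` and the `fake_*` functions are hypothetical stand-ins for a vnode, VOP_LOCK_RANGE()/VOP_UNLOCK_RANGE() and the ordinary vnode lock, and returning EOPNOTSUPP for "not implemented" is an assumption (the proposal only says "an error").

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for a vnode; not NetBSD's struct vnode. */
struct fake_vnode {
	int supports_range;	/* does this fs implement range locks? */
	int whole_locked;	/* the ordinary, non-ranged vnode lock is held */
	int range_locked;	/* a byte-range lock is held */
};

/* Hypothetical VOP_LOCK_RANGE() analogue; EOPNOTSUPP is an assumed error. */
static int
fake_lock_range(struct fake_vnode *vp, long off, long len)
{
	(void)off;
	(void)len;
	if (!vp->supports_range)
		return EOPNOTSUPP;
	assert(!vp->range_locked);
	vp->range_locked = 1;
	return 0;
}

static void
fake_unlock_range(struct fake_vnode *vp)
{
	vp->range_locked = 0;
}

static void
fake_lock_whole(struct fake_vnode *vp)
{
	assert(!vp->whole_locked);
	vp->whole_locked = 1;
}

static void
fake_unlock_whole(struct fake_vnode *vp)
{
	vp->whole_locked = 0;
}

/*
 * Locking for a pread()/pwrite()-style O_DIRECT request: try the ranged
 * lock first and fall back on the whole-vnode lock if the fs refuses.
 * Returns 1 if the ranged path was taken, 0 on fallback.
 */
static int
direct_io_locked(struct fake_vnode *vp, long off, long len)
{
	if (fake_lock_range(vp, off, len) == 0) {
		/* ... perform the direct I/O for [off, off + len) ... */
		fake_unlock_range(vp);
		return 1;
	}
	fake_lock_whole(vp);
	/* ... perform the direct I/O under the ordinary vnode lock ... */
	fake_unlock_whole(vp);
	return 0;
}
```

The point of the fallback is that file systems which never learn about range locks keep working unchanged; they simply never get concurrent direct I/O.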

I think I'd rather push this into the file system. I agree with the
discussions we've had that in the long run it'd be nice to move to a
different vnode locking scheme, and as part of that VOP_READ() and
VOP_WRITE() calls would be performed w/o holding the vnode lock. In that
case, the fs has to do all the locking internally. But I think it'd be
easy to handle something like you describe here: on entry, VOP_READ() or
VOP_WRITE() grabs some sort of range lock for its I/O, performs it, then
releases the range lock. For READ, we let the locks be shared and for
WRITE, exclusive.
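A per-file range lock with those semantics (shared for reads, exclusive for writes, conflicts only when byte ranges overlap) might look roughly like the user-space sketch below. All names (`struct range_lock`, `rl_acquire()`, `rl_release()`, etc.) are hypothetical, pthreads stand in for kernel primitives, and the fixed-size table is a simplification; a real implementation would also need to think about fairness, since this version can starve writers.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define RL_MAX 16	/* hypothetical fixed capacity, keeps the sketch small */

struct rl_entry {
	bool in_use;
	bool exclusive;
	long start, end;	/* half-open byte range [start, end) */
};

struct range_lock {
	pthread_mutex_t mtx;	/* protects held[] */
	pthread_cond_t cv;	/* broadcast whenever a range is released */
	struct rl_entry held[RL_MAX];
};

static void
rl_init(struct range_lock *rl)
{
	pthread_mutex_init(&rl->mtx, NULL);
	pthread_cond_init(&rl->cv, NULL);
	for (int i = 0; i < RL_MAX; i++)
		rl->held[i].in_use = false;
}

/* A request conflicts with a held range if they overlap and either is exclusive. */
static bool
rl_conflicts(const struct range_lock *rl, long s, long e, bool excl)
{
	for (int i = 0; i < RL_MAX; i++) {
		const struct rl_entry *h = &rl->held[i];
		if (h->in_use && s < h->end && h->start < e &&
		    (excl || h->exclusive))
			return true;
	}
	return false;
}

/* Caller holds rl->mtx.  Returns a slot index used as a handle, or -1. */
static int
rl_take_locked(struct range_lock *rl, long s, long e, bool excl)
{
	assert(s < e);
	if (rl_conflicts(rl, s, e, excl))
		return -1;
	for (int i = 0; i < RL_MAX; i++) {
		if (!rl->held[i].in_use) {
			rl->held[i] = (struct rl_entry){ true, excl, s, e };
			return i;
		}
	}
	return -1;	/* table full */
}

/* Non-blocking variant: returns a handle, or -1 if the range is busy. */
static int
rl_try_acquire(struct range_lock *rl, long s, long e, bool excl)
{
	pthread_mutex_lock(&rl->mtx);
	int h = rl_take_locked(rl, s, e, excl);
	pthread_mutex_unlock(&rl->mtx);
	return h;
}

/* Blocking: a read path would pass excl = false, a write path excl = true. */
static int
rl_acquire(struct range_lock *rl, long s, long e, bool excl)
{
	int h;
	pthread_mutex_lock(&rl->mtx);
	while ((h = rl_take_locked(rl, s, e, excl)) < 0)
		pthread_cond_wait(&rl->cv, &rl->mtx);
	pthread_mutex_unlock(&rl->mtx);
	return h;
}

static void
rl_release(struct range_lock *rl, int h)
{
	pthread_mutex_lock(&rl->mtx);
	rl->held[h].in_use = false;
	pthread_cond_broadcast(&rl->cv);
	pthread_mutex_unlock(&rl->mtx);
}
```

With this, two reads of overlapping ranges proceed together, a write waits only for readers or writers whose ranges actually overlap its own, and writes to disjoint ranges run concurrently.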

The problem with VOP_LOCK_RANGE is, at least as it pops into my head, that
the file system still needs to do locking internally to protect metadata
structures. Previously those were protected by the vnode lock, but now (with
either option) we can have multiple writes operating on the same file at
once. So the fs will have to be savvy. Consider the case of two pwrite()s
to a sparse file, each of them allocating blocks. We need to make
sure both of those operations work right. :-)
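A toy model of that sparse-file case: even when two writers hold range locks on disjoint parts of the file, they both allocate from the same free-block map, so the allocator needs its own lock. Without `alloc_lock` below, both threads could see the same free slot and claim it, ending up with two logical blocks mapped to one physical block. Everything here (`struct toy_fs`, `toy_alloc()`, `toy_pwrite()`, the sizes) is hypothetical and only illustrates the kind of fs-internal locking meant, not any real file system's allocator.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define NBLOCKS 64	/* hypothetical file size, in blocks */
#define NPHYS 128	/* hypothetical disk size, in blocks */

struct toy_fs {
	pthread_mutex_t alloc_lock;	/* protects phys_used[] (shared metadata) */
	bool phys_used[NPHYS];		/* free-block bitmap */
	long blkmap[NBLOCKS];		/* logical -> physical block; -1 is a hole */
};

static void
toy_init(struct toy_fs *fs)
{
	pthread_mutex_init(&fs->alloc_lock, NULL);
	for (int p = 0; p < NPHYS; p++)
		fs->phys_used[p] = false;
	for (int b = 0; b < NBLOCKS; b++)
		fs->blkmap[b] = -1;	/* start fully sparse */
}

/* The bitmap is shared by every writer, whatever byte range it holds. */
static long
toy_alloc(struct toy_fs *fs)
{
	long got = -1;
	pthread_mutex_lock(&fs->alloc_lock);
	for (long p = 0; p < NPHYS; p++) {
		if (!fs->phys_used[p]) {
			fs->phys_used[p] = true;
			got = p;
			break;
		}
	}
	pthread_mutex_unlock(&fs->alloc_lock);
	return got;
}

struct write_req {
	struct toy_fs *fs;
	long first, count;	/* logical blocks [first, first + count) */
};

/*
 * Simulated pwrite() into holes.  The caller is assumed to already hold a
 * range lock on [first, first + count), so only this thread touches those
 * blkmap[] entries; the allocator still has to take alloc_lock.
 */
static void *
toy_pwrite(void *arg)
{
	struct write_req *w = arg;
	for (long b = w->first; b < w->first + w->count; b++)
		if (w->fs->blkmap[b] == -1)
			w->fs->blkmap[b] = toy_alloc(w->fs);
	return NULL;
}
```

Running two `toy_pwrite()` threads over blocks [0, 32) and [32, 64) should leave every logical block mapped to a distinct physical block.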

Well, my thought is that once we have fixed it so that the file system can
deal with two write operations happening at once, we can just let
VOP_WRITE() run without the vnode lock, and thus we won't need the _RANGE
calls. :-)

Take care,

Bill
