Subject: Re: direct I/O again
To: Chuck Silvers <>
From: Bill Studenmund <>
List: tech-kern
Date: 03/28/2006 18:55:10

On Tue, Mar 28, 2006 at 08:27:09AM -0800, Chuck Silvers wrote:
> On Mon, Mar 27, 2006 at 07:36:48PM -0800, Bill Studenmund wrote:
> > Ok, I looked at the substance of it, and it seems fine.
> great, thanks.
> > I'm actually quite interested in your mentioning support for concurrent
> > writes. What did you have in mind for that?
> >
> > The two things I see that are needed are:
> >
> > 1) some way to handle keeping the writes out of each others' way. An
> > extent map comes to mind...
> that would be a fine mechanism, except that it only operates on u_longs
> and not off_ts.  also, the list-based implementation of this would likely
> need to be changed to something more efficient.

Oh, I didn't have a specific implementation in mind. :-)
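
For illustration only, here is a minimal sketch of keeping writes out of each others' way with a per-vnode table of in-flight byte ranges keyed by off_t rather than u_long. The names (struct range_table, range_tryacquire(), range_release(), MAX_INFLIGHT) are hypothetical, not an existing NetBSD interface, and the fixed-size linear scan stands in for the more efficient structure (a balanced tree, say) that a real implementation would need.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical per-vnode table of in-flight I/O ranges, [start, end). */
#define MAX_INFLIGHT 16

struct io_range {
	off_t start;
	off_t end;	/* exclusive */
	bool  busy;	/* slot currently describes an in-flight I/O */
};

struct range_table {
	struct io_range r[MAX_INFLIGHT];
};

/* Half-open intervals overlap iff each starts before the other ends. */
static bool
ranges_overlap(off_t s1, off_t e1, off_t s2, off_t e2)
{
	return s1 < e2 && s2 < e1;
}

/*
 * Try to claim [start, end). Returns the slot index, or -1 if the range
 * conflicts with an in-flight I/O or the table is full. A real
 * implementation would sleep and retry instead of failing.
 */
static int
range_tryacquire(struct range_table *t, off_t start, off_t end)
{
	int freeslot = -1;

	assert(start < end);
	for (int i = 0; i < MAX_INFLIGHT; i++) {
		if (!t->r[i].busy) {
			if (freeslot < 0)
				freeslot = i;	/* remember first free slot */
			continue;
		}
		if (ranges_overlap(start, end, t->r[i].start, t->r[i].end))
			return -1;		/* conflicts with in-flight I/O */
	}
	if (freeslot >= 0) {
		t->r[freeslot].start = start;
		t->r[freeslot].end = end;
		t->r[freeslot].busy = true;
	}
	return freeslot;
}

/* Drop a previously claimed range once its I/O completes. */
static void
range_release(struct range_table *t, int slot)
{
	assert(slot >= 0 && slot < MAX_INFLIGHT && t->r[slot].busy);
	t->r[slot].busy = false;
}
```

Because the ranges are half-open, [0, 4096) and [4096, 8192) do not conflict, so back-to-back writes can proceed in parallel while genuinely overlapping ones wait.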

> > 2) how do we keep potential allocations out of each others' way? i.e.
> > fine-grained locking on changing the block allocation tables. ??
> since the advent of the UBC work, file bmap information is protected by
> the getpages lock (for file systems that use genfs_{get,put}pages, which
> is currently all of the ones where our driver can modify file bmaps),
> so even if writes are allowed in parallel because they are to different
> ranges of a vnode, bmap modifications can still be serialized by taking
> the getpages lock in exclusive mode.  in the longer term, it would be
> desirable to use the buffer locks themselves (ie. B_BUSY, which could
> eventually be changed to a shared/exclusive lock as well) to protect
> portions of the bmap for file systems where that's possible, but there
> are many other bottlenecks to address before that matters.

That's good to know! Protecting block allocation is not something I think
I would be up to doing. :-)

> another possibility would be to add another flag to indicate that direct
> I/O requests need not be serialized with each other, only with non-direct
> requests.  pretty much any application that would actually use direct I/O
> (eg. databases) would be happy with this, and it would allow us to use
> the faster mechanism of just taking the vnode lock in a shared mode for
> all direct requests (though a different shared mode than the normal one,
> basically we would have two incompatible colors of shared mode).  this
> particular scheme would unnecessarily serialize direct reads vs buffered
> reads; this seems unlikely to be a problem in practice, but if desired we
> could extend it to have four lock colors (the cross product of read vs. write
> and POSIX vs. concurrent) with the appropriate color-compatibility matrix.
> actually, this other flag (let's call it O_CONCURRENT for now) wouldn't
> really need to be tied to O_DIRECT, though allowing concurrent writes for
> buffered I/O would require a bunch of additional changes beyond those
> that would be required for concurrent direct I/O.
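
A sketch of the four-color variant, assuming one plausible compatibility matrix (the mail does not spell it out): concurrent requests never block each other, reads of either kind share, and any other pairing that involves a write conflicts. The enum, the compat table, struct color_lock and color_tryacquire() are hypothetical; a real vnode lock would sleep rather than fail.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical lock "colors": {read, write} x {POSIX, concurrent}. */
enum lock_color { POSIX_READ, POSIX_WRITE, CONC_READ, CONC_WRITE, NCOLORS };

/*
 * compat[a][b] is true if a holder of color a and a holder of color b may
 * hold the vnode lock at the same time. The table is symmetric.
 */
static const bool compat[NCOLORS][NCOLORS] = {
	/*               P_READ P_WRITE C_READ C_WRITE */
	[POSIX_READ]  = { true,  false,  true,  false },
	[POSIX_WRITE] = { false, false,  false, false },
	[CONC_READ]   = { true,  false,  true,  true  },
	[CONC_WRITE]  = { false, false,  true,  true  },
};

struct color_lock {
	int holders[NCOLORS];	/* number of current holders of each color */
};

/* Grant color c only if it is compatible with every color currently held. */
static bool
color_tryacquire(struct color_lock *l, enum lock_color c)
{
	for (int h = 0; h < NCOLORS; h++)
		if (l->holders[h] > 0 && !compat[h][c])
			return false;
	l->holders[c]++;
	return true;
}

static void
color_release(struct color_lock *l, enum lock_color c)
{
	assert(l->holders[c] > 0);
	l->holders[c]--;
}
```

With only two colors (POSIX shared vs. concurrent shared), the CONC_READ/POSIX_READ entries would be false, which is exactly the unnecessary serialization of direct reads against buffered reads described above.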

To be honest, I think that as long as i/os don't overlap, we're fine. I
think it's ok for an O_DIRECT write to be happening at the same time as a
non-direct read, assuming they cover different parts of the file.

Take care,

