Subject: Re: direct I/O again
To: Bill Studenmund <wrstuden@netbsd.org>
From: Chuck Silvers <chuq@chuq.com>
List: tech-kern
Date: 03/28/2006 08:27:09
On Mon, Mar 27, 2006 at 07:36:48PM -0800, Bill Studenmund wrote:
> Ok, I looked at the substance of it, and it seems fine.

great, thanks.


> I'm actually quite interested in your mentioning support for concurrent 
> writes. What did you have in mind for that?
> 
> The two things I see that are needed are:
> 
> 1) some way to handle keeping the writes out of each others' way. An 
> extent map comes to mind...

that would be a fine mechanism, except that it only operates on u_longs
and not off_ts.  also, the list-based implementation of this would likely
need to be changed to something more efficient.
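
to make the off_t point concrete, here is a rough user-space sketch of
the overlap test a range map for in-flight writes would need.  the names
(struct io_range, io_range_overlaps, io_range_conflicts) are made up for
illustration and are not an existing kernel interface; the linear list
walk is exactly the part that would probably want replacing with a tree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>	/* off_t */

/* hypothetical: one in-flight write covering bytes [start, end) */
struct io_range {
	off_t start;
	off_t end;		/* exclusive */
	struct io_range *next;
};

/* two half-open ranges overlap iff each starts before the other ends */
static bool
io_range_overlaps(off_t s1, off_t e1, off_t s2, off_t e2)
{
	return s1 < e2 && s2 < e1;
}

/*
 * walk the list of active ranges and report whether [start, end)
 * collides with any of them.  this is O(n) per request.
 */
static bool
io_range_conflicts(const struct io_range *active, off_t start, off_t end)
{
	assert(start < end);
	for (; active != NULL; active = active->next) {
		if (io_range_overlaps(active->start, active->end, start, end))
			return true;
	}
	return false;
}
```

a writer would check for a conflict, wait if there is one, and otherwise
add its own range to the list for the duration of the I/O.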


> 2) how do we keep potential allocations out of each others' way? i.e. 
> fine-grained locking on changing the block allocation tables. ??

since the advent of the UBC work, file bmap information is protected by
the getpages lock (for file systems that use genfs_{get,put}pages, which
is currently all of the ones where file bmaps can be modified),
so even if writes are allowed in parallel because they are to different
ranges of a vnode, bmap modifications can still be serialized by taking
the getpages lock in exclusive mode.  in the longer term, it would be
desirable to use the buffer locks themselves (ie. B_BUSY, which could
eventually be changed to a shared/exclusive lock as well) to protect
portions of the bmap for file systems where that's possible, but there
are many other bottlenecks to address before that matters.


another possibility would be to add another flag to indicate that direct
I/O requests need not be serialized with each other, only with non-direct
requests.  pretty much any application that would actually use direct I/O
(eg. databases) would be happy with this, and it would allow us to use
the faster mechanism of just taking the vnode lock in a shared mode for
all direct requests (though a different shared mode than the normal one;
basically we would have two incompatible colors of shared mode).  this
particular scheme would unnecessarily serialize direct reads vs buffered
reads; this seems unlikely to be a problem in practice, but if desired we
could extend it to have four lock colors (the cross product of read vs. write
and POSIX vs. concurrent) with the appropriate color-compatibility matrix.
actually, this other flag (let's call it O_CONCURRENT for now) wouldn't
really need to be tied to O_DIRECT, though allowing concurrent writes for
buffered I/O would require a bunch of additional changes beyond those
that would be required for concurrent direct I/O.
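
for what it's worth, the four-color variant could be sketched roughly
like this.  everything here is an assumption for illustration: the enum
names, the holders[] counters, and in particular the matrix entries,
which are just my reading of "POSIX writes exclude everything, concurrent
requests exclude only POSIX writes and (for concurrent writes) POSIX
reads".  it is not an existing lock implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* hypothetical lock colors: {read, write} x {POSIX, concurrent} */
enum lock_color {
	LC_POSIX_READ,
	LC_POSIX_WRITE,
	LC_CONC_READ,
	LC_CONC_WRITE,
	LC_NCOLORS
};

/* lock_compat[held][wanted]: may "wanted" be granted while "held" is held? */
static const bool lock_compat[LC_NCOLORS][LC_NCOLORS] = {
	/*                   P_RD   P_WR   C_RD   C_WR */
	[LC_POSIX_READ]  = { true,  false, true,  false },
	[LC_POSIX_WRITE] = { false, false, false, false },
	[LC_CONC_READ]   = { true,  false, true,  true  },
	[LC_CONC_WRITE]  = { false, false, true,  true  },
};

/*
 * holders[c] is the number of current holders of color c.
 * the request is grantable only if it is compatible with every
 * color that currently has at least one holder.
 */
static bool
lock_can_grant(const unsigned holders[LC_NCOLORS], enum lock_color want)
{
	assert(want < LC_NCOLORS);
	for (int c = 0; c < LC_NCOLORS; c++) {
		if (holders[c] > 0 && !lock_compat[c][want])
			return false;
	}
	return true;
}
```

with this matrix, direct reads no longer wait for buffered reads, while
a buffered (POSIX) write still excludes everything else.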

-Chuck