Subject: Re: direct I/O
To: Chuck Silvers <>
From: Darrin B. Jewell <>
List: tech-kern
Date: 03/03/2005 19:03:10
Chuck Silvers <> writes:
> of course, with direct I/O being intrinsically synchronous, that immediately
> leads to the question of how to allow concurrent reads and writes to a file.
> our current locking scheme (enforced above the VOP layer) doesn't have any
> concept of this.  I'm thinking about adding a range-locking implementation
> and having vn_{read,write}() use that for O_DIRECT requests where
> FOF_UPDATE_OFFSET is not set (ie. pread() and pwrite()).  this would involve
> some new VOPs (VOP_LOCK_RANGE(), VOP_UNLOCK_RANGE()) and those would need to
> interact appropriately with the existing non-ranged vnode locks.  file systems
> that do not implement these VOPs would just return an error and the calling
> layer would fall back on the current locking automatically.  there would
> be no syscall interface change for this.  I'm thinking it probably won't be
> much harder to do the range-locks than it was to do the direct I/O stuff.
> comments?
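
To make the proposal concrete, here is a minimal userland sketch, assuming pthreads, of a byte-range lock that coexists with a whole-file lock, plus the fallback path described above. rl_lock_range() stands in for VOP_LOCK_RANGE(), rl_lock_whole() for the existing non-ranged vnode lock, and EOPNOTSUPP is an assumed "not implemented" error. All rl_* names are hypothetical; this is not NetBSD code.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical userland model: rl_lock_range() ~ VOP_LOCK_RANGE(). */
#define RL_MAX 16			/* max ranges held at once */

struct rl_range { off_t start, end; bool in_use; };	/* [start, end) */

struct rl_vnode {
	pthread_mutex_t mtx;
	pthread_cond_t cv;
	bool supports_ranges;		/* false: fs lacks the new VOPs */
	bool whole_locked;		/* ordinary vnode lock is held */
	int nranges;			/* number of range locks held */
	struct rl_range held[RL_MAX];
};

void
rl_init(struct rl_vnode *vp, bool supports_ranges)
{
	memset(vp, 0, sizeof(*vp));
	pthread_mutex_init(&vp->mtx, NULL);
	pthread_cond_init(&vp->cv, NULL);
	vp->supports_ranges = supports_ranges;
}

static bool
rl_overlaps(const struct rl_vnode *vp, off_t start, off_t end)
{
	for (int i = 0; i < RL_MAX; i++)
		if (vp->held[i].in_use &&
		    start < vp->held[i].end && vp->held[i].start < end)
			return true;
	return false;
}

/* Returns 0 with a slot in *slotp, or EOPNOTSUPP if unsupported. */
int
rl_lock_range(struct rl_vnode *vp, off_t start, off_t end, int *slotp)
{
	assert(start < end);
	if (!vp->supports_ranges)
		return EOPNOTSUPP;
	pthread_mutex_lock(&vp->mtx);
	for (;;) {
		/* Proceed only if no whole lock and no overlapping range. */
		if (!vp->whole_locked && !rl_overlaps(vp, start, end)) {
			for (int i = 0; i < RL_MAX; i++) {
				if (vp->held[i].in_use)
					continue;
				vp->held[i] = (struct rl_range){ start, end, true };
				vp->nranges++;
				*slotp = i;
				pthread_mutex_unlock(&vp->mtx);
				return 0;
			}
		}
		pthread_cond_wait(&vp->cv, &vp->mtx);
	}
}

/* The whole-vnode lock excludes all range locks and vice versa. */
void
rl_lock_whole(struct rl_vnode *vp)
{
	pthread_mutex_lock(&vp->mtx);
	while (vp->whole_locked || vp->nranges > 0)
		pthread_cond_wait(&vp->cv, &vp->mtx);
	vp->whole_locked = true;
	pthread_mutex_unlock(&vp->mtx);
}

/* Release a range (slot >= 0) or the whole-vnode lock (slot == -1). */
void
rl_unlock(struct rl_vnode *vp, int slot)
{
	pthread_mutex_lock(&vp->mtx);
	if (slot >= 0) {
		vp->held[slot].in_use = false;
		vp->nranges--;
	} else
		vp->whole_locked = false;
	pthread_cond_broadcast(&vp->cv);
	pthread_mutex_unlock(&vp->mtx);
}

/* pread()/pwrite() path: fall back to the whole-vnode lock on error. */
void
rl_lock_for_io(struct rl_vnode *vp, off_t off, size_t len, int *slotp)
{
	if (rl_lock_range(vp, off, off + (off_t)len, slotp) != 0) {
		rl_lock_whole(vp);
		*slotp = -1;
	}
}
```

In this model two pwrite()-style requests to non-overlapping ranges both hold range locks at once, while any caller of the ordinary lock waits for all of them to drain. A file system that "doesn't implement" the VOP simply makes rl_lock_range() fail, and the caller silently takes the whole-vnode lock instead, with no interface change visible to user code.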

My first thought is that for internal kernel use, a separate
VOP_LOCK_RANGE call should not be necessary.  Instead, I was thinking
that the existing PG_BUSY locks could be relied on for individual
pages in a range.  I don't think range locking is necessary at this
level, since, as far as I am aware, concurrent accesses to the same
range are not required to produce consistent results.  (The exception
is fcntl(F_SETLK), which is handled separately.)
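
A minimal userland model of the per-page alternative, assuming each page carries a busy flag in the spirit of PG_BUSY. The names toy_file, page_busy() and toy_pwrite() are hypothetical and this is not UVM code: a write marks only one page busy at a time, so each page is updated atomically with respect to other writers, but no lock spans the whole range.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy model, not UVM: per-page busy flags in the spirit of PG_BUSY. */
#define PAGE_SZ 4096
#define NPAGES  8

struct toy_file {
	pthread_mutex_t mtx;
	pthread_cond_t cv;
	bool busy[NPAGES];
	unsigned char data[NPAGES * PAGE_SZ];
};

void
toy_init(struct toy_file *f)
{
	memset(f, 0, sizeof(*f));
	pthread_mutex_init(&f->mtx, NULL);
	pthread_cond_init(&f->cv, NULL);
}

/* Sleep until page pg is not busy, then mark it busy. */
static void
page_busy(struct toy_file *f, size_t pg)
{
	pthread_mutex_lock(&f->mtx);
	while (f->busy[pg])
		pthread_cond_wait(&f->cv, &f->mtx);
	f->busy[pg] = true;
	pthread_mutex_unlock(&f->mtx);
}

static void
page_unbusy(struct toy_file *f, size_t pg)
{
	pthread_mutex_lock(&f->mtx);
	f->busy[pg] = false;
	pthread_cond_broadcast(&f->cv);
	pthread_mutex_unlock(&f->mtx);
}

/*
 * Write len bytes at off, holding at most one page busy at a time.
 * Each page is updated atomically with respect to other writers, but
 * the range as a whole is not: a concurrent writer to an overlapping
 * range may win on some pages and lose on others.
 */
void
toy_pwrite(struct toy_file *f, const void *buf, size_t len, size_t off)
{
	const unsigned char *src = buf;

	assert(off + len <= sizeof(f->data));
	while (len > 0) {
		size_t pg = off / PAGE_SZ;
		size_t n = PAGE_SZ - off % PAGE_SZ;	/* bytes left in page */

		if (n > len)
			n = len;
		page_busy(f, pg);
		memcpy(&f->data[off], src, n);
		page_unbusy(f, pg);
		src += n;
		off += n;
		len -= n;
	}
}
```

If two threads write 'A' and 'B' over the same page-aligned range, every page ends up entirely 'A' or entirely 'B', but different pages may disagree. That weaker, per-page guarantee is what this approach offers in place of a range lock.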

In fact, I would like to relax vnode locks, so that they don't lock
out concurrent access to a file, and instead just protect the
integrity of filesystem metadata where necessary.
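
The idea of vnode locks that protect only metadata can be sketched as another hypothetical userland model (mfile, mfile_pwrite() and meta_lock are invented names): file data is copied with no file-wide lock held, and a small mutex is taken only to update the file size.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/*
 * Hypothetical model: no file-wide lock is held while data is copied;
 * meta_lock protects only the metadata (here, just the file size).
 */
#define MFILE_MAX 65536

struct mfile {
	pthread_mutex_t meta_lock;	/* protects size only */
	off_t size;
	unsigned char data[MFILE_MAX];
};

void
mfile_init(struct mfile *f)
{
	memset(f, 0, sizeof(*f));
	pthread_mutex_init(&f->meta_lock, NULL);
}

/*
 * Overlapping writers would still need per-page exclusion (e.g. a
 * PG_BUSY-style busy flag); this sketch assumes disjoint ranges.
 */
void
mfile_pwrite(struct mfile *f, const void *buf, size_t len, off_t off)
{
	assert(off >= 0 && (size_t)off + len <= MFILE_MAX);
	memcpy(&f->data[off], buf, len);	/* data: no file-wide lock */

	/* Metadata: a short critical section, so concurrent extending
	 * writes cannot lose each other's size update. */
	pthread_mutex_lock(&f->meta_lock);
	if (off + (off_t)len > f->size)
		f->size = off + (off_t)len;
	pthread_mutex_unlock(&f->meta_lock);
}

off_t
mfile_size(struct mfile *f)
{
	off_t sz;

	pthread_mutex_lock(&f->meta_lock);
	sz = f->size;
	pthread_mutex_unlock(&f->meta_lock);
	return sz;
}
```

Two threads writing disjoint ranges run fully in parallel in this model; the only serialization is the brief size update.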

I'll also note that, currently, I/O operations on VCHR devices are
already allowed to run concurrently, because spec_write unlocks the
vnode around the call to the underlying device write routine.  While
I think this is appropriate behavior, it exposes an existing race
condition in the physio uvm_vslock/uvm_vsunlock code, which causes
diagnostic messages to be spewed from the i386 pmap_unwire routine.
If it would be useful, I can put together a test case and submit a PR.