Subject: Re: direct I/O again
To: Gordon Waidhofer <gww@traakan.com>
From: Chuck Silvers <chuq@chuq.com>
List: tech-kern
Date: 03/31/2006 06:48:21
On Tue, Mar 28, 2006 at 07:31:12PM -0800, Gordon Waidhofer wrote:
> 
> >
> > To be honest, I think that as long as i/os don't overlap, we're fine. I
> > think it's ok for an O_DIRECT write to be happening at the same time as a
> > non-direct read, assuming they cover different parts of the file.
> >
> 
> It's very difficult to define "the right thing" if they do overlap.

POSIX defines "the right thing".  the latest version of SUS apparently
differs, but previous versions did specify atomicity requirements for
read() and write().
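
to make "atomicity" concrete: under such a rule, a read() that overlaps a
concurrent write() should see either all of the old bytes or all of the new
bytes, never a mix. the following is only a minimal sketch of that
classification, not code from any kernel or from the standard. the names
classify_read, VIEW_OLD, VIEW_NEW and VIEW_TORN are hypothetical and exist
just for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical labels for what an overlapping read() observed. */
enum read_view { VIEW_OLD, VIEW_NEW, VIEW_TORN };

/*
 * Compare the bytes a reader got ("seen") against the contents of the
 * same file range before and after a concurrent write().  Under an
 * atomicity rule only VIEW_OLD or VIEW_NEW would be acceptable;
 * VIEW_TORN means the reader saw a mix of the two.
 * If "before" and "after" are identical, the result is VIEW_NEW.
 */
static enum read_view
classify_read(const void *seen, const void *before, const void *after,
    size_t len)
{
	assert(seen != NULL && before != NULL && after != NULL);

	if (memcmp(seen, after, len) == 0)
		return VIEW_NEW;
	if (memcmp(seen, before, len) == 0)
		return VIEW_OLD;
	return VIEW_TORN;
}
```

a test program could run a writer and a reader against the same range and
feed each read buffer through a check like this to see whether the
implementation ever hands back a torn result.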


> If the application layer isn't taking care of such races, arbitrary
> rules in the kernel or (the|a) file system won't save them.
> 
> Consider stat(2). Sometime during the processing the vnode lock
> is released, struct st copied out to userland, and control returned
> to userland. The result is best described as: at some time between
> when stat() was called and when stat() returned, the attributes looked
> like this. They could -- although rarely -- change between the
> vnode lock being released and the return of control to userland.
> There is non-zero probability that the struct st in userland is
> stale before control is returned to the application, let alone
> whether it's accurate when acted upon (like ls(1) printing).

the issue we're discussing is read() and write() serialization.
yes, the info returned by stat() can become stale before an application
gets a chance to look at it, but that's not relevant for the
topic at hand.


> These races simply can't be overcome below the syscall level.
> 
> In short, nobody should worry about serializing intersecting
> direct reads/writes (or non-direct for that matter) too much.
> Don't worry, be happy.

some applications rely on this serialization, so it does seem
appropriate to worry about it.

-Chuck