Subject: Re: Extension of fsync_range() to permit forcing disk cache flushing
To: der Mouse <mouse@Rodents.Montreal.QC.CA>
From: James Chacon <jmc@NetBSD.org>
List: tech-kern
Date: 12/17/2004 18:40:25
On Fri, Dec 17, 2004 at 07:53:59AM -0500, der Mouse wrote:
> >> I'm not really happy with this.  IMHO this should be the default
> >> behavior of fsync(). A programmer using fsync in his software uses
> >> it to make sure data is on stable storage.
> 
> I agree.  This is (and always has been, as far as I can recall)
> fsync()'s contract.
> 
> > The problem is that syncing 1K of data from one file could cause an
> > entire 8MB cache to be written back (in fact, on an IDE disk, *will*
> > cause that).  Some applications "defensively" call fsync on every
> > write; think what that will do to overall system performance.
> 
> So, an application that is excessively conservative can interact with
> hardware which has insufficiently powerful interfaces to produce bad
> performance.  This is nothing new, nor do I see it as a problem.  If
> I'd been aware before this discussion that fsync() had been violating
> its interface contract by just pushing data to volatile drive caches,
> I would have filed a bug report for it.
> 
> Nor do I see any alternative, given a drive with no cache flush
> granularity better than all-or-nothing.  Except for violating fsync()'s
> interface contract.

What do you consider to be the "contract" in this case? The man page alone,
standards conformance, POLA?

For instance, SUSv3 states pretty clearly in the rationale section of
fsync that if _POSIX_SYNCHRONIZED_IO is not defined, it's pretty much
up to the documentation to spell out what fsync can/cannot do. It then
goes on to say that an implementation such as ours (which can't give an
absolute guarantee, due to caching) is conformant as long as we provide
a way to force it.

"In the middle ground between these extremes, fsync() might or might not
actually cause data to be written where it is safe from a power failure. The
conformance document should identify at least that one configuration exists
how to obtain that configuration) where this can be assured for at least some
files that the user can select to use for critical data."
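
(As an aside, and purely as an illustration rather than anything from the
standard text or our tree: a program that wants to know whether it can
count on synchronized I/O at all can at least probe for the option. The
feature-test macro and sysconf() name below are POSIX; what to do in the
fallback case is just a sketch.)

#include <unistd.h>
#include <stdio.h>

static int
have_synchronized_io(void)
{
#if defined(_POSIX_SYNCHRONIZED_IO) && _POSIX_SYNCHRONIZED_IO > 0
	/* Advertised unconditionally at compile time. */
	return 1;
#else
	/* sysconf() returns -1 (or a value <= 0) if the option is absent. */
	return sysconf(_SC_SYNCHRONIZED_IO) > 0;
#endif
}

int
main(void)
{
	if (!have_synchronized_io())
		fprintf(stderr, "fsync() may not reach stable storage; "
		    "check the conformance documentation\n");
	return 0;
}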

I.e. the only bug I really see in our implementation is that we don't
document fsync's non-guarantee when a caching device is in use, or how
to work around that (i.e. don't enable write caching). Past that, fsync
isn't some catch-all for providing high availability. You still have to
plan out systems correctly to get there too.
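
(For what it's worth, and only as a sketch: the way to force the drive's
own write cache out from userland today is to go at the raw disk device
with the DIOCCACHESYNC ioctl from sys/dkio.h, assuming the driver honors
it, which is exactly the kind of thing the fsync_range() extension in
this thread is meant to make unnecessary for ordinary files. Treat the
device path and error handling here as illustrative only.)

#include <sys/ioctl.h>
#include <sys/dkio.h>
#include <fcntl.h>
#include <unistd.h>
#include <err.h>

int
main(void)
{
	int fd, force = 0;

	/* /dev/rwd0d is only an example raw disk device. */
	if ((fd = open("/dev/rwd0d", O_RDONLY)) == -1)
		err(1, "open");
	/* Ask the driver to push the drive's write cache to the platters. */
	if (ioctl(fd, DIOCCACHESYNC, &force) == -1)
		err(1, "DIOCCACHESYNC");
	close(fd);
	return 0;
}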

Anyway, you'll still lose data on a power loss regardless of how many
fsync calls you make, just due to things like partially written sectors
occurring.

Granted... people should remember that fsync only lives up to its
"contract" on a successful return. I doubt it's even returning under
most power loss scenarios...
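
(One last illustration, not a recommendation: about the least an
application can do is refuse to treat anything other than a clean
fsync() return as a commit. The file name and retry policy below are
made up for the example.)

#include <sys/types.h>
#include <fcntl.h>
#include <unistd.h>
#include <errno.h>
#include <err.h>

static void
commit_or_die(int fd)
{
	while (fsync(fd) == -1) {
		if (errno == EINTR)
			continue;	/* interrupted; retrying is safe */
		/* EIO and friends: the data may not be on stable storage. */
		err(1, "fsync");
	}
	/* Only after a successful return does the "contract" apply. */
}

int
main(void)
{
	int fd;

	if ((fd = open("critical.dat", O_WRONLY | O_CREAT, 0600)) == -1)
		err(1, "open");
	if (write(fd, "x", 1) != 1)
		err(1, "write");
	commit_or_die(fd);
	close(fd);
	return 0;
}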

James