Subject: Re: Extension of fsync_range() to permit forcing disk cache
To: Manuel Bouyer <bouyer@antioche.lip6.fr>
From: Bill Sommerfeld <sommerfeld@orchard.arlington.ma.us>
List: tech-kern
Date: 12/20/2004 11:33:43
On Fri, 2004-12-17 at 08:34, Manuel Bouyer wrote:
> In another thread, we admit that the upper layers need to be aware of this
> property of the ATA drives, and deal with it. fsync() doesn't have to
> call the flush cache ioctl directly; it could insert itself in the
> write queue with a write barrier. This way several subsystems'
> barriers could be combined into one to help performance on busy systems.

While a queued barrier can help with single-system self-consistency, you
may still wind up turning back the clock after a crash unless you also
hold the processes' other externally visible effects (replies sent over
the network, for instance) in limbo until the write completes.

Consider an SMTP server: before responding with a 2xx code to the DATA command,
it should commit the message to stable storage.
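A minimal POSIX sketch of that ordering, assuming a simple one-file-per-message
spool directory. The helper names (commit_message, write_all, fsync_dir), the
spool layout and the exact reply texts are all hypothetical; the point is only
that the 250 reply is not produced until fsync() on the file and its directory
has returned, and that any failure yields a temporary 451 so the client keeps
the message.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Write all of buf to fd, retrying after short writes and EINTR. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Flush the directory so the new file's name is committed as well. */
static int fsync_dir(const char *dir)
{
    int dfd = open(dir, O_RDONLY | O_DIRECTORY);
    if (dfd < 0)
        return -1;
    int rc = fsync(dfd);
    close(dfd);
    return rc;
}

/*
 * Hypothetical helper: store one message as dir/name and return the reply
 * to send for the DATA command.  "250" is returned only after fsync() of
 * both the file and its directory has succeeded; on failure the partial
 * file is removed and a temporary "451" is returned instead.
 */
static const char *commit_message(const char *dir, const char *name,
                                  const char *msg, size_t len)
{
    char path[4096];
    int n = snprintf(path, sizeof path, "%s/%s", dir, name);
    if (n < 0 || (size_t)n >= sizeof path)
        return "451 4.3.0 spool path too long\r\n";

    /* O_EXCL: never overwrite a message that is already committed. */
    int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0600);
    if (fd < 0)
        return "451 4.3.0 cannot create spool file\r\n";

    if (write_all(fd, msg, len) < 0 || fsync(fd) < 0) {
        close(fd);
        unlink(path);
        return "451 4.3.0 cannot commit message\r\n";
    }
    if (close(fd) < 0 || fsync_dir(dir) < 0) {
        unlink(path);
        return "451 4.3.0 cannot commit message\r\n";
    }
    /* Only now is it safe to tell the client we have the message. */
    return "250 2.0.0 message accepted\r\n";
}
```

Note that this only moves the problem down a layer: whether fsync() also
forces the drive's volatile write cache out to the media depends on the OS
and the hardware, which is exactly what this thread is about.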

All of this is, however, something of a probability game; there is no absolute
guarantee that the data will come back, because any block written could go bad,
or the controller could fail in a way that makes it acknowledge writes as
complete even when they weren't (I've seen it happen; that's a story for another time).

What makes sense for a low-end ATA drive (which may lose dirty cached data on
power loss or reset) may not make sense for a high-end RAID system with a
battery-backed mirrored cache.

Which fsync semantics do you want?

  - data recoverable even if something outside the storage widget fails;
  - data recoverable unless something inside the storage widget fails.



						- Bill