Subject: Re: Extension of fsync_range() to permit forcing disk cache flushing
To: Bill Sommerfeld <sommerfeld@orchard.arlington.ma.us>
From: Manuel Bouyer <bouyer@antioche.lip6.fr>
List: tech-kern
Date: 12/20/2004 21:38:31
On Mon, Dec 20, 2004 at 11:33:43AM -0500, Bill Sommerfeld wrote:
> On Fri, 2004-12-17 at 08:34, Manuel Bouyer wrote:
> > In another thread, we admitted that the upper layers need to be aware of
> > this property of ATA drives, and deal with it. fsync() doesn't have to
> > call the flush cache ioctl directly; it could insert itself in the
> > write queue with a write barrier. This way several subsystems'
> > barriers could be combined into one to help performance on busy systems.
> 
> while a queued barrier can help with single-system self-consistency, you
> still may wind up turning back the clock after a crash unless you also
> put other externally-visible signs from the processes into limbo until
> the write completes.
> 
> consider an SMTP server -- before responding with a 2xx code to the DATA command,
> it should commit the message to stable store. 

Yes, and as far as I know, sendmail uses fsync() for this.
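
To make the ordering concrete, here is a minimal sketch in C of "commit,
then acknowledge". It is not sendmail's actual code; commit_message() and
its arguments are hypothetical names chosen for illustration. The function
returns 0 only after fsync() has reported success, so a caller would send
the 2xx reply only on a 0 return. Whether fsync() also pushes the data out
of the drive's own write cache is exactly the open question in this thread,
so the sketch only guarantees as much as fsync() itself does.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper: write `len` bytes from `buf` to `path` and
 * return 0 only once fsync() has succeeded. Returns -1 on any error.
 */
int
commit_message(const char *path, const char *buf, size_t len)
{
	int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
	if (fd == -1)
		return -1;

	/* write() may be short or interrupted, so loop until done. */
	size_t off = 0;
	while (off < len) {
		ssize_t n = write(fd, buf + off, len - off);
		if (n == -1) {
			if (errno == EINTR)
				continue;
			close(fd);
			return -1;
		}
		off += (size_t)n;
	}

	/* Do not report success before the kernel has flushed the file. */
	if (fsync(fd) == -1) {
		close(fd);
		return -1;
	}
	return close(fd);
}
```

A server would call this before replying to DATA and answer with a
temporary failure code if it returns -1. A newly created file may also
need an fsync() on its parent directory, which this sketch leaves out.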

> 
> all of this is, however, something of a probability game; there are no absolute
> guarantees that the data is coming back, because any block written could go bad,
> or the controller could fail in such a way that it acknowledges writes as
> complete even when they weren't (I've seen it happen; story for another time.)

Sure. There can also be software bugs ...

> 
> What makes sense for a low end ATA drive (it may lose dirty cache data on
> loss of power or reset) may not make sense for a high end RAID system with 
> battery backed mirrored cache.
> 
> which fsync semantic do you want:
> 
>   - data recoverable even if something outside the storage widget fails
>   - data recoverable unless something inside the storage widget fails.

It really depends on the context. Personally, on boxes where reliability
of the disk system matters, I don't use ATA at all.

-- 
Manuel Bouyer <bouyer@antioche.eu.org>
     NetBSD: 26 years of experience will always make the difference
--