Subject: Re: Extension of fsync_range() to permit forcing disk cache flushing
To: Bill Sommerfeld <firstname.lastname@example.org>
From: Manuel Bouyer <email@example.com>
Date: 12/20/2004 21:38:31
On Mon, Dec 20, 2004 at 11:33:43AM -0500, Bill Sommerfeld wrote:
> On Fri, 2004-12-17 at 08:34, Manuel Bouyer wrote:
> > In another thread, we accepted that the upper layers need to be aware of
> > this property of ATA drives and deal with it. fsync() doesn't have to
> > call the flush cache ioctl directly; it could insert itself in the
> > write queue with a write barrier. This way several subsystems'
> > barriers could be combined into one to help performance on busy systems.
> while a queued barrier can help with single-system self-consistency, you
> still may wind up turning back the clock after a crash unless you also
> put other externally-visible signs from the processes into limbo until
> the write completes..
> consider an SMTP server -- before responding with a 2xx code to the DATA command,
> it should commit the message to stable store.
Yes, and as far as I know, sendmail uses fsync() for this.
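To make the pattern concrete, here is a rough sketch of "commit, then
acknowledge". This is not sendmail's actual code: the function name
spool_message() and its arguments are made up for illustration, and error
handling is kept minimal (for instance, EINTR is not retried).

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: write a message to `path` and return 0 only after
 * fsync() has reported success; return -1 on any error.
 *
 * Caveat: a successful fsync() means the kernel has pushed the data to the
 * device. A drive with a volatile write cache may still lose it on power
 * loss unless the cache itself is flushed.
 */
int
spool_message(const char *path, const char *msg, size_t len)
{
	int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
	if (fd == -1)
		return -1;

	/* write() may be partial, so loop until everything is handed over. */
	size_t off = 0;
	while (off < len) {
		ssize_t n = write(fd, msg + off, len - off);
		if (n == -1) {
			close(fd);
			return -1;
		}
		off += (size_t)n;
	}

	/* Do not report success to the caller before fsync() returns. */
	if (fsync(fd) == -1) {
		close(fd);
		return -1;
	}
	return close(fd);
}
```

The server would send its 2xx reply only when spool_message() returns 0.
A real MTA also has to make sure the directory entry for a newly created
file is on stable storage, which this sketch ignores.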
> all of this is, however, something of a probability game; there are no absolute
> guarantees that the data is coming back, because any block written could go bad,
> or the controller could fail in such a way that it would acknowledge writes as
> complete even when they weren't (I've seen it happen. story for another time..)
Sure. There can also be software bugs ...
> What makes sense for a low end ATA drive (it may lose dirty cache data on
> loss of power or reset) may not make sense for a high end RAID system with
> battery backed mirrored cache.
> which fsync semantic do you want:
> - data recoverable even if something outside the storage widget fails
> - data recoverable unless something inside the storage widget fails.
It really depends on the context. Personally, on boxes where reliability
of the disk system matters, I don't use ATA at all.
Manuel Bouyer <firstname.lastname@example.org>
NetBSD: 26 years of experience will always make the difference