Subject: Re: write cache on ATA drives
To: Manuel Bouyer <bouyer@antioche.eu.org>
From: Thor Lancelot Simon <tls@rek.tjls.com>
List: tech-kern
Date: 12/09/2002 16:54:39

On Mon, Dec 09, 2002 at 10:20:21PM +0100, Manuel Bouyer wrote:
> On Thu, Dec 05, 2002 at 06:25:33PM -0500, Thor Lancelot Simon wrote:
> > What Windows does, evidently, is to disable and then re-enable write-back
> > caching as barriers around performing a synchronous I/O to the drive.  This
> > forces a cache flush, so you know the sync I/O went out, while letting the
> > drive continue to reorder async I/O to normal files, retaining much of the
> > performance benefit.  With softdep, you'd retain most of the rest -- as
> > much as you could _safely_ retain, anyway.
> 
> This is easy to do. Unfortunately, I don't think the filesystems will pass
> us this info ...

Of course they do.  The buffer's either B_ASYNC or it's not.  If it's
B_ASYNC, you may either turn the write cache on or leave the write cache on,
depending on its previous setting.  If it is not B_ASYNC (that is, the write
is synchronous), you must turn the write cache off or leave it off,
depending on its previous setting.
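
To make the rule above concrete, here is a minimal user-space sketch, not
real driver code.  Every name in it (SK_B_ASYNC, struct sk_buf, struct
sk_drive, sk_prepare_cache) is hypothetical and invented for illustration;
the only part taken from the discussion is the test on the async flag.  A
real driver would also have to issue the ATA command that changes the cache
setting, which this sketch only models as a boolean.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's B_ASYNC buffer flag. */
#define SK_B_ASYNC 0x01

/* Hypothetical, stripped-down buffer: only the flags matter here. */
struct sk_buf {
	int b_flags;
};

/* Hypothetical model of a drive's write-cache state. */
struct sk_drive {
	bool wcache_on;    /* current write-cache setting */
	unsigned toggles;  /* how many times the setting was changed */
};

/*
 * Put the write cache into the state this buffer requires:
 * on (or left on) for async writes, off (or left off) for sync ones.
 * Returns true if the setting had to change, false if it was
 * already correct.
 */
bool
sk_prepare_cache(struct sk_drive *d, const struct sk_buf *bp)
{
	assert(d != NULL && bp != NULL);

	bool want_on = (bp->b_flags & SK_B_ASYNC) != 0;

	if (d->wcache_on == want_on)
		return false;   /* leave it as it is */

	d->wcache_on = want_on; /* a real driver would talk to the drive here */
	d->toggles++;
	return true;
}
```

The toggles counter is only there to show the cost: a run of async writes
never touches the setting, and each sync write in the middle of such a run
costs two changes (off, then back on for the next async write).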

This is basically the same thing we do in SCSI drivers with ordered vs.
simple tags, just in a much less elegant fashion because, well, IDE sucks.
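
For comparison, the SCSI side can be sketched roughly like this, assuming
the mapping is simply "synchronous buffer gets an ordered tag, asynchronous
buffer gets a simple tag".  The names (EX_B_ASYNC, struct ex_buf,
ex_choose_tag) are hypothetical and this is not the actual sd driver code;
the numeric values are the SCSI-2 queue tag message codes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's B_ASYNC buffer flag. */
#define EX_B_ASYNC 0x01

/* SCSI-2 queue tag message codes. */
enum ex_tag {
	EX_TAG_SIMPLE  = 0x20,  /* device may reorder freely */
	EX_TAG_ORDERED = 0x22   /* earlier commands complete first */
};

/* Hypothetical, stripped-down buffer: only the flags matter here. */
struct ex_buf {
	int b_flags;
};

/*
 * Pick the tag type for a transfer (assumed mapping, see above):
 * async I/O may be reordered, sync I/O acts as a barrier.
 */
enum ex_tag
ex_choose_tag(const struct ex_buf *bp)
{
	assert(bp != NULL);
	return (bp->b_flags & EX_B_ASYNC) ? EX_TAG_SIMPLE : EX_TAG_ORDERED;
}
```

The point of the comparison is that on SCSI the barrier travels with the
command itself, so there is no separate device state to switch back and
forth.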

Luckily, softdep allows us to eliminate the vast majority of synchronous
I/O, so that, in theory at least, it should be unnecessary to change the
cache state very often.  And NEW_BUFQ_STRATEGY might help some too, by
clustering reads (which are never synchronous) and writes (which sometimes
are) together more effectively, I think.

Interestingly, this was the original genesis of B_ORDERED: with an explicit
barrier operation, it's easier to force something like a cache flush (which
is what sending a single ordered tag is equivalent to when there are many
outstanding simple tags on a SCSI device, essentially) only when you want or
need one.  What started Jason and me thinking about it was the question "what
would you need to do to have arbitrary I/O reordering while ensuring that
LFS segment writes were safe?"  In any case, Microsoft's example would seem
to teach that it's possible to simply use any synchronous I/O as the barrier
that flushes the cache, for some filesystems at least.  I'd be curious to
know what impact this had on FFS performance.

There was some speculation that Microsoft actually had some way to make
_single commands_ write-through on IDE devices, avoiding the full cache
flush, but IIRC a Microsoft employee popped up in comp.arch.storage last
time this was being discussed and denied that...

Thor