Subject: Re: write cache on ATA drives
To: Manuel Bouyer <bouyer@antioche.eu.org>
From: Thor Lancelot Simon <tls@rek.tjls.com>
List: tech-kern
Date: 12/09/2002 18:54:10
On Mon, Dec 09, 2002 at 11:07:38PM +0100, Manuel Bouyer wrote:
> > > 
> > > This is easy to do. Unfortunately, I don't think the filesystems will pass
> > > us this info ...
> > 
> > Of course they do.  The buffer's either B_ASYNC or it's not.  If it's
> > B_ASYNC, you may turn the write cache on (or leave it on, if it already
> > is).  If it's B_SYNC, you must turn the write cache off (or leave it off,
> > if it already is).
> > 
> > This is basically the same thing we do in SCSI drivers with ordered vs.
> > simple tags, just in a much less elegant fashion because, well, IDE sucks.
> > 
> > Luckily, softdep allows us to eliminate the vast majority of synchronous
> > I/O, so that, in theory at least, it should be unnecessary to change the
> > cache state very often.  And NEW_BUFQ_STRATEGY might help some too, by
> > clustering reads (which are never synchronous) and writes (which sometimes
> > are) together more effectively, I think.
> 
> Is it also true for softdeps?

I'll use "B_SYNC" here as shorthand for "not B_ASYNC".

That's an interesting question.  Without softdep, B_SYNC means that this
particular I/O must actually be committed to stable storage before the
call returns, because it's a metadata write.  With softdep, it's all
async: softdep enforces its ordering by waiting for earlier writes to
complete, but with the write cache on, "complete" only means "in the
drive's cache".  If "later" writes reach the disk but "earlier" ones don't,
because the earlier ones were still sitting in the cache, is the
filesystem actually safe?  I don't think it is.  Either way, this probably
means B_SYNC can't be relied on as an "implicit barrier" with soft
dependencies.

Tagged queueing is different.  For async I/O it is safe to use a simple
tag, which lets the drive reorder it if the queueing policy allows that
(and I think we should make sure it always does, for performance
reasons), provided you don't report completion to the caller until the
drive tells you the I/O is done.  You're right, SYNCHRONIZE CACHE is not a
suitable replacement here.  It would be suitable for any scheme of async
metadata writes that actually used a log, though, because you could
segment the log and "commit" each segment's entries by synchronizing the
cache.  That is approximately what Microsoft does, and it would be similar
to synchronizing the cache or inserting an ordered tag for the last
command of an LFS segment write.

However, flushing an IDE drive's write cache on B_SYNC I/O is still not
_incorrect_ in the softdep case, and since softdep issues very little sync
I/O, the extra flushes shouldn't cost much.  It _will_ prevent filesystem
damage in the non-softdep case, so it still seems like a win.

Thor