tech-kern archive


Re: Proposal: B_ARRIER (addresses wapbl performance?)



On Mon, Dec 08, 2008 at 07:24:24PM -0800, Jason Thorpe wrote:
>
> On Oct 29, 2008, at 2:51 PM, Thor Lancelot Simon wrote:
>
>> In practice, this would mean using an ordered tag with SCSI disks
>> (because ordered tags do not complete until all prior simple-tagged
>> commands complete)
>
> Just because a *command* (not a *write*, necessarily) has completed  
> doesn't mean it's on oxide.  If you really want to force the simple- 
> tagged writes before the ordered-tag write to be on oxide, your  
> ordered-tag command should be a SYNCHRONIZE CACHE, period (and then you 
> need to hope that your RAID vendor actually supports the command,  
> *sigh*).

Typically, for the scenarios discussed here, what you care about is
that the data is on stable, reliable storage, which might be suitable
NVRAM and not necessarily oxide.  Sun got themselves into a lot of
trouble with ZFS over differing vendors' interpretations of cache
flushing, and over the bad performance implications of doing too much
of it.  There were some rather strongly expressed opinions in those
threads too.

A general comment from reading this thread a while ago, now that it
has been awoken again:  people seem to be arguing at cross
purposes.  

There are two features being discussed and positions being taken, but
perhaps participants are missing that they're complementary:

   One side argues that using FUA for log writes is essential, because
   for those you want to know the log is written, without undue
   overhead from flushing other pending writes.

   The other side argues that using ordered tags is essential, to know
   that one set of writes has completed before another set is
   processed, and likewise that completion of the latter reliably
   implies completion of the former.

Nobody seems to have pointed out that they work together: 

   log writes should be done with FUA, for minimal-latency commitment
   of transactions, 

   while ordered completion of other data+metadata writes is vital for
   updating the filesystem proper, and so for knowing when the log
   tail can be freed again.

Cache flushing is a poor substitute for either; its proper uses are in
other cases (like when you want to export disks from an array and
separate them from the controller cache, or when you're writing a
cluster filesystem and the controller cache is local to each host). 
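
To make the division of labour concrete, here is a toy model in Python.
It is only a sketch, not driver code: ToyDisk, Journal and demo are
hypothetical names invented for illustration, and the model assumes that
completion of an ordered write means every earlier write has reached
stable storage, which is exactly the property that real devices and
vendors may or may not honour.

```python
class ToyDisk:
    """Toy disk with a volatile write cache (hypothetical model, not a real API)."""

    def __init__(self):
        self.cache = []   # acknowledged writes that are still volatile
        self.stable = []  # writes on stable storage, in arrival order

    def write(self, name, fua=False, ordered=False):
        if ordered:
            # Assumption: an ordered write is not processed until every
            # earlier write has been pushed to stable storage.
            self.stable.extend(self.cache)
            self.cache.clear()
        if fua or ordered:
            # FUA: this one write goes to stable storage; others stay cached.
            self.stable.append(name)
        else:
            self.cache.append(name)

    def crash(self):
        """Drop the volatile cache and report what was lost."""
        lost = list(self.cache)
        self.cache.clear()
        return lost


class Journal:
    """Toy journal: FUA for log records, one ordered write per checkpoint."""

    def __init__(self, disk):
        self.disk = disk
        self.live = []  # transactions whose log records are still needed

    def commit(self, txn, blocks):
        # The log record is durable on return, without flushing anything else.
        self.disk.write(f"log:{txn}", fua=True)
        self.live.append((txn, blocks))

    def checkpoint(self):
        # In-place updates go out as plain (simple-tagged) writes.
        for _txn, blocks in self.live:
            for block in blocks:
                self.disk.write(f"fs:{block}")
        # One ordered write; its completion implies the writes above are done,
        # so the log space for those transactions can be reused.
        self.disk.write("log-tail-update", ordered=True)
        freed = [txn for txn, _blocks in self.live]
        self.live.clear()
        return freed


def demo():
    disk = ToyDisk()
    journal = Journal(disk)
    disk.write("fs:unrelated")               # unrelated pending write
    journal.commit(1, ["inode7", "bitmap3"])
    after_commit = (list(disk.stable), list(disk.cache))
    freed = journal.checkpoint()
    return after_commit, freed, list(disk.stable), disk.crash()
```

In this model the commit makes the log record stable while the
unrelated write stays in the cache, and only the ordered write at
checkpoint time tells the journal that the log tail can be freed.  No
cache flush is issued anywhere.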

--
Dan.



