tech-kern archive


Re: Proposal: B_ARRIER (addresses wapbl performance?)

On Tue, Dec 09, 2008 at 02:33:12PM -0500, der Mouse wrote:
> > For the journal we only want "written to stable storage before
> > written the meta data blocks back".  Isn't that what ordered tags are
> > supposed to provide?
> Is it?  I'd expect them to give that only when combined with FUA.
> Without that, I'd expect them to give "written to the cache before the
> metadata blocks are", with flushing from the cache to the underlying
> medium being a separate issue.

The semantics are defined by the caching mode page for direct access
devices.  If the write cache enable (WCE) bit isn't set, then commands
are simply not supposed to complete until the data are on oxide.
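
As a rough illustration of the bit in question, here is a minimal sketch
that inspects a caching mode page already fetched with MODE SENSE.  The
function name and the idea of handing it a raw page buffer are
assumptions made for the example, not code from any existing driver; the
page code (0x08) and the position of WCE (byte 2, bit 2) follow the SBC
layout of the page.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHING_PAGE_CODE	0x08	/* caching mode page */
#define CACHING_WCE		0x04	/* byte 2, bit 2: write cache enable */

/*
 * Hypothetical helper: `page` points at the mode page itself, i.e. past
 * the mode parameter header and any block descriptors.
 * Returns 1 if WCE is set, 0 if it is clear, and -1 if the buffer is too
 * short or is not a caching mode page.
 */
int
caching_page_wce(const uint8_t *page, size_t len)
{
	assert(page != NULL);

	/* The low six bits of byte 0 hold the page code. */
	if (len < 3 || (page[0] & 0x3f) != CACHING_PAGE_CODE)
		return -1;

	return (page[2] & CACHING_WCE) ? 1 : 0;
}
```

A result of 0 is the case described above: the drive should not report a
write as complete until the data are on the medium.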

The point of using both simple and ordered tags is that -- used
appropriately -- the host should be able to allow the drive to reorder
simple-tagged commands, effectively gaining the write-buffering benefit
of the drive's cache memory, while not actually allowing it to complete
commands from the perspective of the SCSI bus before they are completed
by the more strict definition of "bits are on oxide".

Note that the default tag reordering scheme isn't supposed to reorder
even simple-tagged commands.  But if the commands are already sorted, the
simple fact that the drive can complete many at once while new commands
are still being submitted will still give most of the benefit of write
caching without requiring WCE to be set, and it is setting WCE that
causes the problems requiring FUA in the first place.  Other tag
reordering schemes let the drive sort simple-tagged requests with respect
to head position etc. while still treating ordered tags as barriers.
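
To make the barrier behaviour concrete, here is a toy model of a drive
using one of those other reordering schemes.  It is only a sketch: the
types and the function name are invented for the example, and sorting by
LBA stands in for whatever head-position heuristic real firmware would
use.  Simple-tagged commands are sorted freely within each run, but
nothing moves across an ordered-tagged command.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

enum tag { TAG_SIMPLE, TAG_ORDERED };

struct cmd {
	enum tag tag;
	uint64_t lba;
};

/* qsort comparator: ascending LBA. */
static int
cmp_lba(const void *a, const void *b)
{
	const struct cmd *x = a, *y = b;

	return (x->lba > y->lba) - (x->lba < y->lba);
}

/*
 * Hypothetical model of the drive's queue: sort each run of
 * simple-tagged commands in place, leaving every ordered-tagged
 * command at its original position so that it acts as a barrier.
 */
void
reorder_queue(struct cmd *q, size_t n)
{
	size_t start = 0;

	for (size_t i = 0; i <= n; i++) {
		if (i == n || q[i].tag == TAG_ORDERED) {
			if (i - start > 1)
				qsort(q + start, i - start, sizeof(q[0]),
				    cmp_lba);
			start = i + 1;	/* next run begins after the barrier */
		}
	}
}
```

Given the queue S90, S10, O50, S30, S20 (S for simple, O for ordered, the
number being the LBA), the model yields S10, S90, O50, S20, S30: both
sides of the barrier are sorted, but no command crosses it.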

SCSI disks generally *do not* ship with WCE turned on, because with sane
host OSes there is little reason to do so.  Our I/O subsystem is only
partially sane in this sense.

And with WCE turned off, the drive isn't supposed to report commands as
complete if the bits aren't on stable storage.

The question, it seems to me, is really whether we want "force this
command to oxide now" _unconditionally_, or "force this command to oxide
before you let any later commands hit oxide".  The latter seems much more
elegant and flexible, and it also seems to be what consistency of the
filesystem's on-disk data structures _should_ require; the former seems
stricter than necessary.  But as Bill pointed out, using only simple and
ordered tags with the default tag reordering policy, the barrier this
creates can end up affecting far more I/O than intended.
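
For comparison, the "unconditional" form is what the FUA bit mentioned in
the quoted text expresses on a per-command basis.  The sketch below builds
a WRITE(10) CDB with or without FUA; the function name is made up for the
example, and only the CDB layout (opcode 0x2a, FUA at byte 1 bit 3,
big-endian LBA and transfer length) is taken from the SCSI block command
set.  The ordered tag, by contrast, does not appear in the CDB at all: it
is a task attribute passed to the transport alongside the command.

```c
#include <stdint.h>
#include <string.h>

#define SCSI_WRITE_10	0x2a	/* WRITE(10) operation code */
#define WRITE10_FUA	0x08	/* byte 1, bit 3: force unit access */

/*
 * Hypothetical helper: fill in a 10-byte WRITE(10) CDB.  With `fua`
 * nonzero the drive is asked not to complete the command until the data
 * are on the medium, regardless of the WCE setting.
 */
void
build_write10(uint8_t cdb[10], uint32_t lba, uint16_t nblocks, int fua)
{
	memset(cdb, 0, 10);
	cdb[0] = SCSI_WRITE_10;
	cdb[1] = fua ? WRITE10_FUA : 0;
	cdb[2] = (uint8_t)(lba >> 24);	/* LBA, most significant byte first */
	cdb[3] = (uint8_t)(lba >> 16);
	cdb[4] = (uint8_t)(lba >> 8);
	cdb[5] = (uint8_t)lba;
	cdb[7] = (uint8_t)(nblocks >> 8);	/* transfer length in blocks */
	cdb[8] = (uint8_t)nblocks;
}
```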

Thor Lancelot Simon                               
    "Even experienced UNIX users occasionally enter rm *.* at the UNIX
     prompt only to realize too late that they have removed the wrong
     segment of the directory structure." - Microsoft WSS whitepaper
