Subject: Re: UBC status
To: None
From: Thor Lancelot Simon
List: tech-kern
Date: 09/27/1999 12:54:21
On Mon, Sep 27, 1999 at 09:25:06AM -0700, Matthew Jacob wrote:
> On Mon, 27 Sep 1999, Thor Lancelot Simon wrote:
> > On Mon, Sep 27, 1999 at 08:45:20AM -0700, Matthew Jacob wrote:
> > > > 
> > > > > Solaris has a B_ORDERED flag which is a hint specifically for this type
> > > > > of thing.
> > > > > 
> > > > > Basically, metadata writes (and any other writes that need to be ordered)
> > > > > would set B_ORDERED, and this would affect disksort() (for disks which
> > > > > don't support tagged commands) and the type of tag used (for disks that do).
> > > > > 
> > > > > A B_URGENT (Head-Of-Queue tag) flag might also be useful.
> > > > 
> > > > I don't believe that any queue types other than simple queue are used,
> > > > primarily due to buggy implementations of other tag types in target
> > > > devices.  You must be thinking of some other OS.
> > > 
> > > Solaris uses FLAG_STAG, yes. Kleiman originally wanted B_ORDERED to be
> > > propagated to the driver so that a FLAG_OTAG would then be used to force
> > > all the previous STAG operations out. I guess this didn't get followed up
> > > on.
> > 
> > What is the point of having a B_ORDERED flag if the target device is,
> > ultimately, allowed not to treat it as a barrier?
> Because the *filesystem* or the *driver* can do the barrier.
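For disks without tagged queuing, the quoted proposal makes disksort() itself act as the barrier. The C sketch below shows what that could mean. It assumes a simplified singly linked queue sorted by ascending block number, whereas the real disksort() uses an elevator ordering. The B_ORDERED flag value and the struct layout are hypothetical, invented for this illustration. The rule it enforces is that buffers may be reordered among themselves but are never moved ahead of an ordered buffer queued before them.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag value for illustration; not an existing NetBSD flag. */
#define B_ORDERED 0x01

struct buf {
    unsigned    b_flags;
    long        b_blkno;   /* starting block number */
    struct buf *b_next;    /* next buffer in the drive queue */
};

struct bufq {
    struct buf *head;
};

/*
 * Simplified disksort(): sort by ascending block number, but treat the
 * last B_ORDERED buffer already queued as a barrier that later buffers
 * may not be sorted in front of.
 */
static void
disksort_ordered(struct bufq *q, struct buf *bp)
{
    struct buf **pp, **barrier;

    assert(q != NULL && bp != NULL);
    barrier = &q->head;

    /* Walk to the tail, remembering the link just past the last ordered buffer. */
    for (pp = &q->head; *pp != NULL; pp = &(*pp)->b_next)
        if ((*pp)->b_flags & B_ORDERED)
            barrier = &(*pp)->b_next;

    /*
     * An ordered buffer stays at the tail link (pp), so it cannot overtake
     * anything already queued.  Any other buffer is sorted, but only among
     * the buffers behind the barrier.
     */
    if (!(bp->b_flags & B_ORDERED)) {
        for (pp = barrier; *pp != NULL && (*pp)->b_blkno <= bp->b_blkno;
            pp = &(*pp)->b_next)
            continue;
    }
    bp->b_next = *pp;
    *pp = bp;
}
```

For example, queuing block 50, block 10, an ordered write to block 30, then blocks 5 and 40 yields the order 10, 50, 30, 5, 40. The first two writes are sorted, the ordered write stays behind them, and the later writes are sorted only among themselves.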

How, by waiting for all the previous writes to complete, at the drive's
leisure?  Even that seems prone to failure: if you look at the Linux 
BusLogic driver you'll find a comment about issuing an ordered tag every 30
seconds because the author had encountered drives which *never* wrote some
simple tags otherwise (it's been suggested to me that this isn't legal 
default behaviour, but might be triggered by certain mode page settings on 
some popular drives).

In any case this seems... highly suboptimal.

ISTM the correct simpleminded solution is to read using simple tags and
write using ordered tags; a slightly more sophisticated strategy would
be to use simple tags for B_ASYNC buffers and ordered tags for other
(sync) buffers; and the best is probably what we're talking about,
adding B_ORDERED and propagating it down to the driver.

But actually, I'm curious.  Instinctively, I think the middle solution
is correct, though it might not perform well: under it only synchronous
writes get ordered tags, and you might want to enforce an ordering
constraint without waiting for the write, for example for the first
buffer of an LFS segment.  However, some people have suggested to me in
the past that it doesn't actually preserve the old FFS metadata
ordering semantics.  I don't see why not; in fact, I can't think of
many cases where it isn't just as good as adding B_ORDERED, and it's a
lot less work.  Can someone give me an example where it either doesn't
work, or doesn't work as well as B_ORDERED would?
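The three strategies differ only in how a buffer's flags map to a SCSI queue tag message. The C sketch below uses the SCSI-2 message codes for SIMPLE and ORDERED QUEUE TAG. The flag values are illustrative: B_READ and B_ASYNC exist in the BSD buffer cache but with different values, and B_ORDERED is the proposed flag. The sketch also assumes that reads keep simple tags under every policy.

```c
#include <assert.h>

/* SCSI-2 queue tag message codes. */
#define MSG_SIMPLE_Q_TAG   0x20
#define MSG_ORDERED_Q_TAG  0x22

/* Illustrative flag values only; B_ORDERED is the proposed flag. */
#define B_READ     0x01
#define B_ASYNC    0x02
#define B_ORDERED  0x04

enum tag_policy {
    POLICY_WRITES_ORDERED,  /* reads simple, every write ordered */
    POLICY_SYNC_ORDERED,    /* B_ASYNC writes simple, sync writes ordered */
    POLICY_EXPLICIT         /* honour B_ORDERED as set by the filesystem */
};

/* Map a buffer's flags to the tag message the driver would send. */
static int
choose_tag(enum tag_policy policy, unsigned flags)
{
    if (flags & B_READ)     /* assumption: reads never need ordering */
        return MSG_SIMPLE_Q_TAG;

    switch (policy) {
    case POLICY_WRITES_ORDERED:
        return MSG_ORDERED_Q_TAG;
    case POLICY_SYNC_ORDERED:
        return (flags & B_ASYNC) ? MSG_SIMPLE_Q_TAG : MSG_ORDERED_Q_TAG;
    case POLICY_EXPLICIT:
        return (flags & B_ORDERED) ? MSG_ORDERED_Q_TAG : MSG_SIMPLE_Q_TAG;
    }
    assert(!"unknown tag policy");
    return MSG_SIMPLE_Q_TAG;
}
```

Under POLICY_SYNC_ORDERED, an asynchronous write of the first buffer of an LFS segment gets a simple tag unless the caller makes it synchronous. POLICY_EXPLICIT avoids that wait by letting the filesystem mark just that buffer.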