Subject: Re: b_actf/b_actb -> TAILQ, plus B_ORDERED ... diffs
To: None <mjacob@feral.com>
From: Jason Thorpe <thorpej@nas.nasa.gov>
List: tech-kern
Date: 01/18/2000 15:01:49
On Tue, 18 Jan 2000 13:23:29 -0800 (PST) 
 Matthew Jacob <mjacob@feral.com> wrote:

 > [ should be on tech-kern? ]

Originally, I had only solicited review from a smaller group.  Since there
are issues to discuss, I have moved the discussion here.  Thus, I am leaving
as much of my original mail quoted as possible.

 > On Tue, 18 Jan 2000, Jason Thorpe wrote:
 > 
 > > Hi folks...
 > > 
 > > I have made some changes to struct buf and disksort(), which are needed
 > > for the thorpej_scsipi branch.  Specifically, this adds a B_ORDERED flag
 > > so that users of buf I/O can specify a barrier (this will be used to issue
 > > ORDERED_TAG messages).  B_ORDERED also forms a barrier for disksort(); the
 > > elevator sort begins as if the last such request were the head of the queue.
 > 
 > If you use a B_ORDERED and assume this is for devices that can do I/O
 > re-ordering, why use disksort at all?
 > 
 > Mind you, I think I'm totally fine with what you're suggesting in adding a
 > B_ORDERED flag, but it defeats the purpose of having devices which can do
 > their own sorting if you call disksort and spend time sorting for them. In
 > fact, it can make things a lot worse.

I'm getting to that :-)  This is a first step.  Note that drivers have
the option of calling disksort() or not.  However, there are cases, e.g.
floppies (we certainly don't want those to get any *slower*) or older
disks, for which disksort() will be beneficial.  Also, for ZBR disks
without tagged queueing, a simple sort based purely on block number
could be beneficial, as the disk will see it as "sequential writes".

After tossing some ideas around w/ Mycroft, we figured a pointer to
a sort function in a disk's "struct disk" would work.  You'd have
a few options:

	- cylinder/block elevator sort (what we have now)
	- block-only sort which generates forward-only sequential writes
	  (like FreeBSD's bufqdisksort())
	- insert-at-end-of-queue

In all cases, B_ORDERED implements the barrier.
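
To make the idea concrete, here is a rough userland sketch of a per-disk
sort hook in which B_ORDERED acts as a barrier.  It is not the actual
diff: the struct layout, the field names (b_actq, queue, sort), the flag
value, and the helpers last_barrier(), sort_fifo(), sort_block() and
disk_enqueue() are all made up for illustration.  The block sort is a
plain ascending insert rather than a real one-way elevator, and the
cylinder/block variant is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

#define B_ORDERED 0x01			/* hypothetical flag value */

struct buf {
	TAILQ_ENTRY(buf) b_actq;	/* hypothetical queue linkage */
	long b_blkno;			/* starting block number */
	int b_flags;
};
TAILQ_HEAD(bufq, buf);

/* Hypothetical per-disk hook: each disk picks its own queueing scheme. */
typedef void (*disk_sort_t)(struct bufq *, struct buf *);

struct disk_sketch {
	struct bufq queue;
	disk_sort_t sort;
};

/* Return the last B_ORDERED request in the queue, or NULL if none. */
static struct buf *
last_barrier(struct bufq *q)
{
	struct buf *bp, *last = NULL;

	TAILQ_FOREACH(bp, q, b_actq)
		if (bp->b_flags & B_ORDERED)
			last = bp;
	return last;
}

/* Insert-at-end-of-queue: arrival order already respects any barrier. */
static void
sort_fifo(struct bufq *q, struct buf *bp)
{
	TAILQ_INSERT_TAIL(q, bp, b_actq);
}

/* Block-only ascending sort, restricted to the region after the barrier. */
static void
sort_block(struct bufq *q, struct buf *bp)
{
	struct buf *cur, *bar;

	/* A barrier request must follow everything already queued. */
	if (bp->b_flags & B_ORDERED) {
		TAILQ_INSERT_TAIL(q, bp, b_actq);
		return;
	}

	/* Sort as if the last barrier were the head of the queue. */
	bar = last_barrier(q);
	cur = (bar != NULL) ? TAILQ_NEXT(bar, b_actq) : TAILQ_FIRST(q);
	for (; cur != NULL; cur = TAILQ_NEXT(cur, b_actq)) {
		if (cur->b_blkno > bp->b_blkno) {
			TAILQ_INSERT_BEFORE(cur, bp, b_actq);
			return;
		}
	}
	TAILQ_INSERT_TAIL(q, bp, b_actq);
}

/* What a driver's strategy routine would call instead of disksort(). */
static void
disk_enqueue(struct disk_sketch *dk, struct buf *bp)
{
	assert(dk != NULL && bp != NULL && dk->sort != NULL);
	dk->sort(&dk->queue, bp);
}
```

A driver would set the hook once at attach time (sort_fifo for a disk
that sorts on its own, sort_block for a ZBR disk without tagged queueing)
and the file system only ever sets B_ORDERED on the buf.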

B_ORDERED is intended to be an abstract way of specifying the barrier,
for the benefit of file systems, which otherwise couldn't care less if
the disk supports ordered queue tags, is an SMD on Xylogics 753 (which has
its own on-controller sorting stuff), or an old Massbus disk (for which the
elevator sort is quite useful).

Also, from a discussion I had w/ Christoph Badura, it could be useful
to experiment w/ the latter two queueing schemes above on different
ordered-tag-capable disks.

        -- Jason R. Thorpe <thorpej@nas.nasa.gov>