Subject: more disksort fun
To: None <tech-kern@netbsd.org>
From: Jason Thorpe <thorpej@nas.nasa.gov>
List: tech-kern
Date: 02/04/2000 15:42:30
Hi folks...

I noticed another problem with disksort (both versions) today, although
it's not a show stopper, just a pessimization.

Disksort keys off b_blkno, which is the offset of the transfer relative
to the beginning of the device that's open.  However, for the vast
majority of cases, this means that it's a *partition-relative* offset.

What this means is that block a:10 sorts after b:5, even though the two
are not actually adjacent on the disk.  This is especially true of
disksort_blkno(), which sorts on the block number alone.
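
To make the ordering concrete, here is a minimal userland sketch.  It is
not the kernel code: struct req, cmp_blkno() and sort_by_blkno() are
made-up names, and the layout (partition a at raw block 0, partition b
at raw block 1000) is just an assumption for illustration.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the struct buf fields that matter here. */
struct req {
    char    part;      /* partition letter, for illustration only */
    int64_t blkno;     /* partition-relative block, like b_blkno */
    int64_t rawblkno;  /* absolute block on the disk (assumed layout) */
};

/* Compare on the partition-relative key only. */
static int
cmp_blkno(const void *a, const void *b)
{
    const struct req *ra = a;
    const struct req *rb = b;

    return (ra->blkno > rb->blkno) - (ra->blkno < rb->blkno);
}

/* Order a batch of requests by partition-relative block number. */
static void
sort_by_blkno(struct req *v, size_t n)
{
    qsort(v, n, sizeof(v[0]), cmp_blkno);
}
```

With a:10 (raw 10) and b:5 (raw 1005) in the batch, this puts b:5 first,
so the head would go out to raw block 1005 and then seek back to 10.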

While there's no chance of pooching data, this is obviously bad from
the "we'd like to do sequential writes as much as possible, and avoid
seeking back" perspective.

Note that the hp300 and pmax old-style SCSI code actually overloaded
b_cylinder with a raw block number and used cylinder disksort to avoid
this problem!  (I noticed that, actually, when I split the disksort
routines into cylinder/blkno and blkno-only versions...)

What I have done instead is add a b_rawblkno field to struct buf,
and change callers of disksort_*() to fill it in with the appropriate
non-partition-relative value, and change disksort_*() to use that as
the sort key, rather than b_blkno.
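
Roughly, the idea looks like the sketch below.  This is only an
illustration, not the actual diff: struct sketch_buf and both helper
functions are hypothetical, and it assumes the partition offset is
expressed in the same units as b_blkno.

```c
#include <stdint.h>

/* Hypothetical cut-down buf; only the two fields under discussion. */
struct sketch_buf {
    int64_t b_blkno;     /* partition-relative, as set by the caller */
    int64_t b_rawblkno;  /* absolute block; the proposed new field */
};

/*
 * What a driver's strategy routine would do before queueing:
 * part_offset is the start of the open partition (assumed to be
 * in the same units as b_blkno).
 */
static void
sketch_set_rawblkno(struct sketch_buf *bp, int64_t part_offset)
{
    bp->b_rawblkno = bp->b_blkno + part_offset;
}

/* Sort predicate: nonzero if a should be serviced before b. */
static int
sketch_before(const struct sketch_buf *a, const struct sketch_buf *b)
{
    return a->b_rawblkno < b->b_rawblkno;
}
```

With partition a at offset 0 and b at offset 1000, a:10 gets raw block
10 and b:5 gets raw block 1005, so a:10 now sorts first.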

Thoughts?

        -- Jason R. Thorpe <thorpej@nas.nasa.gov>