Subject: Re: New device buffer queue strategy
To: None <tls@rek.tjls.com>
From: Chris Jepeway <jepeway@blasted-heath.com>
List: current-users
Date: 09/02/2002 17:35:25
> Finally, I wonder if it might make sense to attempt to merge adjacent
> requests up to MAXPHYS at queue insert time.
I've got some code I wrote for a client that's called as

	bp = blk_cluster(sd->buf_queue, sector_size);

at queue *removal* time.  It gangs together adjacent buffers at the
front of the queue up to MAXPHYS in length.  It uses uvm_km_kmemalloc(),
vtophys() and pmap_kenter_pa() to cobble up a new buffer with a b_data
pointing to VA that maps all the b_data of the individual adjacent
buffers.  If the buffer at the head of the queue isn't adjacent to
the second buffer in the queue, blk_cluster() just returns the first buffer.
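A minimal userland sketch of the adjacency scan described above follows. It is not the client's blk_cluster() code, and it does not use the kernel's struct buf or bufq API. `struct sketch_buf`, `sketch_cluster_run()` and `SKETCH_MAXPHYS` are hypothetical stand-ins, and MAXPHYS is assumed to be 64 KB. The function only counts how many buffers at the head of the queue could be ganged. It leaves out the VA allocation and mapping.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the NetBSD struct buf or bufq API. */
#define SKETCH_MAXPHYS (64 * 1024)   /* assumed MAXPHYS of 64 KB */

struct sketch_buf {
    long   blkno;    /* starting sector on the device */
    size_t bcount;   /* transfer length in bytes */
};

/*
 * Count how many buffers at the head of q[0..n) are physically adjacent
 * and together fit in SKETCH_MAXPHYS bytes. A result of 1 means the head
 * is not adjacent to the second buffer, so the caller issues it alone.
 * Returns 0 for an empty queue.
 */
static size_t
sketch_cluster_run(const struct sketch_buf *q, size_t n, size_t sector_size)
{
    assert(sector_size > 0);
    if (n == 0)
        return 0;

    size_t total = q[0].bcount;
    size_t run = 1;
    while (run < n) {
        const struct sketch_buf *prev = &q[run - 1];
        const struct sketch_buf *next = &q[run];
        /* Adjacent only if next starts at the sector where prev ends. */
        long prev_end = prev->blkno + (long)(prev->bcount / sector_size);
        if (next->blkno != prev_end)
            break;
        /* Never build a transfer larger than MAXPHYS. */
        if (total + next->bcount > SKETCH_MAXPHYS)
            break;
        total += next->bcount;
        run++;
    }
    return run;
}
```

Take 512-byte sectors and a queue of {100, 8192}, {116, 8192}, {132, 16384}, {500, 8192}. The first three buffers are contiguous and total 32 KB, so the run length is 3. The fourth buffer starts elsewhere and stays queued.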

The b_iodone routine for a cobbled buffer biodone()s every buffer it
gangs together, much like the old cluster_save code that was removed
when UBC went in.
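The completion fan-out can be sketched in userland as shown below, under stated assumptions. `struct sketch_member`, `struct sketch_cluster`, `sketch_biodone()` and `sketch_cluster_iodone()` are all hypothetical, and `sketch_biodone()` only marks a buffer done in place of the real biodone(). The handler copies the status of the single ganged transfer into each original buffer and then completes each one. A kernel version would presumably also tear down the temporary VA mapping at this point. This sketch omits that step.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAXRUN 16   /* hypothetical cap on buffers per cluster */

/* Hypothetical stand-in for one original struct buf. */
struct sketch_member {
    int error;   /* 0 on success, an errno value on failure */
    int done;    /* set once the stand-in biodone() has run */
};

/* Hypothetical cobbled buffer: remembers the buffers it ganged. */
struct sketch_cluster {
    struct sketch_member *members[SKETCH_MAXRUN];
    size_t nmembers;
    int error;   /* result of the single ganged transfer */
};

/* Stand-in for biodone(): just marks one original buffer complete. */
static void
sketch_biodone(struct sketch_member *bp)
{
    bp->done = 1;
}

/*
 * Completion handler for the cobbled buffer: propagate the transfer's
 * status to every original buffer, then complete each of them so that
 * whoever issued each one sees it finish.
 */
static void
sketch_cluster_iodone(struct sketch_cluster *cl)
{
    assert(cl->nmembers <= SKETCH_MAXRUN);
    for (size_t i = 0; i < cl->nmembers; i++) {
        cl->members[i]->error = cl->error;
        sketch_biodone(cl->members[i]);
    }
}
```

Because one transfer carries all the data, a single error applies to every buffer in the cluster. That is why the sketch copies the status to each member and does not track per-buffer results.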

Doing the clustering at removal time instead of at insertion time
lets you know how much total VA you need and lets you request it
all at once.  If you cluster at insertion time, you've got to either
build up your VA incrementally, calling uvm_km_kmemalloc() more than
once, or just ask for MAXPHYS VA, which could be too greedy, particularly
if MAXPHYS goes dynamic.
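The sizing argument is shown in the small C sketch below. At removal time the whole run is known, so the exact VA size can be computed and requested in one allocation. Reserving MAXPHYS up front at insertion time can hold much more than that. The names and constants are hypothetical: a 4 KB page size and a 64 KB MAXPHYS are assumed, and each buffer's data is assumed to start on a page boundary.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u          /* assumed page size */
#define SKETCH_MAXPHYS   (64u * 1024u)  /* assumed MAXPHYS */

/* Round a byte count up to a whole number of pages. */
static size_t
sketch_round_page(size_t len)
{
    return (len + SKETCH_PAGE_SIZE - 1) & ~(size_t)(SKETCH_PAGE_SIZE - 1);
}

/*
 * Removal-time sizing: given the lengths of every buffer in the run,
 * return the exact VA needed to map them all. Each buffer is assumed to
 * start on a page boundary, so each is rounded up to whole pages.
 */
static size_t
sketch_va_needed(const size_t *bcounts, size_t n)
{
    size_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += sketch_round_page(bcounts[i]);
    /* A run is never built past MAXPHYS, so this should always hold. */
    assert(total <= SKETCH_MAXPHYS);
    return total;
}
```

Take a run of 8 KB, 8 KB and 16 KB buffers. It needs exactly 32 KB of VA, while an up-front MAXPHYS reservation would tie up 64 KB.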

> Given fixed (and substantial!)
> command overhead, simply reducing the number of I/O requests might help
> more than one might think, particularly with request sources such as
> RAIDframe that are known to produce smaller requests than the disks can
> handle.
This client has a proprietary filesystem that issues small-ish requests
a la RAIDframe.  blk_cluster() measurably improves that f/s's performance.
I don't have any RAIDframe stats, nor any for FFS, but when the proprietary
f/s uses clustering, it gets a 4X improvement in throughput on old-ish disks.
Take the 4X with a grain of salt, since the client's f/s doesn't do any
up-front clustering the way UBC does.

They'd like to release the code to the community if there's interest.
They're in a bit of a crunch at present, though, so it might take me
a week or perhaps more to get it cleared with them.  Adapting it to
-current with the new BUFQ_{PUT,GET}() should go quickly once they
OK the release.

If folk think there's a chance an interface like blk_cluster() might make
it into NetBSD proper, I'm willing to do the legwork to get this code
released for scrutiny by the world at large.  Once it's vetted and if
it's accepted, I suspect my client would fund some of my time to adapt
it as necessary for buy-back by the NetBSD group.

Let me know.

> Thor
Chris <jepeway@blasted-heath.com>.