Subject: Re: New device buffer queue strategy
To: Chuck Silvers <chuq@chuq.com>
From: Chris Jepeway <jepeway@blasted-heath.com>
List: tech-perform
Date: 09/04/2002 14:16:08
> as you allude to, the i/os sent from the FS/VM layer to the disk driver
> will already have clustering done,
Erm...only within a single file, right?

> so there's little to be gained
The hand-waving argument I used to convince myself
that clustering at the driver level was worth checking
into, even when UBC does clustering, follows...

> by doing it again in the disk driver.
But, you wouldn't "do it again."  You'd "check it again,"
where the check is a fn-call, an extra BUFQ_PEEK() and a compare.
So, you don't pay much for checking in the case where
the f/s has done all the clustering possible.
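
To be concrete, the check I have in mind is roughly the
following (just a sketch: can_cluster() and sc->sc_bufq are
names I made up for illustration; only BUFQ_PEEK() and the
struct buf fields are the real interface):

    #include <sys/param.h>
    #include <sys/buf.h>

    /*
     * Would nbp extend bp?  Same device, same direction, and
     * physically adjacent blocks.
     */
    static int
    can_cluster(struct buf *bp, struct buf *nbp)
    {
            return (nbp != NULL &&
                nbp->b_dev == bp->b_dev &&
                (nbp->b_flags & B_READ) == (bp->b_flags & B_READ) &&
                nbp->b_blkno == bp->b_blkno + btodb(bp->b_bcount));
    }

    /* in the driver's start routine, after BUFQ_GET() hands us bp: */
    if (can_cluster(bp, BUFQ_PEEK(&sc->sc_bufq))) {
            /* build one larger transfer instead of issuing bp alone */
    }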

And, if the f/s misses a cluster, then, assuming it was
good for the f/s to cluster in the first place, you can
posit it would still help to catch that missed cluster and
build it at the disk level.

OK, done w/ the handwaving.  This is really a
"probably doesn't hurt, might help" argument.
As I said, this is what I used to justify
checking into the UBC case, too.  I don't
claim anybody should believe anything after
reading it.

> the exception to this
I think there are others:

	o  directory writes (at least, last I
	   checked UBC didn't handle dirs)

	o  clusters that span files, where
	   files in same dir/cylinder-group
	   are written/read by >1 process

So, it seemed like disk-level clustering would
help out servers some, even with UBC.  Please
set me straight if I've got something wrong.

Wait, dang it: writes to directories are serialized,
aren't they?  So, you couldn't deliver more than a single
struct buf to the driver at a time.  So, scratch that one.

> would be for layered disk drivers like raidframe,
> where non-contiguous chunks of i/o
> in the virtual device presented to the FS/VM layer
> can become contiguous for the underlying real devices.
> for such layered disk drivers, this re-clustering
> could potentially be useful.
Yup, that's what Thor was on about.
Me, I don't RAIDframe.

> I'd like to see some
> empirical evidence that such code helps before it went into the tree, though.
Coming soon to a website near you.

For test harnesses, I have

    o  an hpcmips that can scribble on an old laptop drive

    o  an i386 that can scribble on:

       .  a really old SCSI drive (10 yr-old Micropolis)
	  through a really old SCSI adapter (Adaptec 2903)

       .  an old SCSI drive (4 yr old Seagate)
	  through a sorta recent SCSI adapter
	  (Adaptec 39160)

I can measure non-RAIDframe FFS performance with them and report.
Is there someone who could check RAIDframe?

These are the test cases I'm thinking of:

	Case 0:		-current, the base case

	Case 1:		-current instrumented to count missed clusters

	Case 2:		-current with driver-level clustering

Case 1 gets a handle on the benefit disk-level clustering
*could* yield.  Checking 0 against 1 gives a notion of how believable
the "max benefit" really is: if the times are very different, we
can suppose that the instrumentation is interfering enough
that the "missed cluster count" is bogus.  Checking 0 against 2
is the test that matters, of course.   If 2 turns out to
be not so good, but the "missed cluster count" is both believable
and high, then the implementation of 2 isn't good enough.
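
For Case 1, the instrumentation can be the same contiguity
test as above with a counter instead of a merge, something
like (again a sketch; the counter is made up and would want
to be an evcnt or a sysctl so it's readable from userland):

    /* in the start routine, at the point where Case 2 would merge: */
    static u_int64_t missed_clusters;       /* hypothetical counter */

    if (can_cluster(bp, BUFQ_PEEK(&sc->sc_bufq)))
            missed_clusters++;              /* count it, issue bp as-is */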

So, what are some preferred benchmarks for this sorta thing?

> -Chuck
Chris <jepeway@blasted-heath.com>.