Subject: Re: New device buffer queue strategy
To: Chuck Silvers <chuq@chuq.com>
From: Chris Jepeway <jepeway@blasted-heath.com>
List: tech-perform
Date: 09/04/2002 14:16:08
> as you allude to, the i/os sent from the FS/VM layer to the disk driver
> will already have clustering done,
Erm...only within a single file, right?
> so there's little to be gained
The hand-waving argument I used to convince myself that
clustering at the driver level was worth checking into,
even when UBC does clustering, follows...
> by doing it again in the disk driver.
But, you wouldn't "do it again." You'd "check it again,"
where the check is a fn-call, an extra BUFQ_PEEK() and a compare.
So, you don't pay much for checking in the case where
the f/s has done all the clustering possible.
And, if the f/s misses a cluster, then, given that it was
worth clustering at the f/s level in the first place, you
can posit it would still help to catch that missed cluster
and do the clustering at the disk level.
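For concreteness, here's roughly the check I'm picturing in
a driver's start routine. A hand-rolled sketch, not an actual
patch: the helper name, the sc_bufq member, and the exact
BUFQ_* arguments are made up for illustration.

	/* needs <sys/param.h> and <sys/buf.h> */

	/*
	 * Hypothetical helper: would "nextbp" have clustered with "bp"?
	 * Yes, if they go the same direction and are physically
	 * contiguous on the device.
	 */
	static int
	xx_would_cluster(struct buf *bp, struct buf *nextbp)
	{
		if (nextbp == NULL)
			return (0);
		if ((bp->b_flags & B_READ) != (nextbp->b_flags & B_READ))
			return (0);
		return (bp->b_blkno + btodb(bp->b_bcount) == nextbp->b_blkno);
	}

	/* ...in the start routine... */
	bp = BUFQ_GET(&sc->sc_bufq);		/* transfer about to go out */
	nextbp = BUFQ_PEEK(&sc->sc_bufq);	/* the extra peek */
	if (xx_would_cluster(bp, nextbp)) {
		/* coalesce nextbp into bp (or just count the miss) */
	}

When the f/s has already clustered everything, all you pay is
that peek and compare.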
OK, done w/ the handwaving. This is really a
"probably doesn't hurt, might help" argument.
As I said, this is what I used to justify
checking into the UBC case, too. I don't
claim anybody should believe anything after
reading it.
> the exception to this
I think there are others:
o directory writes (at least, last I
checked UBC didn't handle dirs)
o clusters that span files, where
files in same dir/cylinder-group
are written/read by >1 process
So, it seemed like disk-level clustering would
help out servers some, even for UBC. Please
set me straight if I've got s/t wrong.
Wait, dang it: writes to directories are serialized,
aren't they? So, you couldn't deliver more than a single
struct buf to the driver at a time. So, scratch that one.
> would be for layered disk drivers like raidframe,
> where non-contiguous chunks of i/o
> in the virtual device presented to the FS/VM layer
> can become contiguous for the underlying real devices.
> for such layered disk drivers, this re-clustering
> could potentially be useful.
Yup, that's what Thor was on about.
Me, I don't RAIDframe.
> I'd like to see some
> empirical evidence that such code helps before it went into the tree, though.
Coming soon to a website near you.
For test harnesses, I have
o an hpcmips that can scribble on an old laptop drive
o an i386 that can scribble on:
. a really old SCSI drive (10 yr-old Micropolis)
through a really old SCSI adapter (Adaptec 2903)
. an old SCSI drive (4 yr old Seagate)
through a sorta recent SCSI adapter
(Adaptec 39160)
I can measure non-RAIDframe FFS performance with them and report.
Is there someone who could check RAIDframe?
These are the test cases I'm thinking of:
Case 0: -current, the base case
Case 1: -current instrumented to count missed clusters
Case 2: -current with driver-level clustering
Case 1 gets a handle on the benefit disk-level clustering
*could* yield. Checking 0 against 1 gives a notion of how believable
the "max benefit" really is: if the times are very different, we
can suppose that the instrumentation is interfering enough
that the "missed cluster count" is bogus. Checking 0 against 2
is the test that matters, of course. If 2 turns out to
be not so good, but the "missed cluster count" is both believable
and high, then the implementation of 2 isn't good enough.
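FWIW, the Case 1 instrumentation I'm picturing is nothing
fancier than a pair of counters bumped around the same
peek-and-compare (same made-up names as the sketch above):

	/* Hypothetical per-driver counters; printf them (or hang
	 * them off a sysctl) at the end of a run. */
	u_long	sc_xfers;	/* transfers issued */
	u_long	sc_missed;	/* transfers that could have merged */

	/* in the start routine, next to the BUFQ_PEEK() above: */
	sc->sc_xfers++;
	if (xx_would_cluster(bp, nextbp))
		sc->sc_missed++;	/* the f/s missed a cluster */

sc_missed over sc_xfers is the "missed cluster count" I mean
above.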
So, what are some preferred benchmarks for this sorta thing?
> -Chuck
Chris <jepeway@blasted-heath.com>.