Subject: Re: interactive responsiveness
To: Steve Bellovin <email@example.com>
From: Daniel Carosone <firstname.lastname@example.org>
Date: 02/03/2004 07:44:59
Content-Type: text/plain; charset=us-ascii
On Mon, Feb 02, 2004 at 09:03:00AM -0500, Steve Bellovin wrote:
> Using a kernel from Saturday, with NEW_BUFQ_STRATEGY set but otherwise
> default options (and in particular with default sysctl settings for
> vm.), I'm seeing *excellent* responsiveness.
That's excellent news. How old was your previous kernel, for comparison?
The factors in this issue are really complex, and it seems the
combination of these options that works best is different for
different people (and hardware).
One of those factors (the buffer cache size) changed dramatically
recently, enough that it certainly upset the previous balance at
first, and even after the recent fixes it's worth everyone
re-evaluating their use of some of the other options.
For me, I found that NEW_BUFQ made the disk seek a LOT more, and jobs
which generated that disk traffic took correspondingly much longer,
with no real benefit to the "interactive responsiveness" of other
processes. But then, I never really saw any of the widely reported
problems others had and which NEW_BUFQ was attempting to address. I
do use softdep, and I do have a reasonable amount of RAM in my laptop.
Some other changes I was playing with, to spread the huge burst of
metadata write IO from the much-expanded buffer cache, had a similar
effect initially. They're working well enough now, but it's come down
to tuning with "magic numbers" that seem to work for me and are
probably no good for someone else.
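The idea behind that spreading is simple even if the kernel details
aren't; here's a toy sketch (Python, nothing like the actual kernel
code, and the parameter names are invented) of issuing a burst of
dirty buffers in chunks rather than all at once:

```python
import time

def spread_writes(bufs, chunk=64, pause_ms=5, issue=print):
    """Issue dirty buffers in chunks with a short pause between chunks,
    instead of queueing thousands at once.  chunk and pause_ms are
    exactly the "magic numbers" in question: values that work well on
    one machine may be useless on another."""
    for i in range(0, len(bufs), chunk):
        for b in bufs[i:i + chunk]:
            issue(b)
        time.sleep(pause_ms / 1000.0)

# Ten buffers, issued four at a time with a pause between chunks:
spread_writes(list(range(10)), chunk=4, pause_ms=1)
```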
I've become more and more convinced that one very large factor in this
issue is drive write-cache, and probably behavioural differences
between different drive manufacturers' firmware. My drive seems to do
a lot of write consolidation in cache, which absorbs bursts of
nearly-contiguous small metadata writes very nicely and seeks just a
few times to deposit the lot on the platters. In fact, the only time I
do see any responsiveness jerkiness, I'm now convinced, is because of
interface and CPU bottlenecks in the command overhead of sending
thousands of tiny writes, rather than from actually waiting for disk
data. I now spread that IO just enough to avoid that case, it seems,
at least for my system. If I spread it too much, I seem to break the
drive's consolidation and force many, many more seeks.
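To illustrate the kind of consolidation I mean (a toy model, not
anything the drive firmware actually runs; max_gap is an invented
knob for how close two writes must be to get serviced in one pass):

```python
def consolidate(writes, max_gap=8):
    """Merge nearly-contiguous (block, length) writes into larger runs,
    the way a write-back cache in drive firmware might, so a burst of
    tiny metadata writes costs only a few seeks."""
    runs = []
    for block, length in sorted(writes):
        start, cur = runs[-1] if runs else (None, 0)
        if runs and block <= start + cur + max_gap:
            # close enough to the current run: extend it
            runs[-1] = (start, max(cur, block + length - start))
        else:
            runs.append((block, length))
    return runs

# Five tiny writes collapse to two seeks' worth of work:
tiny = [(100, 1), (103, 1), (101, 1), (500, 2), (504, 1)]
print(consolidate(tiny))  # [(100, 4), (500, 5)]
```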
If I turn off write-cache, I can easily see that processes waiting for
a read could wait a *very* long time without NEW_BUFQ, *especially*
after recent buffercache changes let us issue huge numbers of writes
all at once from softdep.
With write-cache left on, my best results come from giving all the
requests to the disk roughly in order, and letting it sort out
consolidation and scheduling. Another disk would probably give
different results.
I've done some less extensive testing on other machines, but this
laptop is what I use constantly, and it's the easiest place for me to
assess the qualitative impact on responsiveness, certainly over time
as code changes have been made.
> I don't know what the changes were, but from my perspective, they're
> working just fine. (Note: I do not use softdep)
Does your disk have write-cache enabled? (atactl wd0 getcache) It
probably does unless you've taken steps to disable it.
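For example (NetBSD atactl(8); wd0 assumed, and check the man page on
your system for the exact setcache arguments):

```shell
# query the drive's current cache settings
atactl wd0 getcache
# disable write-back caching, leaving read look-ahead on
atactl wd0 setcache r
# re-enable both read look-ahead and write-back
atactl wd0 setcache rw
```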
So, my guess at an explanation for your improved results is that the
buffer cache can now hold much more metadata, and that previously you
weren't able to cache enough. You were blocking on reads for that
metadata, and that's why NEW_BUFQ helped you. Now you simply don't
need to issue those reads at all.
You may also have a drive that doesn't handle writes the same way mine
does, or you may simply have been cautious and disabled write-cache
(as well as softdep :). In that case, metadata writes will be slow,
and you probably want to continue avoiding softdep, because at the
moment it can create huge bursts of metadata writes after, say, a CVS
update (though fewer writes overall, because it avoids rewrites).
We're working on that issue, but it's not there yet. Some potential
future work on consolidating these writes before sending them to the
disk is also likely to help a lot - though perhaps others more than me.
Without softdep, your metadata writes are synchronous, and you
probably want to continue using NEW_BUFQ; since you're going to be
seeking a lot regardless, you might as well seek to something that
will unblock a read first. But overall, that doesn't seem like a
happy place, compared to what I see.
With softdep and a drive like mine, my experience is that NEW_BUFQ is
not a win.
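To illustrate why reads-first helps in the synchronous case, here's a
toy model of a read-priority queue (just the idea, in Python - not
NetBSD's actual bufq code):

```python
from collections import deque

class ReadPriorityQueue:
    """Toy read-priority disk queue: pending reads are issued before
    pending writes, so a blocked reader doesn't sit behind a large
    burst of metadata writes."""

    def __init__(self):
        self.reads = deque()
        self.writes = deque()

    def put(self, req):
        (self.reads if req["op"] == "read" else self.writes).append(req)

    def get(self):
        # serve any waiting read first; fall back to writes in order
        if self.reads:
            return self.reads.popleft()
        return self.writes.popleft()

q = ReadPriorityQueue()
for i in range(3):
    q.put({"op": "write", "block": i})
q.put({"op": "read", "block": 999})
print(q.get())  # the read jumps the write burst
```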
PS. My drive is an IBM 40G "travelstar":
wd0 at atabus0 drive 0: <IC25N040ATCS04-0>
wd0: drive supports 16-sector PIO transfers, LBA addressing
wd0: 38154 MB, 77520 cyl, 16 head, 63 sec, 512 bytes/sect x 78140160 sectors
wd0: 32-bit data port
wd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 5 (Ultra/100)
wd0(piixide0:0:0): using PIO mode 4, Ultra-DMA mode 2 (Ultra/33) (using DMA data transfers)