Subject: Re: Interactive responsiveness under heavy I/O load
To: John Goerzen <jgoerzen@complete.org>
From: Thor Lancelot Simon <tls@rek.tjls.com>
List: tech-perform
Date: 01/26/2004 11:58:45
On Mon, Jan 26, 2004 at 04:30:08PM +0000, John Goerzen wrote:
> Hello,
> 
> I have been noticing some disturbing patterns on two different machines
> I'm trying NetBSD on.  When I am untarring a file, or generally doing
> anything that is causing large amounts of data to be written at once,
> interactive performance is seriously degraded.  For instance, while I

The new I/O sorting algorithm in -current should make this significantly
better.  I am hoping that it can become the default for 2.0.

> It seems like when this problem occurs, an I/O scheduler somewhere is
> starving everything but the big writing process of resources.  But I
> have no idea if this is tweakable somewhere, or how to go about fixing
> it.

You say later that you're using softdep.  The likely problem is an
interaction of softdep and the questionable behaviour of the delayed
write scheduling code (the "smooth-sync" or "syncer" code) that was
imported along with softdep.

The basic problem is that softdep allows an almost arbitrary number of
metadata operations (directory writes, allocation bitmap writes, etc.)
to be scheduled so long as your machine has sufficient metadata cache
buffers to hold them (when it doesn't, some will be written out to
make space for others; this inherently paces the I/O, but the problem
still exists).  All delayed writes are put on a syncer "worklist"
corresponding to a particular second (in some cases, we process two
worklists per second, but usually only one, and never more than two).

Directory writes all get the same delay, "dirdelay", currently set to
15 seconds.  Other metadata writes get the delay "metadelay", currently
set to 20 seconds.

What that means, in practice, is that "dirdelay" (15) seconds after you
fire off all that heavy I/O, the syncer will try to complete it all in
a second.  This floods the disk queues and, in practice, can make the
system sluggish for _several_ seconds.  By prioritizing reads around
async writes, the new sorting algorithm can at least minimize the
effect of this problem on interactive use.

I have experimented with introducing jitter into the scheduling of
directory I/O, with mixed results.  It is clear that without the new
disksort, it is a lose; with it, it _should_ be a win but the jury is
still out.

Another problem is that the current syncer keeps its worklist by _file_.
That means that there are some other degenerate cases, one or more of
which you may be seeing:

1) If all of your I/O is to a small number of very busy files, plain
   file I/O will bog the system down every "filedelay" (currently 30)
   seconds.

2) It is possible, though not likely, that I/O for non-directory
   metadata is generating enough load to make your system seem 
   sluggish.  Because this is all expressed to the syncer as I/O
   for a single vnode corresponding to the entire filesystem, there
   is nothing we can really do to make it flush out smoothly.

I'd appreciate it if someone else could read the code and confirm #2,
but it's what I get from having looked at it a few times.  There is
ongoing work to address #1; I am actively working on the directory
I/O problem (rebalancing in the syncer itself now looks more
promising than spreading the I/O when it is originally scheduled).  In
any case, the new disksort should help a lot.

So there is a light at the end of the tunnel.  If you want immediate
relief, turning off softdep should make your system's interactive
performance more predictable, though it will probably make your
I/O itself slower.

Thor