Subject: Re: system tuning to improve responsiveness
To: theo borm <theo_nbsdhelp@borm.org>
From: Richard Rauch <rkr@olib.org>
List: netbsd-help
Date: 04/13/2005 22:13:46
The short answer is: Yes, I have been seeing the same thing.

I've seen it in xterm.  I've seen it in playing audio (or
video) with mplayer.  I see it when I move the mouse cursor
between windows and it often freezes as soon as it is in a
new window.  I see it when I record audio via audiorecord,
I believe, or when I am recording into Audacity.

Interestingly, xmms seems able to play sound files with less
(or no) problem.

I do not believe that it is caused by disk buffers.  My evidence
for this is that I can observe it in the game bzflag (from pkgsrc),
and during that, the hard drive light never comes on.  Once you
start, bzflag reads your config file, but unless you have to hit
swap, there should be no further disk interaction.  The game is
network oriented and does not load/reload anything from disk that
I know of.  I see bzflag freeze up for about 1 second out of about
every 15 seconds.

Sometimes, a single window/process seems to be affected, but I
can move to another window while the affected one just sleeps for ~1 second.
Mouse and window manager seem to be unaffected in those cases.

I also can see the problem in running shell commands if I ssh
into the affected machine from an unaffected machine.  So it's
not just in X handing events up and down, etc.

The best way that I can summarize this bug is to say: Once in a
while (about once every 15 seconds in the case I have), a small
number of randomly-chosen processes (perhaps just 1) will be refused
access to the CPU for a "long" period of time (in a computer's terms).

I have not been able to catch the system doing any unusual


This is easily observable by the user when interacting with
applications.  It can happen when there is no observable disk I/O.
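A rough way to put numbers on these stalls, rather than relying on
what the eye catches, is to run a tiny process that sleeps in short
steps and logs how late each wake-up is.  The sketch below is only
an illustration, not something taken from the affected system: the
choice of Python, the function names (find_stalls, sample, main) and
the 10 ms / 250 ms values are all assumptions.  It uses only the
standard time module.

```python
import time


def find_stalls(timestamps, interval, threshold):
    """Return (index, extra_delay) for every wake-up that came more than
    `threshold` seconds later than the requested sleep `interval`."""
    stalls = []
    for i in range(1, len(timestamps)):
        extra = (timestamps[i] - timestamps[i - 1]) - interval
        if extra > threshold:
            stalls.append((i, extra))
    return stalls


def sample(duration, interval):
    """Sleep in short steps for about `duration` seconds and record a
    monotonic timestamp after each wake-up."""
    stamps = [time.monotonic()]
    end = stamps[0] + duration
    while stamps[-1] < end:
        time.sleep(interval)
        stamps.append(time.monotonic())
    return stamps


def main(duration=60.0, interval=0.01, threshold=0.25):
    # Hypothetical defaults: 10 ms sleeps, report anything 250 ms late.
    stamps = sample(duration, interval)
    for i, extra in find_stalls(stamps, interval, threshold):
        offset = stamps[i] - stamps[0]
        print(f"late by about {extra:.2f}s at t={offset:.1f}s")
```

Calling main() while the machine is otherwise idle, and again from an
ssh session, would show whether the delays recur at a regular period
(such as roughly every 15 seconds) and whether they hit this process
at all.  Since only a few processes seem to be affected at a time, a
quiet run does not prove the problem is absent.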



My history with the problem:

I recall hearing about something similar, last summer on the port-amd64
list.  I hadn't (yet) run into it, and got the impression that it was
an issue that got introduced to the code-base in late summer or early
fall.  I guess (my memory of the chronology is a bit weak) I felt impelled
to upgrade my OS during that time---or perhaps I forgot about the
bug, or didn't worry about it.  Eventually, it sounded like it was
fixed.

I remember figuring that it would be fixed in a few weeks and the
next time I upgraded the OS, I'd be fine.

It wasn't a huge problem, so I ignored it for a long while.  And, even
though I was "just" running -current, I depended on the machine enough
that I didn't feel comfortable doing a needless upgrade and possibly
being out of commission for a longer period.  So I held off upgrading
for a while.  (Then some changes were made to, e.g., ipf, such that
I couldn't upgrade the kernel and have the system work as I expected
without also upgrading userland...)

Finally, I upgraded again.  And the problem was still present.  I posted
to port-amd64, since I have only seen this on the AMD64 system, and
had only heard of others reporting it on AMD64 systems.  I seem to recall
that it was confirmed by someone as also affecting i386.


I'd say that I've seen this problem for the past approx. 8 months.
I recently built a 3.99.3 system and installed it on a more or less
virgin hard drive.  I still see the problem.

I do not think that any of my systems except my (sole) AMD64 box
do this.  But only one other machine is running -current (an
i386 laptop) and I haven't updated it since about July when the
ne(4) ethernet interface was overhauled---and stopped working
for that machine's ethernet card.  I have an i386 box running
2.0, though, and it seems fine.

-- 
  "I probably don't know what I'm talking about."  http://www.olib.org/~rkr/