Subject: Re: system tuning to improve responsiveness
To: None <netbsd-help@NetBSD.org>
From: theo borm <theo_nbsdhelp@borm.org>
List: netbsd-help
Date: 04/13/2005 16:45:52
Timo Schoeler wrote:

>> Dear list members,
>>
>> I regularly use one of several different NetBSD 2.0/i386 desktop
>> machines, and have sometimes noticed an intermittent "non-
>> responsiveness", which has started to annoy me. Sometimes my
>> mouse will "just stop moving", sometimes things will become
>> "sluggish" for several seconds. This does even happen when no
>> swap is being used, so a general "lack of memory" does not seem
>> to be the issue.
>>
<snip>

>>
>> Though it may be unrelated to the "non-responsiveness" issue,
>> a little test program writing 8192 files of 1MB each reveals
>> that there is a certain "spikiness" in the timing of writing each
>> 1MB file. In particular at ~30 second intervals, it can suddenly
>> take >2 seconds to write a file instead of the (more usual)
>> < 0.1 seconds. What could this be, and can this be avoided by
>> proper tuning?
>>
<snip>

>
> i'm setting up a Sun Ultra 10 and made it run under heavy load to test 
> its reliability under NetBSD ('sleep forever' seems solved, but i'm 
> suspicious ;).
>
> because i already had four shells open on the machine i was working 
> from on the U10, i ping'ed the U10 in one shell constantly.
>
> surprise (unfortunately i don't have the numbers, may repeat it on 
> demand): although nearly every ping was replied to within 0.6 to 0.7 
> ms (it's connected to a hub/repeater), a few ones took 5ms or even 
> longer.
>
> i think that's not related to my network, i could test this on a fully 
> switched (on a managed switch) segment to exclude this.
>
> concluding: i think even the IP stack suffers from the phenomenon you 
> described.

The problem with ping replies taking too long can have several
other - more likely - causes. Collisions, for instance. In the past
I've had some networking problems, but I don't think that's
the case now.

The phenomenon I'm seeing is very intermittent - it can
(spontaneously) happen twice a minute or once a day, and
there is a two-orders-of-magnitude difference between a
nearly imperceptible (from my user interface point of
view ;-) ) 5 ms and half a second delay (very perceptible
when moving your mouse).

The test case I described does not necessarily have anything
to do with the spontaneous case, however. Looking at the
simultaneous output of vmstat I'm seeing at least /something/
correlated about twice a minute:

- system calls drop from 200/s to 0-30/s
- interrupts drop from ~700/s to ~500/s
- context switches drop from ~550/s to ~100/s

This suggests (to me) that something in the kernel locks up
and takes a few seconds to complete. My guesstimate is that
it has something to do with buffers being flushed to disk (i.e.
tunable parameters), but I'm not sure.
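
To pick such dips out of a longer vmstat log without eyeballing it,
something like the following rough sketch could be used. It assumes
the per-second rates have already been extracted into a plain list of
numbers; the function name, the 50% threshold and the sample values
are made up for illustration and are not real measurements.

```python
from statistics import median

def find_stalls(samples, fraction=0.5):
    """Return the indices of samples below fraction * median(samples).

    The median serves as the "normal" rate, so a few stalled seconds
    do not drag the baseline down the way a mean would.
    """
    baseline = median(samples)
    return [i for i, value in enumerate(samples) if value < fraction * baseline]

# Hypothetical context-switch rates, one sample per second,
# shaped like the ~550/s -> ~100/s drop described above.
context_switches = [550, 540, 560, 100, 110, 545, 555]
print(find_stalls(context_switches))
```

A smaller relative drop, like the interrupt rate going from ~700/s to
~500/s, would need a higher fraction (for example 0.8) to be flagged.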

Also, in the category "page" the sr parameter drops to zero,
which surprises me a bit as that figure is averaged over
5 second intervals...

I think/hope this all boils down (for me) to some tuning of
virtual memory/filing system/buffer related parameters.
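
To compare before and after any tuning, the write test can be
approximated along these lines. This is only a sketch and not the
original test program: the function names, the file naming scheme and
the 2 second threshold are assumptions, and the demo at the bottom
writes a handful of small files into a temporary directory instead of
the full 8192 files of 1MB each.

```python
import os
import tempfile
import time

def time_writes(directory, count=8192, size=1024 * 1024):
    """Write `count` files of `size` bytes each into `directory`.

    Returns a list with the wall-clock seconds taken per file.
    The files are left in place; cleaning up is the caller's job.
    """
    payload = b"\0" * size
    timings = []
    for i in range(count):
        path = os.path.join(directory, "testfile.%05d" % i)
        start = time.monotonic()
        with open(path, "wb") as f:
            f.write(payload)
        timings.append(time.monotonic() - start)
    return timings

def slow_writes(timings, threshold=2.0):
    """Return (index, seconds) for every write slower than `threshold`."""
    return [(i, t) for i, t in enumerate(timings) if t > threshold]

# Small demo run; for the real test, point time_writes() at a directory
# on the file system under test and use the default count and size.
with tempfile.TemporaryDirectory() as demo_dir:
    demo_timings = time_writes(demo_dir, count=8, size=64 * 1024)
    print("slowest write: %.4f s" % max(demo_timings))
    print("writes over threshold:", slow_writes(demo_timings))
```

Recording a timestamp next to each timing would make it easier to
check whether the slow writes really recur at ~30 second intervals.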


cheers, Theo