Subject: Re: explaining TOP memory output
To: None <netbsd-users@NetBSD.org>
From: Michael Parson <mparson@bl.org>
List: netbsd-users
Date: 07/14/2006 10:22:25
On Fri, Jul 14, 2006 at 09:21:04AM +0200, Johnny Billquist wrote:
> Mark Cullen wrote:
>> Michael Parson wrote:
>>
>>> The fact that you have a high-ish load, with an idle CPU is what
>>> concerns me.  You have something that is causing your load queue length
>>> to be artificially high.
>>
>>
>> Interesting, because my system always hovers around a load average of 
>> 1.00, with 100% idle CPU. I was just blaming it on the amount of 
>> processes I had running, but maybe it's not this after all?
>>
>> NetBSD 3.0.1, but was the same with 3.0.0
>>
>> ---
>> load averages:  0.94,  0.84,  0.89 08:12:07
>> 99 processes:  1 runnable, 97 sleeping, 1 on processor
>> CPU states:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
>> Memory: 102M Act, 54M Inact, 1844K Wired, 20M Exec, 67M File, 6132K Free
>> Swap: 1024M Total, 44M Used, 980M Free
>> ---
>>
>> It's actually quite rare to see it < 1.00. It doesn't really seem to
>> be causing any problems, but I am not running X. If it's worth looking
>> into then I can post any info needed?
>
> I believe it's related.

It probably is related.

> If you have lots of page faults, you'll have processes wanting to run, 
> but just waiting for pages to become valid. I would guess these 
> processes count as active, and thus are included in the load.

How much physical RAM do you have?  Looking at your top output, I'm
guessing around 196M?  Does vmstat suggest that you're getting a lot
of page faults?  The problem with that question is that 'a lot' is
relative.  Sometimes solving the problem can be as simple as throwing
more RAM at the system, but not always.  It depends on what the
problem is, or whether there even is a problem; this might just be how
your system runs, given what's running on it.
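For reference, a short vmstat run is enough to get a feel for the fault
rate (column names and layout vary by platform; NetBSD's vmstat reports
faults in the "flt" column, other implementations arrange things
differently):

```shell
# Sample system activity once a second, twice; the first line is the
# average since boot, the second line is the live interval.
# Note: column layout differs between NetBSD and other vmstat versions.
vmstat 1 2
```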

The load average is one of the most misunderstood and abused metrics
on unix systems.  It is quite simply the length of the run queue
averaged over a time period (usually 1, 5, and 15 minutes).  It is
often related to, but doesn't always reflect, how hard the computer is
working.  If you fire off a couple of compiles at once, your load will
go up, and the CPU numbers will reflect it (%sys, %usr, %idle).
However, you can also have a bunch of processes hanging around that
are not really using any CPU time but still sit in the run queue.
They keep the load average high while the CPU stays (mostly) idle,
since when their turn comes up in the queue, they just cycle around
without doing anything.
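For the curious, the classic BSD kernels maintain those numbers as
exponentially damped averages: every few seconds the scheduler samples
the run-queue length and folds it into each average with a fixed decay
factor.  A rough sketch in awk (the 5-second sample interval and the
decay constant are illustrative, not taken from NetBSD's source):

```shell
# Simulate the 1-minute load average: sample a constant run-queue
# length of 2 every 5 seconds for 10 simulated minutes, applying
# exponential decay each time.  The average converges toward the
# run-queue length, printing 2.00.
awk 'BEGIN {
    decay = exp(-5.0 / 60.0)       # 5s samples, 60s time constant
    load = 0.0
    for (i = 0; i < 120; i++)      # 120 samples = 10 simulated minutes
        load = load * decay + 2 * (1 - decay)
    printf "%.2f\n", load
}'
```

This is why a sudden burst of runnable processes pushes the 1-minute
number up quickly while the 15-minute number lags behind.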

My box (an 800 MHz PIII with 512M of RAM and IDE disks) generally has
20-30 interactive shells running at any given time (mostly due to
myself and two other users using screen), just reading email, irc,
etc.  It also serves DNS for some 93 domains and virtual-domain
websites for 3 or 4 sites, including at least one backed by a MySQL
database, but all of them see very light usage, so the load tends to
sit between .5 and .6.  Incoming email is fairly light, and I run
SpamAssassin, so a flurry of mail can drive the load up, but I have SA
configured to scan only two messages at a time, which limits how high
the load climbs.

If my 15 minute load average is >1.00, I know this is out of the
norm for *my* usual usage patterns, so I start looking into why.  Is
something CPU bound?  I/O bound?  Memory bound?  Spinning and not
doing anything useful?  Top is just one tool; other helpful ones are
vmstat, iostat, and ps.  But as with any tool, you need to know how to
use them and how to read their output.
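As a starting point, a minimal triage pass might look something like
this (ps flags vary between NetBSD and other systems; the POSIX-ish
-A/-o form below is an assumption, so adjust for your platform):

```shell
# Current load averages, in one line.
uptime

# All processes with state and CPU use, sorted by %CPU, top of the list
# only (STAT shows R = runnable, Z = zombie, etc.).
ps -Ao pid,stat,pcpu,comm | sort -rnk3 | head
```

If the busiest processes show near-zero %CPU but the load is high,
that points at something other than CPU, such as I/O waits or paging.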

-- 
Michael Parson
mparson@bl.org