Subject: Re: explaining TOP memory output
To: None <netbsd-users@netbsd.org>
From: Mark Cullen <mark.r.cullen@gmail.com>
List: netbsd-users
Date: 07/15/2006 11:31:47
Michael Parson wrote:
> On Fri, Jul 14, 2006 at 09:21:04AM +0200, Johnny Billquist wrote:
>> Mark Cullen wrote:
>>> Michael Parson wrote:
>>>
>>>> The fact that you have a high-ish load, with an idle CPU is what
>>>> concerns me. You have something that is causing your load queue length
>>>> to be artificially high.
>>>
>>> Interesting, because my system always hovers around a load average of
>>> 1.00, with 100% idle CPU. I was just blaming it on the amount of
>>> processes I had running, but maybe it's not this after all?
>>>
>>> NetBSD 3.0.1, but was the same with 3.0.0
>>>
>>> ---
>>> load averages: 0.94, 0.84, 0.89 08:12:07
>>> 99 processes: 1 runnable, 97 sleeping, 1 on processor
>>> CPU states: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle
>>> Memory: 102M Act, 54M Inact, 1844K Wired, 20M Exec, 67M File, 6132K Free
>>> Swap: 1024M Total, 44M Used, 980M Free
>>> ---
>>>
>>> It's actually quite rare to see it < 1.00. It doesn't seem to really be
>>> causing any problems, but I am not running X. If it's worth looking
>>> into then I can post any info needed?
>> I believe it's related.
>
> It probably is related.
>
>> If you have lots of page faults, you'll have processes wanting to run,
>> but just waiting for pages to become valid. I would guess these
>> processes count as active, and thus are included in the load.
>
> How much physical RAM do you have? Looking at your top output, I'm
> guessing around 196M? Does vmstat suggest that you're getting a lot of
> page faults? Problem with that question is that 'a lot' is relative.
> Sometimes solving the problem can be as simple as throwing more RAM
> at the system, but not always. Depends on what the problem is, or if
> there even is a problem. That just might be how your system runs, given
> what's running on it.
>
256MB in there, but I think a whopping 2MB of it is allocated to the
onboard video (which I don't use, but can't disable).
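If it helps, I think `sysctl hw.physmem` shows what the kernel actually
sees (in bytes, if I remember right):

---
# what the kernel thinks is installed, minus whatever the video grabs
sysctl hw.physmem
---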
Well, is `systat 1 vmstat` good enough? If so, while it's idling I am
seeing:
---
 3 users    Load  1.08  1.05  1.05                  Fri Jul 14 18:16:59

Proc:r  d  s  w    Csw   Trp  Sys  Int  Sof  Flt    PAGING    SWAPPING
       17 86        11 13178  339  152   18          in  out    in  out
---
That's 18 faults per second, is it? Doesn't seem unusually high to me?!
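If plain vmstat is more useful I can post that instead; I think
something like this shows the fault rate over time (the "flt" column)
plus the counters since boot, though I might have the flags slightly
wrong:

---
# page faults per 5-second sample (the "flt" column under "page")
vmstat 5
# cumulative fault-related counters since boot
vmstat -s | grep -i fault
---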
> The load average is one of the most misunderstood and abused metrics of
> unix systems. It quite simply is the length of the run queue averaged
> over a time period (usually 1, 5, and 15 minutes). It is often related
> to, but doesn't always mean, how hard the computer is working. If you
> fire off a couple of compiles at a time, then your load is going to go
> up, and the CPU numbers will reflect this (%sys, %usr, %idle). However,
> you can also have a bunch of zombie processes around that are not really
> using any CPU time, but they still sit in the run queue, causing the
> load average to be high, but the CPU will still be (mostly) idle, since
> when their turn comes up in the queue, they just cycle around w/o doing
> anything.
I know :-)
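(For what it's worth, if I wanted to see what is actually sitting in
the run queue I'd probably try something along these lines -- not sure
the state letters are exactly right on 3.0:)

---
# list processes in runnable (R), disk-wait (D) or zombie (Z) state
ps -ax -o pid,stat,wchan,command | awk 'NR==1 || $2 ~ /^[RDZ]/'
---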
>
> My box (800 MHz PIII with 512M of RAM and IDE disks) at any given time
> generally has 20-30 interactive shells running (mostly due to myself
> and two other users using screen), just reading email, irc, etc, plus
> serving DNS for some 93 domains, and serving up virtual domain websites
> for 3 or 4 sites, including at least one backed by a mysql database,
> but all of them see very light usage, so it tends to have a load
> between .5 and .6. Incoming email is fairly light, and I run
> spamassassin, so if
> I get a flurry of mail, that can cause the load to go up, but I have SA
> configured to only scan two messages at a time, so that limits how high
> the load climbs.
>
Well, this is a 1GHz Celeron, 256MB of RAM, 4 IDE disks in two RAID-1
arrays using RAIDFrame, and two Intel 100Mbit NICs. It's running all
sorts of things: apache, samba, mysql, courier-imap, postfix, named,
etc. (a sort of do-everything box for the home network), but it mostly
sits 100% idle, as I said before :-)
> If my 15 minute load average is >1.00, I know this is out of the
> norm for *my* usual usage patterns, so I start looking into why. Is
> something CPU bound? I/O bound? Memory bound? Spinning and not doing
> anything useful? Top is just one tool, other helpful ones are vmstat,
> iostat, and ps. But like any tool, you need to know how to use them and
> how to read their output.
>
Honestly, I really can't see anything that looks like it might be
causing a constant ~1.00 load average. It's 100% idle with a 1.00 load
average, which I would assume means it's not CPU bound. The disks are
doing next
to nothing (as far as I can see):
---
(root@bone)/root# iostat -c 10 -w 1 raid0 raid1
tty raid1 raid0 CPU
tin tout KB/t t/s MB/s KB/t t/s MB/s us ni sy in id
17 29 7.12 2 0.02 18.41 1 0.03 1 0 1 1 98
0 184 0.00 0 0.00 0.00 0 0.00 0 0 0 0 100
0 62 0.00 0 0.00 21.00 2 0.00 0 0 3 1 96
0 61 0.00 0 0.00 0.00 0 0.00 0 0 0 0 100
40 71 0.00 0 0.00 0.00 0 0.00 0 0 0 0 100
72 74 0.00 0 0.00 3.00 4 0.00 0 0 0 0 100
0 62 0.00 0 0.00 0.00 0 0.00 0 0 0 0 100
0 62 0.00 0 0.00 0.00 0 0.00 0 0 0 0 100
0 62 0.00 0 0.00 13.67 12 0.16 0 0 0 0 100
0 62 4.00 2 0.00 0.00 0 0.00 0 0 0 0 100
---
I don't really know about the memory situation. I never saw a constant
1.00 load average on FreeBSD on the same machine, though it only had a
single RAID-1 array at that point, using vinum rather than RAIDFrame.
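In case it's useful, I could also grab something along these lines to
see where the memory and swap are actually going (guessing at the flags
a bit):

---
# per-device swap usage
swapctl -l
# processes sorted by memory usage rather than pid
ps -axm -o pid,vsz,rss,command | head -20
---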
--
Mark Cullen <mark.r.cullen@gmail.com>