Subject: Re: high load average
To: None <garph@661.org>
From: Brian Stark <bstark@siemens-psc.com>
List: current-users
Date: 08/25/2000 18:20:39
On Fri, 25 Aug 2000 garph@661.org wrote:

> 
> This machine is an 800MHz Athlon with 128M ram, a 20G UDMA66 hard drive,
> and a Netgear FA310TX nic.
> 
> In multiuser mode with only root logged in running top, load average
> hovered around 0.07 to 0.13.  Unprivileged user logs in (remote or on
> console didn't matter) and the load immediately shot to around 0.25 to
> 0.40.  User sat idle and load went back to 0.13 or so.  User started

"immediately shot to around 0.25 to 0.40" ???

If I remember correctly, a load average is the average number of runnable
processes in the operating system's run queue over a specified period
of time. For NetBSD, the load averages cover the last 1, 5, and 15
minutes.
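
As a rough illustration of why a short burst of activity moves the
1-minute number so quickly, here is a minimal Python sketch of an
exponentially damped average. The 5-second sampling interval and the
names update_load and simulate are assumptions made for this example;
this is not the kernel's code.

```python
import math

def update_load(load, runnable, interval=5.0, window=60.0):
    """Fold one run-queue sample into a damped load average.

    interval: seconds between samples (assumed value for illustration).
    window:   averaging period in seconds (60 for the 1-minute figure).
    """
    decay = math.exp(-interval / window)
    return load * decay + runnable * (1.0 - decay)

def simulate(samples, interval=5.0, window=60.0, start=0.0):
    """Apply update_load to a sequence of run-queue lengths."""
    load = start
    for runnable in samples:
        load = update_load(load, runnable, interval, window)
    return load

if __name__ == "__main__":
    # One runnable process for 20 seconds (four 5-second samples),
    # starting from a completely idle system.
    print(round(simulate([1, 1, 1, 1]), 2))
```

Under these assumptions, a single process that stays runnable for about
20 seconds is enough to lift the 1-minute figure from 0 to roughly 0.28,
which is in the range reported when the user logged in.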

Based on the averages you are reporting, I'd say not much is going on in
your system. 

> load went to 1.40 and beyond.  All this time, the machine had the kind of
> sluggish response one would get from a load average that high.

How far beyond?

You might be better off using a combination of tools to monitor your
situation. If you think the system is sluggish, then there may be a
performance problem of some type (I/O, network, memory, etc.) that may
not show itself in the load average numbers.

For starters, I'd suggest running 'top' in one window with 'systat -w 10
vmstat' in another window. Oh, if your machine is connected to a network,
disconnect that while you are doing your tests. That way you will at least
know that nothing coming in from the network is causing trouble.

> 
> USER  PID %CPU %MEM VSZ   RSS TT STAT STARTED    TIME COMMAND
> root    0  0.0  8.2   0 10672 ?? DLs   3:34PM 0:00.00 (swapper)
> garph 258  0.0  0.2 448   212 E1 Is+   3:38PM 0:00.02 -sh 
> root  246  0.0  0.2 428   280 E0 Ss    3:38PM 0:00.03 -csh 
> root  195  0.0  0.3  48   432 E3 Is+   3:34PM 0:00.00 /usr/libexec/getty Pc ttyE
> root  194  0.0  0.3  48   432 E2 Is+   3:34PM 0:00.00 /usr/libexec/getty Pc ttyE
> root  190  0.0  0.3 220   412 ?? Is    3:34PM 0:00.00 /usr/sbin/cron 
> root  187  0.0  0.3  56   356 ?? Is    3:34PM 0:00.00 /usr/sbin/inetd -l 
> root  111  0.0  0.3 100   400 ?? Ss    3:34PM 0:00.05 /usr/sbin/syslogd -s 
> root    4  0.0  8.2   0 10672 ?? DL    3:34PM 0:00.02 (ioflush)
> root    3  0.0  8.2   0 10672 ?? DL    3:34PM 0:00.01 (reaper)
> root    2  0.0  8.2   0 10672 ?? DL    3:34PM 0:00.00 (pagedaemon)
> root    1  0.0  0.2 308   188 ?? Is    3:34PM 0:00.01 init 
> root  260  0.0  0.1 340   144 E0 R+    3:39PM 0:00.00 ps aux 


Your CPU utilization numbers are all 0.0. This system looks fairly
idle...

Looking at the 'STAT' column, the first letter indicates the run state of
the process:

  D       Marks a process in disk (or other short term,
          uninterruptible) wait.
  I       Marks a process that is idle (sleeping for longer than
          about 20 seconds).
  R       Marks a runnable process.
  S       Marks a process that is sleeping for less than about 20
          seconds.
  T       Marks a stopped process.
  Z       Marks a dead process (a ``zombie'').

More information on the other letters in the 'STAT' column is available
in the man page for 'ps'.

Your output above shows only one process in the 'R'un state, and that is
your copy of 'ps aux'.
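
If you would rather tally the run states than read the table by eye,
something like the following Python sketch would do it. It assumes the
'ps aux' column layout shown in your output, and the name count_states
is made up for this example.

```python
from collections import Counter

# The 'ps aux' output quoted above, pasted in as-is.
PS_OUTPUT = """\
USER  PID %CPU %MEM VSZ   RSS TT STAT STARTED    TIME COMMAND
root    0  0.0  8.2   0 10672 ?? DLs   3:34PM 0:00.00 (swapper)
garph 258  0.0  0.2 448   212 E1 Is+   3:38PM 0:00.02 -sh
root  246  0.0  0.2 428   280 E0 Ss    3:38PM 0:00.03 -csh
root  195  0.0  0.3  48   432 E3 Is+   3:34PM 0:00.00 /usr/libexec/getty Pc ttyE
root  194  0.0  0.3  48   432 E2 Is+   3:34PM 0:00.00 /usr/libexec/getty Pc ttyE
root  190  0.0  0.3 220   412 ?? Is    3:34PM 0:00.00 /usr/sbin/cron
root  187  0.0  0.3  56   356 ?? Is    3:34PM 0:00.00 /usr/sbin/inetd -l
root  111  0.0  0.3 100   400 ?? Ss    3:34PM 0:00.05 /usr/sbin/syslogd -s
root    4  0.0  8.2   0 10672 ?? DL    3:34PM 0:00.02 (ioflush)
root    3  0.0  8.2   0 10672 ?? DL    3:34PM 0:00.01 (reaper)
root    2  0.0  8.2   0 10672 ?? DL    3:34PM 0:00.00 (pagedaemon)
root    1  0.0  0.2 308   188 ?? Is    3:34PM 0:00.01 init
root  260  0.0  0.1 340   144 E0 R+    3:39PM 0:00.00 ps aux
"""

def count_states(ps_text):
    """Count processes by the first letter of the STAT column."""
    lines = ps_text.strip().splitlines()
    # Locate STAT from the header instead of hard-coding its position.
    stat_col = lines[0].split().index("STAT")
    return Counter(line.split()[stat_col][0] for line in lines[1:])

if __name__ == "__main__":
    for state, count in sorted(count_states(PS_OUTPUT).items()):
        print(state, count)
```

For the listing above this gives 4 processes in D, 6 in I, 2 in S, and
1 in R.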


Brian
bstark@siemens-psc.com