Subject: Re: explaining TOP memory output and constant 1.0 load averages
To: Mark Cullen <mark.r.cullen@gmail.com>
From: Greg A. Woods <woods@weird.com>
List: netbsd-users
Date: 07/17/2006 12:10:07

At Mon, 17 Jul 2006 11:57:32 +0100,
Mark Cullen wrote:
>
> Perhaps changing from 50MB to 5MB helped as it was filling up the file
> system buffer cache or something, as Johnny suggested. If the buffer
> cache code pushes things out to swap I guess it could cause things like
> this to happen?

Keep in mind (and if I understand correctly) that the buffer cache won't
cause anything to be paged out onto swap unless the VM system is willing
to let those pages go, and it also won't do it unless you've got I/O
demands to fill the buffer cache.

I.e. this machine is too small/slow for what's being attempted with it.
NetBSD has been asked to make compromises and it's doing the very
best it can with the information it has been given.  Give it more memory!

As for the constant load average of 1.0, well that's just an average and
it's just a hint that there's almost always something that wants to run
but can't yet for some reason.  It could be waiting on regular I/O or
(again, if I understand correctly) it could be waiting on virtual
memory.  Any use of swap of course suggests that memory should be added,
but again, as has been suggested, the key to knowing whether memory
exhaustion is the problem is to watch for pagein/pageout requests (and
swap-ins/outs) with vmstat.  I like to use "vmstat -s -w 1" to watch the
numbers roll.  You need a rather tall (77-line) window to see everything
and you have to have an eye for big numbers and magnitudes of changes to
really see what's happening (it would be nice to have simple tools to
provide real-time graphs from such data!).  If there's no paging (or
swapping) activity and the load average is still nearly 1.0 then the
problem is more likely to be file I/O, and maybe _increasing_ the buffer
cache will help.
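
To see why a single process that is always runnable or always waiting
pins the figure at about 1.0, here is a small Python model of the
exponentially damped averaging used by BSD-derived kernels.  It assumes
the run-queue length is sampled every 5 seconds and folded in with decay
factors exp(-5/60), exp(-5/300) and exp(-5/900).  Exactly which waiting
states get counted differs between kernels and releases, so treat this
as an illustration, not NetBSD's actual code.

```python
import math

# Assumed classic BSD scheme: sample every 5 seconds, damp over
# 1-, 5- and 15-minute periods.
SAMPLE_INTERVAL = 5.0
PERIODS = (60.0, 300.0, 900.0)
DECAY = tuple(math.exp(-SAMPLE_INTERVAL / p) for p in PERIODS)


def update(loads, nrun):
    """Fold one sample of the number of runnable (or short-term waiting)
    processes into each of the three damped averages."""
    return tuple(l * d + nrun * (1.0 - d) for l, d in zip(loads, DECAY))


def simulate(samples, start=(0.0, 0.0, 0.0)):
    """Run a sequence of run-queue samples through the averaging."""
    loads = start
    for nrun in samples:
        loads = update(loads, nrun)
    return loads


if __name__ == "__main__":
    # One process that is always ready to run (or always waiting on disk)
    # for 30 minutes: 360 five-second samples.
    one, five, fifteen = simulate([1] * 360)
    print(f"{one:.2f} {five:.2f} {fifteen:.2f}")
```

Run as a script it prints "1.00 1.00 0.86": after half an hour of one
always-waiting process the 1- and 5-minute averages sit at 1.0 and the
15-minute one is still catching up.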

Unfortunately *BSD still doesn't have the very best production tools for
finding out exactly what is waiting on what (I no longer remember what I
would use on more production-oriented systems, but sar and system
accounting are on the horizon).  In other words, there are no good basic
tools to see what's triggering those vmstat numbers to roll, though
"top" isn't entirely useless, especially with '-I' (and "systat ps" is
nearly as good).
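
Lacking such tools, a crude stand-in is to difference two snapshots of
the cumulative counters that "vmstat -s" prints and sort them by how far
they moved.  The sketch below is hypothetical: it assumes each counter
line is an integer followed by a description, and the keywords it uses
to spot paging and swapping ("pagein", "pageout", "swap") are guesses
that should be matched against the labels your own vmstat prints.  It
still can't tell you which process caused the change.

```python
import re
import sys

# Hypothetical: assumes lines of the form "<integer> <description>",
# e.g. "     1234 pagein requests"; other lines are ignored.
LINE_RE = re.compile(r"^\s*(\d+)\s+(.+?)\s*$")

# Assumed label fragments for paging/swapping counters; adjust to taste.
PAGING_KEYWORDS = ("pagein", "pageout", "swap")


def parse_counters(text):
    """Map each counter description to its integer value."""
    counters = {}
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            counters[m.group(2)] = int(m.group(1))
    return counters


def deltas(before, after):
    """Return (description, change) for counters that moved, largest first."""
    changed = [(name, after[name] - before[name])
               for name in after
               if name in before and after[name] != before[name]]
    return sorted(changed, key=lambda item: abs(item[1]), reverse=True)


def paging_active(changes, keywords=PAGING_KEYWORDS):
    """True if any moved counter's description contains a paging keyword."""
    return any(k in name for name, _ in changes for k in keywords)


def main(before_path, after_path):
    with open(before_path) as f0, open(after_path) as f1:
        changes = deltas(parse_counters(f0.read()), parse_counters(f1.read()))
    for name, change in changes:
        print(f"{change:+12d}  {name}")
    if not paging_active(changes):
        print("no paging/swap counters moved: look at file I/O instead")


if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv[1], sys.argv[2])
```

For example: "vmstat -s > before; sleep 10; vmstat -s > after" and then
run the script (call it vmdiff.py, a made-up name) with "before" and
"after" as its two arguments.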

Personally I don't think my servers are doing enough work unless their
load average is somewhere between 1 and 2 times their number of CPUs.  I
always make sure they have more than enough RAM and since disks are
always slow I always want at least one process ready to run just as soon
as the I/O it's waiting on is completed.  Now a workstation on the other
hand is something I normally don't want too much load on because I want
X11 (and any other little beasties such as xterms and clocks and the
window manager) to always be able to run when they need to.  However if
the WS is being used to present multimedia stuff then I don't mind if
it's working full tilt since that application probably has my full
attention and I won't be asking the machine to do very much else (at
least nothing that demands quick UI response times).  There's another
hint in there too -- split your processing between workstations and
servers!  Even running emacs on the server is a good thing for snappy
response times (unless you have slow font load times and you use a lot
of different fonts :-)).
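
The 1-2x rule of thumb for servers is simple arithmetic on the 1-minute
load average and the CPU count.  A minimal Python sketch, assuming a Unix
system where os.getloadavg() is available; the band boundaries are just
the figures from this message, not any kind of standard.

```python
import os


def load_per_cpu(load1=None, ncpu=None):
    """1-minute load average divided by the number of CPUs."""
    if load1 is None:
        load1 = os.getloadavg()[0]  # Unix only
    if ncpu is None:
        ncpu = os.cpu_count() or 1
    return load1 / ncpu


def server_band(ratio, low=1.0, high=2.0):
    """Where the ratio falls relative to the 1-2x-per-CPU band."""
    if ratio < low:
        return "below"
    if ratio > high:
        return "above"
    return "within"


if __name__ == "__main__":
    r = load_per_cpu()
    print(f"load per CPU: {r:.2f} ({server_band(r)} the 1-2x band)")
```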

-- 
						Greg A. Woods

H:+1 416 218-0098 W:+1 416 489-5852 x122 VE3TCP RoboHack <woods@robohack.ca>
Planix, Inc. <woods@planix.com>       Secrets of the Weird <woods@weird.com>
