Subject: Re: Where's all my CPU time going?
To: Chris Jones <cjones@honors.montana.edu>
From: Jim Reid <jim@mpn.cp.philips.com>
List: netbsd-users
Date: 05/05/1998 12:17:50
>>>>> "Chris" == Chris Jones <cjones@honors.montana.edu> writes:

    Chris> This is probably a fairly basic question, but I find I'm
    Chris> not sure how to answer it:

    Chris> I've got a pmax with many users on it, and I'm seeing
    Chris> upwards of 50% cpu time used in system calls (according to
    Chris> "systat vmstat").  I know I can recompile my kernel with
    Chris> profiling enabled, and that'll tell me which functions are
    Chris> using the CPU; but is there any method short of that for
    Chris> finding out why we're spending so much time in
    Chris> kernel-space?  50% just seems a bit excessive to me...

50% system time may or may not be excessive. It all depends on what
hardware you have and what your users are doing. On a regular
timesharing system, approx. 50% system time is reasonable, as there
will be lots of processes making system calls: doing I/O, creating
new processes, and so on.

You can use the PD top program to identify the processes using the
most CPU on the system. Then use ktrace on those processes (and kdump
to read the resulting trace) to find out what system calls they're
making. Maybe you have an application that's misbehaving by getting
stuck in a tight select() loop or something like that.
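
As a rough illustration of what to look for in the trace, here is a
minimal Python sketch that tallies system calls per process from
kdump-style text. It assumes each record looks like "pid command
CALL name(args)" and that the command name contains no spaces; the
function name count_syscalls and the SAMPLE lines are made up for
this example and are not real output from any machine.

```python
from collections import Counter

# Made-up kdump-style lines, for illustration only.
SAMPLE = """\
  312 spin     CALL  select(0x5,0xbfbfd8a0,0,0,0xbfbfd890)
  312 spin     RET   select 0
  312 spin     CALL  select(0x5,0xbfbfd8a0,0,0,0xbfbfd890)
  312 spin     RET   select 0
  312 spin     CALL  read(0x3,0xbfbfd900,0x400)
  312 spin     RET   read 0
"""


def count_syscalls(kdump_text):
    """Tally CALL records by (pid, command, syscall name)."""
    counts = Counter()
    for line in kdump_text.splitlines():
        # Assumed layout: pid, command, record type, details.
        fields = line.split(None, 3)
        if len(fields) < 4 or fields[2] != "CALL":
            continue
        pid, command, _, detail = fields
        # Keep only the syscall name, dropping the argument list.
        name = detail.split("(", 1)[0].strip()
        counts[(pid, command, name)] += 1
    return counts


if __name__ == "__main__":
    for key, n in count_syscalls(SAMPLE).most_common():
        print(n, *key)
```

A process whose tally is dominated by one call, such as select()
returning immediately over and over, is a good candidate for the
kind of tight loop described above.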

Another thing to check is wayward hardware. Maybe something is
interrupting the CPU more than it should. The "in" field of vmstat
will tell you how many interrupts your CPU is fielding. On my
workstation, this is usually around 120/second. I don't do much work
on it, so most interrupt activity comes from the 100Hz real-time
clock.
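
To make the arithmetic concrete, the following Python sketch pulls
the "in" column out of vmstat-style text and subtracts the clock
rate. It assumes the column-name header line contains "in" exactly
once and that every data row has one whitespace-separated value per
header column; the SAMPLE text, the function names and the 100Hz
default are illustrative assumptions, so adjust them for your own
machine.

```python
# Made-up vmstat-style text, for illustration only.
SAMPLE = """\
 procs   memory     page                    faults      cpu
 r b w   avm   fre  flt  re  pi  po  fr  sr  in   sy  cs us sy id
 1 0 0  9120  4312   12   0   0   0   0   0 120   85  14  2  3 95
 0 0 0  9120  4312    3   0   0   0   0   0 118   60  11  1  2 97
"""


def interrupt_rates(vmstat_text):
    """Return the values found in the 'in' column of each data row."""
    rates = []
    index = None
    for line in vmstat_text.splitlines():
        fields = line.split()
        if index is None:
            # Look for the header line that names the columns.
            if "in" in fields:
                index = fields.index("in")
            continue
        # Skip repeated headers and short or non-numeric rows.
        if len(fields) > index and fields[index].isdigit():
            rates.append(int(fields[index]))
    return rates


def non_clock_rate(total_per_second, clock_hz=100):
    """Interrupts per second left over after the clock's share."""
    return max(total_per_second - clock_hz, 0)


if __name__ == "__main__":
    for rate in interrupt_rates(SAMPLE):
        print(rate, "total,", non_clock_rate(rate), "non-clock")
```

With the 120/second figure above and a 100Hz clock, that leaves
about 20 interrupts a second from everything else. A much larger
remainder on an otherwise quiet machine would point at a device.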

Profiling the kernel to find out why you're spending so much CPU time
in system mode is probably overkill. [Usually, it's only kernel
developers who are tuning their code that do this.] The profiling data
probably wouldn't help you. Suppose it tells you that half the kernel
time is spent in namei() - translating pathnames to inodes. How do you
know which pathnames are being translated or which process(es) were
responsible for making those requests? OTOH, the profiling could tell
you about a runaway device driver or excessive use of an interrupt
handler.