tech-kern: vmstat - is it reliable? Some think not... (fwd)

Subject: vmstat - is it reliable? Some think not... (fwd)
To: None <tech-kern@netbsd.org>
From: Jon Lindgren <jlindgren@espus.com>
List: tech-kern
Date: 07/27/2000 16:41:43
An odd subject, I'd agree.  But let me explain:

I've been benchmarking a few different machines for use as a high-level
protocol router.  Basically, the machine picks up broadcasts, potentially
filters the data, then pipes it (read: tunnels it) to another machine which
rebroadcasts the data.  This is done 1) to avoid the use of directed
broadcasts and/or forwarded broadcasts (a Bad Thing ;-) and 2) so that the
machine won't forward messages which it doesn't need to (a kind of
forward-on-demand setup).

Benchmarking on a particular type of machine revealed this: when the
machine receives large numbers of UDP packets on a port it's not listening
on, the CPU goes up to about 80% utilization.  This figure was reported by
vmstat.

Odd.  5-10% I could expect.  Not 80%.  These aren't old machines - they're
PCI-based, with relatively powerful RISC CPUs and 256MB of memory.

Upon _extremely_ close examination, the resolution group revealed that the
CPU is in fact not at 80% - the error lies in how 'vmstat' is reporting
it.

It seems that the "BSD" way of maintaining usage counters is flawed,
according to the resolution group.  The specifics of this have to do with
the timer interrupt:

- Every 10ms the timer interrupt occurs.  It examines where the system was
when it was interrupted (i.e. kernel, user, etc...).  It updates the
proper counter (which vmstat uses to display utilization), and moves on.

- If a higher priority interrupt is in progress, the timer interrupt is
masked.  When the timer interrupt finally fires, it realizes that it
was late, and chalks the time up to kernel (since a higher priority
interrupt was the only thing which could prevent the timer from
firing... it couldn't be userland code).

Now, if a higher-level interrupt _consistently_ happens right before
the timer (say 1ms before), the timer assumes the past 10ms were spent in
the kernel.  Hence it over-reports CPU usage.
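
To make the claimed mechanism concrete, here is a minimal Python sketch of
tick-based accounting.  It is not taken from any kernel: the function name
simulate(), the 1ms interrupt length, and the rule "a tick is charged to
kernel whenever an interrupt is in progress at the nominal tick time" are
all assumptions made for illustration.  Only the 10ms tick comes from the
description above.

```python
def simulate(irq_period_us, irq_phase_us, irq_len_us=1_000,
             tick_us=10_000, n_ticks=1_000):
    """Return (true_kernel_fraction, sampled_kernel_fraction).

    Hypothetical model: a high-priority interrupt starts every
    irq_period_us microseconds (first start at irq_phase_us) and runs
    for irq_len_us.  The timer tick is due every tick_us; if the
    interrupt is in progress at that moment, the whole tick is charged
    to kernel, otherwise the tick is charged elsewhere.
    """
    kernel_ticks = 0
    for k in range(1, n_ticks + 1):
        t = k * tick_us  # nominal time of the k-th timer tick
        # Time elapsed since the most recent interrupt start.
        since_irq_start = (t - irq_phase_us) % irq_period_us
        if since_irq_start < irq_len_us:
            kernel_ticks += 1  # tick landed inside the interrupt
    true_fraction = irq_len_us / irq_period_us
    sampled_fraction = kernel_ticks / n_ticks
    return true_fraction, sampled_fraction


if __name__ == "__main__":
    # Interrupt starts 0.5ms before every tick (phase-locked to the timer).
    print("locked, just before tick:", simulate(10_000, 9_500))
    # Same load, but the interrupt never overlaps a tick.
    print("locked, mid-period:      ", simulate(10_000, 2_000))
    # Same interrupt length, period slightly off from the tick period.
    print("drifting:                ", simulate(10_007, 0, n_ticks=10_007))
```

Under these assumptions the true load is 10% in both phase-locked cases,
yet the sampled figure is 100% when the interrupt always straddles the
tick and 0% when it never does.  When the interrupt period drifts relative
to the tick, the sampled figure converges on the true one.  So the
over-reporting in this model depends entirely on the interrupt being
synchronized with the timer.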

The resolution group claims that this is standard behavior across
virtually all BSD-derived OSes, i.e. NetBSD, Solaris, AIX, etc.

This explanation sounds a bit wrong to me.  Besides not "feeling" right, I
am not able to reproduce this behavior on other types of machines.  It
also seems to be a _huge_ coincidence that I can reliably reproduce this
by flooding the particular type of machine with UDP broadcasts.

Is the vmstat report true?  And if so, does this affect NetBSD?  Could
good ol' vmstat actually be a relatively worthless tool under some
circumstances?  I certainly hope not.

I guess I'm looking for input as to how NetBSD does vmstat.  Would NetBSD
suffer from the same problems?

Or should I replace all of my forwarding machines with NetBSD?

(hey - I can always dream)

-Jon
 --------------------------------------------------------------------
 "Drink Shazz Cola!"