netbsd-users: Re: getting cpu utilization

Subject: Re: getting cpu utilization
To: None <netbsd-users@netbsd.org>
From: George Georgalis <george@galis.org>
List: netbsd-users
Date: 01/14/2007 20:15:13
On Fri, Jan 12, 2007 at 11:19:42AM +0100, Johnny Billquist wrote:
>And to correct myself.. :)
>
>Johnny Billquist wrote:
>>George Georgalis wrote:
>>
>>>On Thu, Jan 11, 2007 at 06:46:30PM -0600, Jeremy C. Reed wrote:
>>>
>>>>>I'm using a /bin/sh function to generate the cpu utilization
>>>>>util () { # CPU Utilization
>>>>>idle=$(echo "2 k $(top -b -d2 | grep '^CPU states' | awk '{print $11}' | sed 's/%//') 1 + p" | dc)
>>>>>echo "2 k 1 $idle / p"  | dc ;}
>>>>>
>>>>>That returns the inverse of the cpu idle % found in top.  I add
>>>>>1 to the value before I invert it to prevent divide by zero, so
>>>>>output is pretty much between 0 and 1.
>>>>>
>>>>>Running two top reports seems a pretty inefficient way to get the
>>>>>value.  I think I can tune top a bit, but is there a more direct
>>>>>way to get the measurement?
>>>>
>>>>
>>>>Maybe "sysctl kern.cp_time" ?
>>>
>>>
>>>hey that looks pretty good, but I cannot find any doc on it
>>>sysctl(8) mentions it; but no detail in (3)
>>>my best guess to the numbers is the number of 0.01 seconds elapsed
>>>per cpu, so idle athlon below increments idle time slightly over
>>>100 per second, while the idle 8 core opteron increments at a
>>>little more than 800 per second.
>>>
>>>does that sound right?
>>
>>
>>Yes, but slightly wrong. Your one-cpu system will increment the counter 
>>according to the hz field in kern.clockrate. For your machine, that 
>>would be 100. The "slightly over" you are observing is because your 
>>sleep, followed by a command will not take exactly one second, but 
>>slightly more.
>>And your 8 cpu system also has a hz of 100, but since you have eight 
>>cpus, you'll have to multiply by that, which gives 100.
>
>Um... 800. :)
>
>>So the rate per second is (no_cpu * hz)
>>And to figure out the amount spent in different modes, just take the 
>>delta from the last sample of the specific, divided with the total delta 
>>of all the fields.
>
>...last sample of the specific field, ...
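
The delta calculation described in the quoted text can be sketched
roughly as follows. This is a minimal, untested-on-NetBSD sketch: the
function names cp_fields and cp_delta are hypothetical, and the field
positions ($4, $7, ...) are assumed from the awk call in the db) branch
further down, i.e. sysctl output of the form
"kern.cp_time: user = N, nice = N, sys = N, intr = N, idle = N".

```shell
# Hypothetical helper: pull the five counters out of one line of
# "sysctl kern.cp_time" output (layout assumed, see above).
cp_fields () {
 echo "$1" | sed 's/,//g' | awk '{print $4, $7, $10, $13, $16}'
}

# Print the fraction of time spent in each state between two samples:
# delta of each field divided by the total delta of all fields.
cp_delta () { # usage: cp_delta "$old_sample" "$new_sample"
 echo "$(cp_fields "$1") $(cp_fields "$2")" | awk '{
  split("user nice sys intr idle", name, " ")
  total = 0
  for (i = 1; i <= 5; i++) { d[i] = $(i + 5) - $i; total += d[i] }
  if (total <= 0) exit 1   # no ticks elapsed; avoid dividing by zero
  for (i = 1; i <= 5; i++) printf "%s=%.2f\n", name[i], d[i] / total
 }'
}

# Possible use on a live system:
#  old=$(/sbin/sysctl kern.cp_time); sleep 5
#  new=$(/sbin/sysctl kern.cp_time); cp_delta "$old" "$new"
```

Because each field is divided by the total delta, the result does not
depend on hz or on the number of cpus.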

Thanks all for the pointers.  I'm working on another
issue, but wanted to forward my progress here.

Since rrdtool manages time (i.e. you can give it 1 or
200 data points per minute and the appropriate values
go into the archive at the configured interval) and
it has the DERIVE data source type (i.e. it records
the slope of a data source, not the absolute value),
I wanted to use that to translate the various tick
counts into utilization graphs, versus recording the
old values, comparing them with the new values and
accounting for the elapsed time to create a ratio.

Here's how I generate the rrd

 gen) # hint: sh $0 gen | sh
  echo rrd=$rrd
  step=60       # constant
  width=600     # constant

  echo rrdtool create \$rrd --step 60 DS:load:GAUGE:180:0:U \\
  for RRA_sec in $rri; do
   rows=$(( $width / 3 + 1 ))
   steps=$(( ( $RRA_sec / $step + 1 ) / $rows + 1 ))
   avsteps=$(( 8 * $steps ))
   echo RRA:MAX:0.5:$steps:$rows RRA:MIN:0.5:$steps:$rows RRA:AVERAGE:0.5:$avsteps:$rows \\
  done
  echo
  echo

  for RRAt in user nice sys intr idle; do
   echo rrdt=${rr}-${RRAt}.rrd
   echo rrdtool create \$rrdt --step 60 DS:${RRAt}:DERIVE:180:U:U \\
   for RRA_sec in $rri; do
    rows=$(( $width / 3 + 1 ))
    steps=$(( ( $RRA_sec / $step + 1 ) / $rows + 1 ))
    avsteps=$(( 2 * $steps ))
    #echo RRA:LAST:0.5:$steps:$rows RRA:MAX:0.5:$steps:$rows RRA:MIN:0.5:$steps:$rows RRA:AVERAGE:0.5:$avsteps:$rows \\
    echo RRA:AVERAGE:0.5:$avsteps:$rows \\
   done
   echo
  done
  echo
 ;;
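
To make the RRA arithmetic in the gen) branch easier to check, here is
a small sketch that repeats the same integer math for one archive
interval. The name rra_params and the span= output are hypothetical
additions; the defaults of 60 and 600 are the step and width constants
from the script.

```shell
# Repeat the rows/steps arithmetic from gen) for a single interval.
# The body runs in a subshell so it does not clobber the caller's
# step and width variables.
rra_params () ( # usage: rra_params RRA_sec [step] [width]
 RRA_sec=$1 step=${2:-60} width=${3:-600}
 rows=$(( $width / 3 + 1 ))
 steps=$(( ( $RRA_sec / $step + 1 ) / $rows + 1 ))
 # span: seconds covered by an RRA with this many rows and steps
 echo "rows=$rows steps=$steps span=$(( $rows * $steps * $step ))s"
)
```

For the 6 hour interval, "rra_params 21600" prints
"rows=201 steps=2 span=24120s", so the MIN and MAX archives cover
slightly more than the requested 21600 seconds. The AVERAGE archives
use a multiple of steps (8x for load, 2x for the cpu counters) and so
cover proportionally more.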


Here is where I'm experimenting. I'd like to see an
output range of 0 to 1 so I can overlay it with the load
data, a logarithmic graph that already works well. If
the utilization ever reaches 100% the load will be
over 1, so the data sources won't overlap.

write_load () {
 rrdtool update ${rr}-user.rrd $1:$(echo "2 k $2 100 / p" | dc)
 rrdtool update ${rr}-nice.rrd $1:$3
 rrdtool update ${rr}-sys.rrd $1:$4
 rrdtool update ${rr}-intr.rrd $1:$5
 rrdtool update ${rr}-idle.rrd $1:$6 ;}

load () { # CPU Load
 uptime | sed -e 's/.*averages://' -e 's/,//' | awk '{print $1}' ;}

Here's how I update the data store

 db)
  N=$(date +%s)
  rrdtool update $rrd $N:$(load)
  write_load $(echo $N ; /sbin/sysctl kern.cp_time | awk '{print $4,$7,$10,$13,$16}' | sed 's/,//g')
 ;;

Seems like it should work, but maybe not. E.g. does
DERIVE record the slope (active tick counters) or the
change in slope (useless change in utilization)?

Not sure, but I've got other technical problems atm.
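
For reference, rrdtool documents DERIVE as storing the derivative of
the line from the last value to the current one, i.e. a per-second
rate of the counter, so it is the slope rather than the change in
slope. A rough sketch of that rate, and of scaling it into the 0 to 1
range, follows. It ignores rrdtool's resampling onto step boundaries;
the function names are hypothetical, and ncpu and hz are assumed
inputs (hz being the hz field of kern.clockrate).

```shell
# Rate a DERIVE data source would see between two samples,
# in ticks per second: (v_new - v_old) / (t_new - t_old).
derive_rate () { # usage: derive_rate t_old v_old t_new v_new
 awk -v t0="$1" -v v0="$2" -v t1="$3" -v v1="$4" \
  'BEGIN { if (t1 <= t0) exit 1; printf "%.2f\n", (v1 - v0) / (t1 - t0) }'
}

# Scale a tick rate to a 0..1 fraction of total capacity,
# which is ncpu * hz ticks per second.
tick_fraction () { # usage: tick_fraction rate ncpu hz
 awk -v r="$1" -v n="$2" -v hz="$3" 'BEGIN { printf "%.2f\n", r / (n * hz) }'
}
```

With hz=100 on one cpu, an idle counter that grows by 6000 ticks in 60
seconds gives a rate of 100.00 and a fraction of 1.00. The same
division could probably be done at graph time with a CDEF, something
like CDEF:userf=user,100,/ (untested here), instead of dividing the
raw counter before the update.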

BTW, this is how I generate the graphs.

graph)
  hostname=$(/bin/hostname)
  load=$(load)
  util=zzz # $(util)
  #util=$(util2)
  echo '<META HTTP-EQUIV="Refresh" CONTENT="10">' >${rr}.html
  for n in 3600 7200 $rri ; do
   rrdtool graph load-${n}.png \
        --logarithmic \
        --slope-mode \
        --lower-limit 0.09 --rigid \
        --watermark "seconds=$n hours=$(($n / 3600 )) days=$(($n / 86400)) weeks=$(($n / 604800)) months=$(($n / 2592000))" \
        --vertical-label 'Red line @ 5' \
        --title "$hostname Load ($load) and Utilization ($util)" \
        --end now --start end-${n}s --width 600 \
        \
        DEF:ave=$rrd:load:AVERAGE \
        DEF:min=$rrd:load:MIN \
        DEF:max=$rrd:load:MAX \
        \
        DEF:user=${rr}-user.rrd:user:AVERAGE \
        DEF:nice=${rr}-nice.rrd:nice:AVERAGE \
        DEF:sys=${rr}-sys.rrd:sys:AVERAGE \
        DEF:intr=${rr}-intr.rrd:intr:AVERAGE \
        DEF:idle=${rr}-idle.rrd:idle:AVERAGE \
        \
        AREA:min\#f2f2ff:"min" \
        LINE:max\#cc2222:"max" \
        LINE2:ave\#cccc22:"ave" \
        \
        AREA:user\#0f4499:"user" \
        AREA:nice\#666666:"nice":STACK \
        AREA:sys\#8877ee:"sys":STACK \
        AREA:intr\#993399:"intr":STACK \
        \
        HRULE:1\#440000 \
        HRULE:5\#ff00ff \
        >/dev/null
        #AREA:loadi\#ff3311:"cpu" \
   echo "<br><img src=load-$n.png>" >> ${rr}.html
  done
 ;;

and these are the intervals I'm using for the archives...

#    6 hrs, 18 hrs, 36 hrs, 3 days, 7 days, 6 weeks, 8 months, 18 months, 60 months
rri="21600  64800   129600  259200  604800  3628800  20995200  47304000   157680000"



ciao,
// George


-- 
George Georgalis, systems architect, administrator <IXOYE><