Subject: More general support for hardware performance monitoring
To: None <tech-kern@netbsd.org>
From: Allen Briggs <briggs@wasabisystems.com>
List: tech-kern
Date: 07/24/2002 13:49:19
--nHwqXXcoX0o6fKCv
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

I've been working on a way to have somewhat more general support
for hardware performance monitoring counters (PMCs), and have
prototyped my work on the Intel XScale platform.

If you're not familiar with PMCs, you can find some more information
in a number of places, but a reasonable introduction is available in
Intel's application note at
	http://www.intel.com/design/IIO/applnots/273661.htm

Different CPUs have different PMC capabilities.  Some can interrupt
the CPU on a counter overflow, and some can't.  Some can count
regardless of privilege mode, and some can count supervisor-mode and
user-mode events separately.  One of my changes is to allow the PMC overflow
interrupt on the XScale to be a profiling source (instead of the
clock).

I've attached the man page for the kernel interface and the two
system calls.  Diffs of my current changes are available from
	http://www.ninthwonder.com/~briggs/pmc.diffs

HTML versions of the man pages are also available at:
	http://www.ninthwonder.com/~briggs/pmc.9.html
and	http://www.ninthwonder.com/~briggs/pmc_control.2.html

I haven't yet updated the ia32/athlon code (any volunteers with
better ia32 knowledge?).  I'd also like to add support for some
of the PowerPC CPUs with PMC capabilities (everything from Motorola
newer than the 604e or so) when the API is nailed down.

So...  Comments?  Suggestions?  Review?

-allen

-- 
 Allen Briggs                     briggs@wasabisystems.com
 http://www.wasabisystems.com/    Quality NetBSD CDs, Sales, Support, Service
NetBSD development for Alpha, ARM, M68K, MIPS, PowerPC, SuperH, XScale, etc...

--nHwqXXcoX0o6fKCv
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename="pmc.9"

.\" $NetBSD$
.\"
.\" Copyright (c) 2002 Wasabi Systems, Inc.
.\" All rights reserved.
.\"
.\" Written by Allen Briggs for Wasabi Systems, Inc.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\" 3. All advertising materials mentioning features or use of this software
.\"    must display the following acknowledgement:
.\"      This product includes software developed for the NetBSD Project by
.\"      Wasabi Systems, Inc.
.\" 4. The name of Wasabi Systems, Inc. may not be used to endorse
.\"    or promote products derived from this software without specific prior
.\"    written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY WASABI SYSTEMS, INC. ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
.\" TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
.\" PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL WASABI SYSTEMS, INC
.\" BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
.\" INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
.\" CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
.\" ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
.\" POSSIBILITY OF SUCH DAMAGE.
.\"
.Dd July 22, 2002
.Dt PMC 9
.Os
.Sh NAME
.Nm pmc ,
.Nm pmc_get_num_counters ,
.Nm pmc_get_counter_type ,
.Nm pmc_save_context ,
.Nm pmc_restore_context ,
.Nm pmc_enable_counter ,
.Nm pmc_disable_counter ,
.Nm pmc_counter_isrunning ,
.Nm pmc_counter_isconfigured ,
.Nm pmc_configure_counter ,
.Nm pmc_get_counter_value ,
.Nm pmc_accumulate ,
.Nm pmc_alloc_kernel_counter ,
.Nm pmc_free_kernel_counter ,
.Nm pmc_start_profiling ,
.Nm pmc_stop_profiling ,
.Nm PMC_ENABLED
.Nd Hardware Performance Monitoring Interface
.Sh SYNOPSIS
.Fd #include \*[Lt]sys/pmc.h\*[Gt]
.Ft int
.Fn pmc_get_num_counters "void"
.Ft int
.Fn pmc_get_counter_type "int ctr"
.Ft void
.Fn pmc_save_context "struct proc *p"
.Ft void
.Fn pmc_restore_context "struct proc *p"
.Ft int
.Fn pmc_enable_counter "struct proc *p" "int ctr"
.Ft int
.Fn pmc_disable_counter "struct proc *p" "int ctr"
.Ft int
.Fn pmc_counter_isrunning "struct proc *p" "int ctr"
.Ft int
.Fn pmc_counter_isconfigured "struct proc *p" "int ctr"
.Ft int
.Fn pmc_configure_counter "struct proc *p" "int ctr" \
"struct pmc_counter_cfg *cfg"
.Ft int
.Fn pmc_get_counter_value "struct proc *p" "int ctr" "int flags" \
"uint64_t *pval"
.Ft int
.Fn pmc_accumulate "struct proc *p_parent" "struct proc *p_exiting"
.Ft int
.Fn pmc_alloc_kernel_counter "int ctr" "struct pmc_counter_cfg *cfg"
.Ft int
.Fn pmc_free_kernel_counter "int ctr"
.Ft int
.Fn pmc_start_profiling "int ctr" "struct pmc_counter_cfg *cfg"
.Ft int
.Fn pmc_stop_profiling "int ctr"
.Ft int
.Fn PMC_ENABLED "struct proc *p"
.Sh DESCRIPTION
The
.Nm
interface provides machine-independent access to the hardware performance
counters that are available on several CPU families.  The capabilities of these
counters vary from CPU to CPU, but they basically count hardware events
such as data cache hits or misses, branches taken, branches mispredicted,
and so forth.  Some can interrupt the processor when a certain threshold
has been reached.  Some can count events in user space and kernel space
independently.
.Pp
The
.Nm
interface is intended to allow monitoring from within the kernel as well
as monitoring of userland applications.  If the hardware can interrupt the
CPU in a specific implementation, then it may also be used as a profiling
source instead of the clock.
.Sh NOTES
All function calls in this interface may be defined as
.Xr cpp 1
macros.
If any function is not implemented as a macro, its prototype must be
defined by the port-specific header
.Pa Aq machine/pmc.h .
.Pp
Counters are numbered from 0 to N-1, where N is the number of counters
available on the system.
.Pp
Upon a process fork, implementations must
.Bl -bullet
.It
Zero performance counters for the new process, and
.It
Inherit any enabled performance counters.
.El
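.Pp
The fork rule above can be sketched as follows.
This is only an illustration under stated assumptions:
.Fa struct pmc_state ,
.Fa PMC_NCOUNTERS ,
and
.Fn pmc_fork_sketch
are hypothetical names that are not part of this interface, and treating
the configured event as inherited together with the enabled state is one
reading of the rule.
.Bd -literal
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PMC_NCOUNTERS 2	/* hypothetical: counters on this CPU */

/* Hypothetical per-process PMC state; not part of the interface. */
struct pmc_state {
	uint64_t value[PMC_NCOUNTERS];		/* counts so far */
	uint32_t event_id[PMC_NCOUNTERS];	/* configured event */
	uint8_t  enabled[PMC_NCOUNTERS];	/* non-zero if enabled */
};

/* Apply the fork rule: inherit enabled counters, zero the counts. */
static void
pmc_fork_sketch(const struct pmc_state *parent, struct pmc_state *child)
{
	int ctr;

	assert(parent != NULL && child != NULL);
	/* Start from a copy so event and enabled state are inherited. */
	memcpy(child, parent, sizeof(*child));
	for (ctr = 0; ctr < PMC_NCOUNTERS; ctr++)
		child->value[ctr] = 0;	/* the new process starts at zero */
}
```
.Ed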
.Sh DATA TYPES
Each implementation must specify two new types:
.Bl -tag -width pmc_evid_t
.It Fa pmc_evid_t
An integer type which can contain the event IDs for a given processor.
.It Fa pmc_ctr_t
An integer type defining the value which may be contained in a given
counter register.
.El
.Pp
Counters are configured with the
.Fa struct pmc_counter_cfg .
This structure is defined as
.Bd -literal
struct pmc_counter_cfg {
	pmc_evid_t	event_id;
	pmc_ctr_t	reset_value;
	uint32_t	flags;
};
.Ed
.sp
.Fa flags
are currently unused.
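.Pp
As an illustration of the two port-specific types, a port with 32-bit
counter registers might declare them as sketched below.
The widths,
.Fa SKETCH_EV_DCACHE_MISS ,
and
.Fn pmc_cfg_sketch
are assumptions made for this example only; a real port chooses its own
types and event IDs.
.Bd -literal
```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical port choices; each port picks its own widths. */
typedef uint32_t pmc_evid_t;	/* wide enough for this CPU's event IDs */
typedef uint32_t pmc_ctr_t;	/* matches a 32-bit counter register */

struct pmc_counter_cfg {
	pmc_evid_t	event_id;
	pmc_ctr_t	reset_value;
	uint32_t	flags;
};

#define SKETCH_EV_DCACHE_MISS	0x0b	/* hypothetical event ID */

/* Build a configuration; flags stays zero because it is unused. */
static struct pmc_counter_cfg
pmc_cfg_sketch(pmc_evid_t event, pmc_ctr_t reset)
{
	struct pmc_counter_cfg cfg;

	/* Counter values are reported as uint64_t, so they must fit. */
	assert(sizeof(pmc_ctr_t) <= sizeof(uint64_t));
	cfg.event_id = event;
	cfg.reset_value = reset;
	cfg.flags = 0;
	return cfg;
}
```
.Ed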
.Sh FUNCTIONS
.Bl -tag -width width -compact
.It Fn pmc_get_num_counters "void"
Returns the number of counters present on the current system.  Valid values for
.Fa ctr
in the interface entry points below are from zero to one less than the
return value from this function.
.sp
.It Fn pmc_get_counter_type "int ctr"
Returns an implementation-dependent type describing the specified counter.
.sp
.It Fn pmc_save_context "struct proc *p"
Saves the PMC context for the current process.  This is called just before
.Xr cpu_switch 9 .
If there is kernel PMC state, it must be maintained across this call.
.sp
.It Fn pmc_restore_context "struct proc *p"
Restores the PMC context for the current process.  This is called just
after
.Xr cpu_switch 9
returns.  If there is kernel PMC state, it must be maintained across
this call.
.sp
.It Fn pmc_enable_counter "struct proc *p" "int ctr"
Enables counter
.Fa ctr
for the specified process.  The counter should have already been configured
with a call to
.Fn pmc_configure_counter .
This starts the counter running if it is not already started and enables
any interrupts, as appropriate.
.sp
.It Fn pmc_disable_counter "struct proc *p" "int ctr"
Disables counter
.Fa ctr
for the specified process.  This stops the counter from running, and
disables any interrupts, as appropriate.
.sp
.It Fn pmc_counter_isrunning "struct proc *p" "int ctr"
Returns non-zero if the specified counter in the specified process is
running or if the counter is running in the kernel.
.sp
.It Fn pmc_counter_isconfigured "struct proc *p" "int ctr"
Returns non-zero if the specified counter in the specified process is
configured or if the counter is in use by the kernel.
.sp
.It Fn pmc_configure_counter "struct proc *p" "int ctr" \
"struct pmc_counter_cfg *cfg"
Configures counter
.Fa ctr
according to the configuration information stored in
.Fa cfg .
.sp
.It Fn pmc_get_counter_value "struct proc *p" "int ctr" "int flags" \
"uint64_t *pval"
Returns the value of counter
.Fa ctr
in the space pointed to by
.Fa pval .
The only recognized flag is
.Fa PMC_VALUE_FLAGS_CHILDREN ,
which specifies that the returned counts should be the accumulated values
of any exited child processes.
.sp
.It Fn pmc_accumulate "struct proc *p_parent" "struct proc *p_exiting"
Accumulates any counter data from the exiting process
.Fa p_exiting
into the counters for the parent process
.Fa p_parent .
.sp
.It Fn pmc_alloc_kernel_counter "int ctr" "struct pmc_counter_cfg *cfg"
Allocates counter
.Fa ctr
for use by the kernel and configures it with
.Fa cfg .
.sp
.It Fn pmc_free_kernel_counter "int ctr"
Returns counter
.Fa ctr
to the available pool of counters that may be used by processes.
.sp
.It Fn pmc_start_profiling "int ctr" "struct pmc_counter_cfg *cfg"
Allocates counter
.Fa ctr
for use by the kernel for profiling and configures it with
.Fa cfg .
.sp
.It Fn pmc_stop_profiling "int ctr"
Stops profiling with counter
.Fa ctr .
.sp
.It Fn PMC_ENABLED "struct proc *p"
Returns non-zero if the given process or the kernel is using the PMC at all.
.El
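.Pp
The relationship between
.Fn pmc_accumulate
and the
.Fa PMC_VALUE_FLAGS_CHILDREN
flag of
.Fn pmc_get_counter_value
can be modelled roughly as below.
This is a sketch, not the implementation:
.Fa struct sketch_proc
and the
.Fn sketch_*
functions are hypothetical stand-ins, the flag value is a placeholder,
returning -1 for a bad counter is a simplification, and passing on counts
that the exiting child had itself collected from its children is an
assumption.
.Bd -literal
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_NCOUNTERS		2	/* hypothetical counter count */
#define PMC_VALUE_FLAGS_CHILDREN	0x1	/* placeholder value */

/* Hypothetical stand-in for the PMC part of struct proc. */
struct sketch_proc {
	uint64_t own[SKETCH_NCOUNTERS];		/* this process */
	uint64_t children[SKETCH_NCOUNTERS];	/* exited children */
};

/* Models pmc_accumulate(): fold an exiting child into its parent. */
static int
sketch_accumulate(struct sketch_proc *parent,
    const struct sketch_proc *exiting)
{
	int ctr;

	assert(parent != NULL && exiting != NULL);
	for (ctr = 0; ctr < SKETCH_NCOUNTERS; ctr++) {
		/* Assumption: also pass on what the child had collected. */
		parent->children[ctr] +=
		    exiting->own[ctr] + exiting->children[ctr];
	}
	return 0;
}

/* Models pmc_get_counter_value(): pick own or accumulated counts. */
static int
sketch_get_counter_value(const struct sketch_proc *p, int ctr, int flags,
    uint64_t *pval)
{
	if (ctr < 0 || ctr >= SKETCH_NCOUNTERS)
		return -1;	/* simplified error reporting */
	*pval = (flags & PMC_VALUE_FLAGS_CHILDREN) ?
	    p->children[ctr] : p->own[ctr];
	return 0;
}
```
.Ed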
.Sh SEE ALSO
.Xr pmc 1 ,
.Xr pmc_control 2 ,
.Xr pmc_get_info 2
.Sh HISTORY
The
.Nm
interface appeared in
.Nx 2.0 .
.Sh AUTHORS
The
.Nm
interface was designed and implemented by Allen Briggs for Wasabi Systems, Inc.
Additional input on the
.Nm
design was provided by Jason R. Thorpe.

--nHwqXXcoX0o6fKCv
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename="pmc_control.2"

.\" $NetBSD$
.\"
.\" Copyright (c) 2002 Wasabi Systems, Inc.
.\" All rights reserved.
.\"
.\" Written by Allen Briggs for Wasabi Systems, Inc.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\" 3. All advertising materials mentioning features or use of this software
.\"    must display the following acknowledgement:
.\"      This product includes software developed for the NetBSD Project by
.\"      Wasabi Systems, Inc.
.\" 4. The name of Wasabi Systems, Inc. may not be used to endorse
.\"    or promote products derived from this software without specific prior
.\"    written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY WASABI SYSTEMS, INC. ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
.\" TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
.\" PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL WASABI SYSTEMS, INC
.\" BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
.\" INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
.\" CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
.\" ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
.\" POSSIBILITY OF SUCH DAMAGE.
.\"
.Dd July 23, 2002
.Dt PMC_CONTROL 2
.Os
.Sh NAME
.Nm pmc_control ,
.Nm pmc_get_info
.Nd Hardware Performance Monitoring Interface
.Sh LIBRARY
.Lb libc
.Sh SYNOPSIS
.Fd #include \*[Lt]sys/pmc.h\*[Gt]
.Ft int
.Fn pmc_control "int ctr" "int op" "void *argp"
.Ft int
.Fn pmc_get_info "int ctr" "int op" "void *argp"
.Sh DESCRIPTION
.Fn pmc_get_info
returns the number of counters in the system or information on a specified
counter
.Fa ctr .
The possible values for
.Fa op
are:
.Bl -tag -width width
.It PMC_INFO_NCOUNTERS
When querying the number of counters in the system,
.Fa ctr
is ignored and
.Fa argp
is of type
.Em int * .
Upon return, the integer pointed to by
.Fa argp
will contain the number of counters that are available in the system.
.It PMC_INFO_COUNTER_VALUE
When querying the value of a counter in the system,
.Fa ctr
refers to the counter being queried, and
.Fa argp
is of type
.Em uint64_t * .
Upon return, the 64-bit integer pointed to by
.Fa argp
will contain the value of the specified counter.
.It PMC_INFO_ACCUMULATED_COUNTER_VALUE
When querying the accumulated value of a counter in the system,
.Fa ctr
refers to the counter being queried, and
.Fa argp
is of type
.Em uint64_t * .
Upon return, the 64-bit integer pointed to by
.Fa argp
will contain the sum of the accumulated values of the specified counter in
all exited subprocesses of the current process.
.El
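.Pp
The three queries above might be combined as in the following sketch,
which totals a counter over the current process and its exited
subprocesses.
Because the system call is not generally available,
.Fn sketch_pmc_get_info
is a hypothetical userland stand-in with the same signature that serves
fixed fake counts; the numeric values of the
.Em PMC_INFO_*
constants and the use of
.Er EINVAL
for an unknown
.Fa op
are placeholders chosen for this example.
.Bd -literal
```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder values; the real ones come from sys/pmc.h. */
#define PMC_INFO_NCOUNTERS			1
#define PMC_INFO_COUNTER_VALUE			2
#define PMC_INFO_ACCUMULATED_COUNTER_VALUE	3

static const uint64_t sketch_own[2] = { 1234, 56 };	/* fake counts */
static const uint64_t sketch_kids[2] = { 100, 0 };	/* fake child sums */

/* Userland stand-in with the same signature as pmc_get_info(). */
static int
sketch_pmc_get_info(int ctr, int op, void *argp)
{
	if (argp == NULL) {
		errno = EFAULT;
		return -1;
	}
	if (op == PMC_INFO_NCOUNTERS) {
		*(int *)argp = 2;	/* ctr is ignored for this op */
		return 0;
	}
	if (ctr < 0 || ctr >= 2) {
		errno = EINVAL;
		return -1;
	}
	if (op == PMC_INFO_COUNTER_VALUE)
		*(uint64_t *)argp = sketch_own[ctr];
	else if (op == PMC_INFO_ACCUMULATED_COUNTER_VALUE)
		*(uint64_t *)argp = sketch_kids[ctr];
	else {
		errno = EINVAL;		/* unknown op: assumed behavior */
		return -1;
	}
	return 0;
}

/* Total for one counter: this process plus exited subprocesses. */
static int
sketch_total(int ctr, uint64_t *total)
{
	uint64_t own = 0, kids = 0;

	assert(total != NULL);
	if (sketch_pmc_get_info(ctr, PMC_INFO_COUNTER_VALUE, &own) == -1 ||
	    sketch_pmc_get_info(ctr, PMC_INFO_ACCUMULATED_COUNTER_VALUE,
	    &kids) == -1)
		return -1;
	*total = own + kids;
	return 0;
}
```
.Ed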
.Pp
.Fn pmc_control
manipulates the specified counter
.Fa ctr
in one of several fashions.  The
.Fa op
parameter determines the action taken by the kernel and also the interpretation of the
.Fa argp
parameter.  The possible values for
.Fa op
are:
.Bl -tag -width width
.It PMC_OP_START
Starts the specified
.Fa ctr
running.  It must be preceded by a call with
.Em PMC_OP_CONFIGURE .
.Fa argp
is ignored in this case and may be NULL.
.It PMC_OP_STOP
Stops the specified
.Fa ctr
from running.
.Fa argp
is ignored in this case and may be NULL.
.It PMC_OP_CONFIGURE
Configures the specified
.Fa ctr
prior to running.
.Fa argp
is a pointer to a
.Em struct pmc_counter_cfg .
.Bd -literal
	struct pmc_counter_cfg {
		pmc_evid_t	event_id;
		pmc_ctr_t	reset_value;
		uint32_t	flags;
	};
.Ed
.Bl -tag -width width
.It Dv event_id
is the event ID to be counted.
.It Dv reset_value
is a value to which the counter should be reset on overflow (if supported
by the implementation).  This is most useful when profiling (see
.Em PMC_OP_PROFSTART ,
below).  This value is defined to be the number of counter ticks before
the next overflow.  So, to get a profiling tick on every hundredth data
cache miss, set the
.Dv event_id
to the proper value for
.Dq dcache-miss
and set
.Dv reset_value
to 100.
.It Dv flags
Currently unused.
.El
.It PMC_OP_PROFSTART
Configures the specified
.Fa ctr
for use in profiling.
.Fa argp
is a pointer to a
.Em struct pmc_counter_cfg
as in
.Em PMC_OP_CONFIGURE ,
above.  This request allocates a kernel counter, which will fail if any
process is using the requested counter.
Not all implementations or counters may support this option.
.It PMC_OP_PROFSTOP
Stops the specified
.Fa ctr
from being used for profiling.
.Fa argp
is ignored in this case and may be NULL.
.El
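.Pp
Since
.Dv reset_value
is expressed as the number of ticks before the next overflow, an
implementation has to translate it into whatever the hardware register
expects.
The sketch below shows that arithmetic under the assumption of a 32-bit
counter that counts upward and signals overflow when it wraps to zero;
this document does not specify that behavior, and both function names are
hypothetical.
.Bd -literal
```c
#include <assert.h>
#include <stdint.h>

/* Raw value to load so that the counter wraps after `ticks` events. */
static uint32_t
sketch_raw_from_ticks(uint32_t ticks)
{
	assert(ticks != 0);	/* zero ticks is meaningless here */
	return (uint32_t)(0u - ticks);	/* 2^32 - ticks, modulo 2^32 */
}

/* Count simulated events until the 32-bit register wraps to zero. */
static uint32_t
sketch_events_until_overflow(uint32_t raw)
{
	uint32_t events = 0;

	do {
		raw++;		/* one hardware event */
		events++;
	} while (raw != 0);	/* wrapped: overflow would be signalled */
	return events;
}
```
.Ed
.Pp
With a
.Dv reset_value
of 100, the register would be loaded with 0xFFFFFF9C in this model and
the overflow would occur on the hundredth event.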
.Sh RETURN VALUES
A return value of 0 indicates that the call succeeded.  Otherwise, -1 is
returned and the global variable
.Va errno
is set to indicate the error.
.Sh ERRORS
Among the possible error codes from
.Fn pmc_control
and
.Fn pmc_get_info
are
.Bl -tag -width Er
.It Bq Er EFAULT
The address specified for the
.Fa argp
is invalid.
.It Bq Er ENXIO
The specified counter is not yet configured.
.It Bq Er EINPROGRESS
.Dv PMC_OP_START
was passed for a counter that is already running.
.It Bq Er EINVAL
The specified counter is invalid.
.It Bq Er EBUSY
The requested counter is already in use, either by the current process
or by the kernel.
.It Bq Er ENODEV
The specified event is not valid for the specified counter.  This error
is returned only when configuring a counter or starting profiling.
.It Bq Er ENOMEM
The kernel is unable to allocate memory.
.El
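.Pp
The ordering requirement between
.Em PMC_OP_CONFIGURE
and
.Em PMC_OP_START ,
and some of the errors listed above, can be modelled as a small state
machine.
The code below is a sketch only:
.Fn sketch_pmc_control
is a hypothetical userland stand-in with the same signature as
.Fn pmc_control ,
the numeric values of the
.Em PMC_OP_*
constants are placeholders, and returning
.Er EINVAL
for an unknown
.Fa op
is an assumption.
.Bd -literal
```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Placeholder values; the real ones come from sys/pmc.h. */
#define PMC_OP_START		1
#define PMC_OP_STOP		2
#define PMC_OP_CONFIGURE	3

#define SKETCH_NCOUNTERS	2	/* hypothetical counter count */

static int sketch_configured[SKETCH_NCOUNTERS];
static int sketch_running[SKETCH_NCOUNTERS];

/* Userland stand-in with the same signature as pmc_control(). */
static int
sketch_pmc_control(int ctr, int op, void *argp)
{
	if (ctr < 0 || ctr >= SKETCH_NCOUNTERS) {
		errno = EINVAL;		/* invalid counter */
		return -1;
	}
	/* Invariant of this model: a running counter is configured. */
	assert(!sketch_running[ctr] || sketch_configured[ctr]);
	switch (op) {
	case PMC_OP_CONFIGURE:
		if (argp == NULL) {
			errno = EFAULT;	/* needs a struct pmc_counter_cfg */
			return -1;
		}
		sketch_configured[ctr] = 1;
		return 0;
	case PMC_OP_START:
		if (!sketch_configured[ctr]) {
			errno = ENXIO;	/* not yet configured */
			return -1;
		}
		if (sketch_running[ctr]) {
			errno = EINPROGRESS;	/* already running */
			return -1;
		}
		sketch_running[ctr] = 1;
		return 0;
	case PMC_OP_STOP:
		sketch_running[ctr] = 0;	/* argp is ignored */
		return 0;
	default:
		errno = EINVAL;		/* unknown op: assumed behavior */
		return -1;
	}
}
```
.Ed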
.Sh SEE ALSO
.Xr pmc 1 ,
.Xr pmc 9
.Sh HISTORY
The
.Fn pmc_control
and
.Fn pmc_get_info
system calls appeared in
.Nx 2.0 .

--nHwqXXcoX0o6fKCv--