Subject: Re: Accelerating memset/memcpy
To: None <nigel@mips.com>
From: None <cgd@broadcom.com>
List: port-mips
Date: 10/01/2002 10:01:26
At Tue, 1 Oct 2002 15:51:11 +0000 (UTC), "Nigel Stephens" wrote:
> In MIPS32 and MIPS64 compliant processors the "pref" instruction with 
> code 30 is defined as "prepare for store" with the following description:
> 
>     PrepareForStore
> 
>     Use: Prepare the cache for writing an entire line, without the
>     overhead involved in filling the line from memory.
> 
>     Action: If the reference hits in the cache, no action is taken. If
>     the reference misses in the cache, a line is selected for
>     replacement, any valid and dirty victim is written back to memory,
>     the entire line is filled with zero data, and the state of the line
>     is marked as valid and dirty.
> 
> The other advantage of the pref instruction is that it can be included 
> in user code, whereas the cache instruction is only available to the kernel.

Yup.

Note that the MIPS32 and MIPS64 specs also include the following
statements in their description of the 'pref' opcode, which are
inconsistent with PrepareForStore's description:

* "The action taken for a specific PREF instruction is both system and
  context dependent.  Any action, including doing nothing, is
  permitted as long as it does not change architecturally visible
  state or alter the meaning of a program."

* "A hint value cannot cause an action to modify architecturally
  visible state."

Zeroing a line of memory is most definitely a modification of
architecturally visible state.  8-)  (I mention this because, well,
hey, you're a channel that might be used to get documentation fixes
back in.  Those quotes are from MIPS64 Volume II, rev 0.95, page 243.)
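
To make the inconsistency concrete, here is a minimal sketch in C of
a toy one-line cache that follows the quoted PrepareForStore action.
Everything in it (the toy_* names, the 32-byte line size, the 0x5a
test value) is a made-up assumption for illustration and does not
describe any real processor.  It only shows that a program can load
a different value depending on whether the hint was executed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LINE_SIZE 32  /* assumed line size, for illustration only */

/* Hypothetical model: one cache line in front of one line of memory. */
struct toy_cache {
    uint8_t memory[LINE_SIZE];  /* backing store */
    uint8_t line[LINE_SIZE];    /* cached copy of that line */
    int valid;
    int dirty;
};

/* Ordinary load: on a miss, fill the line from memory first. */
static uint8_t toy_load(struct toy_cache *c, unsigned off)
{
    assert(off < LINE_SIZE);
    if (!c->valid) {
        memcpy(c->line, c->memory, LINE_SIZE);
        c->valid = 1;
        c->dirty = 0;
    }
    return c->line[off];
}

/* PrepareForStore as quoted: a hit does nothing; a miss zero-fills
 * the line (no fill from memory) and marks it valid and dirty.
 * With a single, initially invalid line there is no victim to
 * write back, so that step is omitted here. */
static void toy_prepare_for_store(struct toy_cache *c)
{
    if (c->valid)
        return;
    memset(c->line, 0, LINE_SIZE);
    c->valid = 1;
    c->dirty = 1;
}

/* Returns 1 if a load observes a different value depending on
 * whether the hint was executed beforehand. */
static int toy_hint_is_visible(void)
{
    struct toy_cache without_hint, with_hint;

    memset(&without_hint, 0, sizeof without_hint);
    memset(&with_hint, 0, sizeof with_hint);
    without_hint.memory[0] = 0x5a;
    with_hint.memory[0] = 0x5a;

    toy_prepare_for_store(&with_hint);  /* misses, so the line is zeroed */

    return toy_load(&without_hint, 0) != toy_load(&with_hint, 0);
}
```

In this model the load without the hint sees 0x5a and the load after
the hint sees 0, which is exactly the kind of effect the two quoted
sentences say a hint must not have.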


Anyway, despite the pseudo-standardization of the 'hint' fields
("pseudo" because "any action, including doing nothing, is permitted"),
because of:

* historical differences from the standardized hints,

* differences in even MIPS32/MIPS64 processors about which are
  implemented and how, and, of course,

* microarchitectural differences,

it really doesn't make sense to try to apply a blanket
'mips32/mips64-optimized' memcpy (et al.) to the kernel.  The
implementations really should be selected on a per-CPU basis.
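
One way per-CPU selection could look, as a rough sketch in C: a
table of candidate routines indexed by a CPU identifier, with one
chosen once at startup.  The enum values, the function names, and
the "tuned" routine (which here just forwards to the C library's
memcpy) are all hypothetical placeholders, not code from any
existing kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef void *(*memcpy_fn)(void *, const void *, size_t);

/* Hypothetical CPU identifiers; a real kernel would derive these
 * from its own CPU identification code. */
enum cpu_kind { CPU_GENERIC, CPU_WITH_USABLE_PREF30, CPU_KIND_COUNT };

/* Conservative byte-by-byte copy that is safe on any CPU. */
static void *memcpy_generic(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (n--)
        *d++ = *s++;
    return dst;
}

/* Stand-in for a routine tuned to one particular CPU (for example,
 * one known to implement a usable prepare-for-store hint).  In this
 * sketch it simply forwards to the C library. */
static void *memcpy_tuned(void *dst, const void *src, size_t n)
{
    return memcpy(dst, src, n);
}

static const memcpy_fn memcpy_table[CPU_KIND_COUNT] = {
    [CPU_GENERIC]            = memcpy_generic,
    [CPU_WITH_USABLE_PREF30] = memcpy_tuned,
};

static memcpy_fn active_memcpy = memcpy_generic;

/* Pick the routine once; unknown CPUs fall back to the generic copy. */
static void select_memcpy(enum cpu_kind kind)
{
    if ((unsigned)kind < CPU_KIND_COUNT && memcpy_table[kind] != NULL)
        active_memcpy = memcpy_table[kind];
    else
        active_memcpy = memcpy_generic;
}

/* Callers go through this wrapper rather than a fixed routine. */
static void *kernel_memcpy(void *dst, const void *src, size_t n)
{
    assert(active_memcpy != NULL);
    return active_memcpy(dst, src, n);
}
```

The point of the fallback is that a CPU which is not positively
known to behave well with a given hint never gets the routine that
depends on it.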



cgd
-- 
Chris Demetriou                                            Broadcom Corporation
Principal Design Engineer                     Broadband Processor Business Unit
  Any opinions expressed in this message are mine, not necessarily Broadcom's.