Subject: Re: Accelerating memset/memcpy
To: Simon Burge <simonb@wasabisystems.com>
From: Nicolas BOUQUET <bouquet@ipricot.com>
List: port-mips
Date: 10/01/2002 15:25:22
Simon Burge wrote:

>On Tue, Oct 01, 2002 at 10:01:39AM +0000, Nicolas BOUQUET wrote:
>  
>
>>So I took my books and found the reason quickly: in my case, memory
>>writes to a particular cacheline are preceded by a cache refill if the
>>line was previously unused. But these cache refills are not needed
>>since I write entire cachelines (I transfer large blocks of data and
>>measure the time it takes).
>>
>>RM5231's datasheet states that this behaviour can be corrected by
>>issuing a "create dirty exclusive" cache operation on the lines
>>concerned. Doing so effectively increased write throughput: I can write
>>to memory at 125MBytes/s instead of 50MBytes/s.
>>
>>So here comes my question/thought: could these modifications be
>>applied to the NetBSD kernel, for example through the memset/memcpy
>>routines?
>>    
>>
>
>Indeed, all (or just most?) MIPS32 and MIPS64 CPUs should be able to take
>advantage of this too with their PREF instructions, and there are probably
>a number of `older' MIPS IV-style CPUs that have a similar operation
>available.
>
>Ideally we should be able to choose the most optimal mem{cpy,set} and
>pmap_{copy,zero}_page functions at run-time, but compile-time options
>might be a start.
>
>How much of a change to the standard routines did you need for the
>rm5231?
>
For mem{cpy,set}, I attempt to align the destination to a cacheline 
boundary (32 bytes) using the standard code, then I fill as many entire 
cachelines as I can without a cache refill, and finally I leave the 
remainder to the standard code. There is just one additional check to 
see whether we are in cached space.
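
The three phases above can be sketched roughly as follows. This is only a
portable illustration, not the actual patch: sketch_memset,
line_create_dirty_exclusive and in_cached_space are hypothetical names, and
both helpers are stubs here. On an RM5231 the first would issue the "create
dirty exclusive" cache operation and the second would test the address
against the cached segments.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CACHELINE 32	/* line size quoted in the message */

/*
 * Hypothetical hook. On the real CPU this would issue the "create dirty
 * exclusive" cache operation so the line is allocated without a refill.
 * It is a no-op in this portable sketch.
 */
static void
line_create_dirty_exclusive(void *line)
{
	(void)line;
}

/*
 * Hypothetical check. A kernel version would test whether the address
 * lies in cached space. Assumed always true here.
 */
static int
in_cached_space(const void *p)
{
	(void)p;
	return 1;
}

void *
sketch_memset(void *dst, int c, size_t n)
{
	unsigned char *d = dst;

	if (in_cached_space(d) && n >= CACHELINE) {
		/* Phase 1: standard code up to the next line boundary. */
		size_t head = (CACHELINE -
		    ((uintptr_t)d & (CACHELINE - 1))) & (CACHELINE - 1);
		memset(d, c, head);
		d += head;
		n -= head;

		/*
		 * Phase 2: whole lines only. The hook is safe only because
		 * every byte of the line is overwritten right after it.
		 */
		while (n >= CACHELINE) {
			line_create_dirty_exclusive(d);
			memset(d, c, CACHELINE);
			d += CACHELINE;
			n -= CACHELINE;
		}
	}

	/* Phase 3: remainder (or everything, if the fast path was skipped). */
	memset(d, c, n);
	return dst;
}
```

The fast path is restricted to complete lines on purpose: a line allocated
without a refill holds undefined data until it is written, so a partially
written line would lose the old contents of its other bytes.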

I didn't think about the pmap_* routines. In fact this should be easier in 
pmap since the page size is generally a multiple of the cacheline size.
Good point, I'll try that.
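
For the page case, a minimal sketch might look like the following. It
assumes a 4096-byte page and a 32-byte line, and it assumes the caller
passes a cacheline-aligned page in cached space. sketch_zero_page and
page_line_create_dirty_exclusive are hypothetical names, and the hook is
again a no-op stub standing in for the real cache operation.

```c
#include <stddef.h>
#include <string.h>

#define CACHELINE	32	/* line size quoted in the message */
#define PAGE_BYTES	4096	/* assumed page size */

/* Hypothetical hook: no-op stand-in for "create dirty exclusive". */
static void
page_line_create_dirty_exclusive(void *line)
{
	(void)line;
}

/*
 * Zero one page. Since PAGE_BYTES is a multiple of CACHELINE and the page
 * is assumed line-aligned, there is no head or tail to handle: every line
 * is written in full.
 */
void
sketch_zero_page(void *page)
{
	unsigned char *p = page;
	size_t off;

	for (off = 0; off < PAGE_BYTES; off += CACHELINE) {
		page_line_create_dirty_exclusive(p + off);
		memset(p + off, 0, CACHELINE);
	}
}
```

Compared with the mem{cpy,set} case, the alignment and remainder phases
disappear, which is why the pmap routines look like the simpler place to
start.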

--
Nicolas BOUQUET
Low-level software and hardware engineer
IPricot SA FRANCE