port-arm: Re: ARM9 cache routines updated

Subject: Re: ARM9 cache routines updated
To: Hiroyuki Bessho <bsh@grotto.jp>
From: Richard Earnshaw <rearnsha@arm.com>
List: port-arm
Date: 02/16/2004 16:00:17
> Richard Earnshaw <rearnsha@arm.com> writes:
> 
> >
> > Another useful test that I sometimes run is to time how long it takes to 
> > run the configure script for the 'GNU make' source package.
> >
> 
>   I did it. Kernels are same ones I used in the last report.
> 
>   2410-a: backed out both write-back dcache change and clocking-mode
>           bits fix in arm9_setup().
>           (using sys/arm/include/cpufunc.h:1.29, sys/arm/arm/cpufunc.c:1.65, 
>           sys/arm/arm/cpufunc_asm_arm9.S:1.2)
> 
>   2410-b: with clocking-mode bits fix in arm9_setup(), and without
>           write-back d-cache.
> 
>   2410-c: with write-back d-cache changes, and without clocking-mode
>           bits fix.
> 
> 
>   2410-d: both write-back d-cache changes and clocking-mode bits fix.
> 
> smdk2410-a:
>       246.77 real       129.77 user        96.28 sys
>       234.45 real       127.70 user        93.97 sys
>       234.01 real       126.54 user        93.70 sys
> 
> smdk2410-b:
>       199.95 real        92.56 user        86.82 sys
>       188.92 real        91.76 user        84.83 sys
>       189.17 real        92.93 user        85.16 sys
> 
> smdk2410-c:
>       233.45 real       124.55 user        89.14 sys
>       222.25 real       123.29 user        86.29 sys
>       222.24 real       124.37 user        85.27 sys
> 
> smdk2410-d:
>       180.17 real        86.79 user        74.54 sys
>       170.78 real        86.71 user        72.11 sys
>       170.51 real        87.95 user        72.30 sys
> 

Hmm, those numbers look pretty reasonable now.  I tried the test on my 
cats box last night (233MHz StrongARM); the timings were approximately 
63 seconds user space and 80 seconds system space.

Given that the cats has a higher clock frequency and a slightly better CPI, 
the user-space numbers compare fairly well.  On system space the 920 is 
clearly a winner, probably due to its more efficient cache-cleaning code.
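
To compare the four quoted kernels at a glance, the following is a 
minimal Python sketch that averages the three runs of each configuration 
and reports the percentage saved relative to 2410-a.  The figures are 
copied from the quoted report; the names `runs`, `averages` and 
`summarize` are hypothetical and exist only for this illustration.

```python
from statistics import mean

# (real, user, sys) seconds per run, copied from the quoted report.
runs = {
    "smdk2410-a": [(246.77, 129.77, 96.28),
                   (234.45, 127.70, 93.97),
                   (234.01, 126.54, 93.70)],
    "smdk2410-b": [(199.95, 92.56, 86.82),
                   (188.92, 91.76, 84.83),
                   (189.17, 92.93, 85.16)],
    "smdk2410-c": [(233.45, 124.55, 89.14),
                   (222.25, 123.29, 86.29),
                   (222.24, 124.37, 85.27)],
    "smdk2410-d": [(180.17, 86.79, 74.54),
                   (170.78, 86.71, 72.11),
                   (170.51, 87.95, 72.30)],
}


def averages(samples):
    """Return the mean (real, user, sys) over a list of runs."""
    return tuple(mean(column) for column in zip(*samples))


def summarize(runs, baseline="smdk2410-a"):
    """Map each kernel to (mean times, percent saved versus the baseline)."""
    base = averages(runs[baseline])
    table = {}
    for name, samples in runs.items():
        avg = averages(samples)
        saving = tuple(100.0 * (b - a) / b for a, b in zip(avg, base))
        table[name] = (avg, saving)
    return table


if __name__ == "__main__":
    print(f"{'kernel':<12}{'real':>9}{'user':>9}{'sys':>9}   saved % (real/user/sys)")
    for name, (avg, saving) in summarize(runs).items():
        times = "".join(f"{t:9.2f}" for t in avg)
        saved = "/".join(f"{s:.1f}" for s in saving)
        print(f"{name:<12}{times}   {saved}")
```

Averaging hides the fact that the first run of each set is the slowest, 
so the per-run figures are still worth a look when the difference 
between two kernels is small.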

>   Do you think we'll have more speed-up if we could actually use
> dcache_inv_range?

It will have a small impact compared with the other changes.  IIRC it's 
mainly used before starting a DMA read operation to ensure that nothing in 
that region will be in cache.
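
As a rough illustration of why the invalidate matters, here is a toy 
Python model, not the kernel code.  It assumes "DMA read" means a device 
writing into RAM behind the cache's back, and it models one word per 
cache line.  `ToyCache`, `dma_from_device` and `demo` are hypothetical 
names invented for this sketch.

```python
class ToyCache:
    """Toy write-back cache in front of a dict-backed RAM (one word per line)."""

    def __init__(self, ram):
        self.ram = ram
        self.lines = {}  # addr -> (value, dirty)

    def read(self, addr):
        # On a miss, fill the line from RAM; on a hit, return the cached copy.
        if addr not in self.lines:
            self.lines[addr] = (self.ram[addr], False)
        return self.lines[addr][0]

    def write(self, addr, value):
        # Write-back: only the cached copy changes until the line is cleaned.
        self.lines[addr] = (value, True)

    def invalidate_range(self, start, length):
        """Discard lines in [start, start + length) without writing them back."""
        for addr in range(start, start + length):
            self.lines.pop(addr, None)


def dma_from_device(ram, start, data):
    """Model a device storing data straight into RAM, bypassing the cache."""
    for offset, value in enumerate(data):
        ram[start + offset] = value


def demo(invalidate):
    ram = {addr: 0 for addr in range(8)}
    cache = ToyCache(ram)
    cache.read(2)  # one word of the buffer is now cached with its old value
    if invalidate:
        cache.invalidate_range(0, 4)  # done before the transfer starts
    dma_from_device(ram, 0, [11, 22, 33, 44])
    return [cache.read(addr) for addr in range(4)]


if __name__ == "__main__":
    print("without invalidate:", demo(False))
    print("with invalidate:   ", demo(True))
```

Without the invalidate, the CPU keeps seeing the stale cached word at 
address 2; with it, every word comes from what the device wrote.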

R.