Subject: Re: ARM9 cache routines updated
To: Hiroyuki Bessho <firstname.lastname@example.org>
From: Richard Earnshaw <email@example.com>
Date: 02/16/2004 16:00:17
> Richard Earnshaw <firstname.lastname@example.org> writes:
> > Another useful test that I sometimes run is to time how long it takes to
> > run the configure script for the 'GNU make' source package.
> I did it. Kernels are same ones I used in the last report.
> 2410-a: backed out both write-back dcache change and clocking-mode
> bits fix in arm9_setup().
> (using sys/arm/include/cpufunc.h:1.29, sys/arm/arm/cpufunc.c:1.65,
> 2410-b: with clocking-mode bits fix in arm9_setup(), and without
> write-back d-cache.
> 2410-c: with write-back d-cache changes, and without clocking-mode
> bits fix.
> 2410-d: both write-back d-cache changes and clocking-mode bits fix.
> 2410-a:
> 246.77 real 129.77 user 96.28 sys
> 234.45 real 127.70 user 93.97 sys
> 234.01 real 126.54 user 93.70 sys
> 2410-b:
> 199.95 real 92.56 user 86.82 sys
> 188.92 real 91.76 user 84.83 sys
> 189.17 real 92.93 user 85.16 sys
> 2410-c:
> 233.45 real 124.55 user 89.14 sys
> 222.25 real 123.29 user 86.29 sys
> 222.24 real 124.37 user 85.27 sys
> 2410-d:
> 180.17 real 86.79 user 74.54 sys
> 170.78 real 86.71 user 72.11 sys
> 170.51 real 87.95 user 72.30 sys
Hmm, those numbers look pretty reasonable now. I tried the test on my
cats box last night (233MHz StrongARM); the timings were (approximately)
63 seconds user space and 80 seconds system space.
Given that the cats has a higher clock frequency and a slightly better CPI,
the user-space numbers compare fairly well. On system space the 920 is
clearly a winner, probably due to its more efficient cache-cleaning code.
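
For reference, the comparison above can be reproduced with a few lines of C.
This is only a sketch: the names (struct timing, arm920_runs, cats_run,
mean_user, mean_sys) are made up for illustration, the 920 figures are the
last (fastest) group of three runs from the quoted timings, and the cats
figures are the approximate values given above.

```c
#include <stddef.h>

/* One line of time(1) output, in seconds. */
struct timing {
    double real, user, sys;
};

/* Last (fastest) group of three runs from the quoted timings. */
const struct timing arm920_runs[3] = {
    {180.17, 86.79, 74.54},
    {170.78, 86.71, 72.11},
    {170.51, 87.95, 72.30},
};

/* Approximate cats figures; no real time was reported, so it is left as 0. */
const struct timing cats_run = {0.0, 63.0, 80.0};

/* Mean user-space time over n runs (0.0 if n is 0). */
double mean_user(const struct timing *runs, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += runs[i].user;
    return n ? sum / (double)n : 0.0;
}

/* Mean system time over n runs (0.0 if n is 0). */
double mean_sys(const struct timing *runs, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += runs[i].sys;
    return n ? sum / (double)n : 0.0;
}
```

Dividing mean_user(arm920_runs, 3) by cats_run.user gives roughly 1.38, while
mean_sys(arm920_runs, 3) divided by cats_run.sys gives roughly 0.91, which is
the "slower in user space, faster in system space" picture described above.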
> Do you think we'll have more speed-up if we could actually use
It will have a small impact compared with the other changes; IIRC it's
mainly used before starting a DMA read operation to ensure that nothing in
that region will be in cache.
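
As a rough illustration of that use, here is a sketch of invalidating a
buffer's cache lines before a DMA read. It is not the code from cpufunc.c:
dcache_inv_range, line_op_t, count_line and DCACHE_LINE_SIZE are hypothetical
names, the 32-byte line size is an assumption matching the ARM920T, and the
per-line operation is passed in as a callback so the address arithmetic can
be checked on a host machine.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: 32-byte D-cache lines, as on the ARM920T. */
#define DCACHE_LINE_SIZE 32u

/*
 * Hypothetical per-line hook.  On real hardware this would issue the CP15
 * "invalidate D-cache single entry" operation for the given address.
 */
typedef void (*line_op_t)(uintptr_t line_addr, void *ctx);

/* Host-side stand-in for the hardware operation: just counts the calls. */
void count_line(uintptr_t line_addr, void *ctx)
{
    (void)line_addr;
    (*(size_t *)ctx)++;
}

/*
 * Invalidate every cache line overlapping [va, va + len) and return the
 * number of lines touched.  The range is rounded outwards to line
 * boundaries; a real implementation has to take care with partial lines at
 * either end, since discarding them could lose unrelated dirty data.
 */
size_t dcache_inv_range(uintptr_t va, size_t len, line_op_t inv_line, void *ctx)
{
    const uintptr_t mask = (uintptr_t)DCACHE_LINE_SIZE - 1;
    uintptr_t start, end, addr;
    size_t lines = 0;

    if (len == 0)
        return 0;

    start = va & ~mask;              /* round down to a line boundary */
    end = (va + len + mask) & ~mask; /* round up to a line boundary */

    for (addr = start; addr < end; addr += DCACHE_LINE_SIZE) {
        inv_line(addr, ctx);
        lines++;
    }
    return lines;
}
```

A driver would call something like this on the receive buffer before
starting the transfer, so that a later CPU read fetches what the device
wrote to memory instead of a stale cached copy.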