Subject: Re: Lock benchmarks
To: None <tech-kern@netbsd.org>
From: Jason R Thorpe <thorpej@wasabisystems.com>
List: tech-kern
Date: 09/19/2002 12:24:01
On Tue, Sep 17, 2002 at 10:03:00AM +1200, Gregory McGarry wrote:

 > I'd appreciate receiving the timings on different CPUs.  I guess
 > the results are somewhat academic since there is marginally any
 > difference between the locking schemes.  But numbers are cool.

...for a couple of ARM architecture platforms...

StrongARM SA-110, 233MHz with DC21285 system controller:

Registering restartable atomic sequences
Timing unlock overhead
Timing RAS locks
Timing CPU locks (inlined)
Timing CPU locks (not inlined)
unlock overhead: 1.603476 s (0.016035 us/loop)
RAS: 4.402919 s (0.044029 us/loop)
cpu locks (inlined): 44.463102 s (0.444631 us/loop)
cpu locks (not inlined): 50.253711 s (0.502537 us/loop)

Intel i80321 (XScale core), 400MHz:

Registering restartable atomic sequences
Timing unlock overhead
Timing RAS locks
Timing CPU locks (inlined)
Timing CPU locks (not inlined)
unlock overhead: 0.751238 s (0.007512 us/loop)
RAS: 3.756992 s (0.037570 us/loop)
cpu locks (inlined): 2.254331 s (0.022543 us/loop)
cpu locks (not inlined): 7.020705 s (0.070207 us/loop)
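
To put those figures in cycles rather than microseconds, here is a small
sketch.  The helper names are made up for illustration, and the clock rates
are just the nominal 233MHz and 400MHz quoted above, so treat the results
as approximate.

```c
/*
 * Hypothetical helpers, not part of the benchmark itself.
 * us/loop * MHz = cycles/loop, since 1 MHz is one cycle per microsecond.
 */
double
cycles_per_loop(double us_per_loop, double clock_mhz)
{
	return us_per_loop * clock_mhz;
}

/* Iteration count implied by a total run time and a per-loop time. */
double
loop_count(double total_s, double us_per_loop)
{
	return total_s * 1e6 / us_per_loop;
}
```

With the numbers above this works out to roughly 15 cycles per loop for RAS
and roughly 9 for the inlined CPU locks on the i80321, versus roughly 10 and
roughly 104 on the SA-110.  Both runs come out at about 10^8 iterations.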

...for the XScale case, RAS is paying the penalty of the branch (3 cycles).
On this particular XScale system, the memory controller is built into the
CPU and the memory the swp instruction is manipulating is cacheable, so the
swp is cheap.  It may well be different on other XScale-based platforms
(e.g. an i80200 + i80312 platform -- I just can't test that easily right now).
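
For anyone not familiar with swp: it atomically exchanges a register with a
word in memory, which is all a simple test-and-set lock needs.  Below is a
minimal portable sketch of that idea, using C11 atomic_exchange as a stand-in
for the swp instruction.  The tas_* names are hypothetical and this is not
the code the benchmark times; it only illustrates the shape of the operation.

```c
#include <stdatomic.h>

/* Hypothetical lock type for illustration: 0 = unlocked, 1 = locked. */
typedef struct {
	atomic_int locked;
} tas_lock;

/*
 * Swap 1 into the lock word and look at what was there before.
 * A previous value of 0 means the lock was free and is now ours.
 * Returns 1 on success, 0 if the lock was already held.
 */
int
tas_try_lock(tas_lock *l)
{
	return atomic_exchange_explicit(&l->locked, 1,
	    memory_order_acquire) == 0;
}

/* Spin until the exchange succeeds. */
void
tas_lock_acquire(tas_lock *l)
{
	while (!tas_try_lock(l))
		continue;
}

/* Release is a plain store of 0; no exchange is needed. */
void
tas_unlock(tas_lock *l)
{
	atomic_store_explicit(&l->locked, 0, memory_order_release);
}
```

The cost of the exchange depends heavily on how the core and memory system
handle the locked access, which is why the same approach can look cheap on
one platform and expensive on another.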

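I still think RAS is clearly a win here, because of the obvious benefit on
non-XScale platforms: on the SA-110 above, RAS is roughly ten times faster
than the inlined CPU locks (0.044 vs. 0.445 us/loop).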

-- 
        -- Jason R. Thorpe <thorpej@wasabisystems.com>