Subject: Re: Restartable Atomic Sequences
To: Gregory McGarry <g.mcgarry@ieee.org>
From: Simon Burge <simonb@wasabisystems.com>
List: tech-kern
Date: 08/27/2002 12:42:18
On Thu, Jul 04, 2002 at 01:06:35PM +1200, Gregory McGarry wrote:

> Since I don't have an R4000+ cpu to compare the performance, I tested
> on a DX4 (i386).  I compared three approaches:
> 
> 1) kernel acquire and release system calls using sysarch(2)
> 
> 2) RAS using explicit registration with a system call using sysarch(2)
> 
> 3) __cpu_simple_lock() from /usr/include/machine/lock.h.  This
>    function is inlined.
> 
> Here are the run times (in seconds) for 1E6 iterations of locking and
> unlocking a lock using the three approaches:
> 
> 	1) 21.570237
> 	2) 1.046555
> 	3) 0.946119
> 
> Nothing really surprising here I suppose.  Don't use system calls.
> Actually, I did trim an instruction off the lock code which improves
> the RAS result here.  You can see the improvement below.

Here are results for three MIPS CPUs (times in seconds):

bcm1250 (mips64 500MHz)

        syscall lock:		time= 7.174320
        RAS lock:		time= 0.220124
        test-and-set lock:	time= 0.450196

au1000 (mips32 396MHz)
 
        syscall lock:		time=12.292677
        RAS lock:		time= 0.227705
        test-and-set lock:	time= 0.531112
 
r4400 (mips3 120MHz)
 
        syscall lock:		time=48.880966
        RAS lock:		time= 1.048473
        test-and-set lock:	time= 1.090512
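
For anyone wanting to reproduce numbers of this kind, here is a minimal
sketch of a timing loop for the test-and-set case.  It is not the
program that produced the figures above: it assumes a C11 compiler, uses
atomic_flag as a stand-in for __cpu_simple_lock(), and the names
tas_lock(), tas_unlock() and time_tas_lock() are made up for
illustration.  It measures CPU time with clock(), which should be close
to wall-clock time for an uncontended single-threaded loop.

```c
#include <stdatomic.h>
#include <time.h>

/* Hypothetical stand-in for the test-and-set lock under test. */
static atomic_flag bench_lock = ATOMIC_FLAG_INIT;

static void
tas_lock(void)
{
	/* Spin until the previous value was clear, i.e. we took the lock. */
	while (atomic_flag_test_and_set_explicit(&bench_lock,
	    memory_order_acquire))
		;
}

static void
tas_unlock(void)
{
	atomic_flag_clear_explicit(&bench_lock, memory_order_release);
}

/*
 * Run `iterations` uncontended lock/unlock pairs and return the CPU
 * time consumed, in seconds.
 */
static double
time_tas_lock(long iterations)
{
	clock_t start;
	long i;

	start = clock();
	for (i = 0; i < iterations; i++) {
		tas_lock();
		tas_unlock();
	}
	return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling time_tas_lock(1000000) would correspond to the 1E6 iterations
quoted above; the syscall and RAS variants would swap in their own
lock/unlock pair.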

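So RAS is faster in all three cases: roughly twice as fast as
test-and-set on the bcm1250 and au1000, and about 4% faster on the
r4400.  For picking a lock at run time, I'm thinking of something like: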

	if (sysctl(hw.ncpu) == 1) {
		RAS lock
	} else {
		if (sysctl(machdep.llsc)) {
			test-and-set lock
		} else {
			/* Eek!  SMP without ll/sc */
			syscall lock
		}
	}
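
The same decision can be sketched in C.  This is only an illustration:
the enum and the function choose_lock_impl() are hypothetical names, and
the CPU count and ll/sc availability are passed in as plain parameters
rather than read from sysctl.  RAS is chosen only for one CPU because a
restartable sequence protects against preemption on that CPU, not
against another CPU touching the lock at the same time.

```c
/* Hypothetical labels for the three lock implementations compared above. */
enum lock_impl {
	LOCK_RAS,		/* restartable atomic sequence */
	LOCK_TEST_AND_SET,	/* ll/sc based test-and-set */
	LOCK_SYSCALL		/* kernel acquire/release system calls */
};

/*
 * Pick a lock implementation from the number of CPUs and whether the
 * CPU has ll/sc.  Both inputs are assumed to come from the caller.
 */
static enum lock_impl
choose_lock_impl(int ncpu, int has_llsc)
{
	if (ncpu == 1)
		return LOCK_RAS;
	if (has_llsc)
		return LOCK_TEST_AND_SET;
	/* SMP without ll/sc: fall back to the slow but safe path. */
	return LOCK_SYSCALL;
}
```

A library would presumably make this choice once at startup and store a
pair of function pointers, so the check stays out of the lock fast path.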

Simon.
--
Simon Burge                                   <simonb@wasabisystems.com>
NetBSD Development, Support and Service:   http://www.wasabisystems.com/