Subject: Restartable Atomic Sequences
To: None <tech-kern@netbsd.org>
From: Gregory McGarry <g.mcgarry@ieee.org>
List: tech-kern
Date: 07/04/2002 13:06:35
I implemented restartable atomic sequences (RAS) to compare their
performance with atomic test-and-set instructions.  The RAS facility
is described in

http://www.cs.cmu.edu/afs/cs/project/mach/public/doc/published/Rcs.ps

and is required on the MIPS R3000 cpu for libpthread support.  But
the question arises whether the RAS facility would be more broadly
useful, since atomic test-and-set instructions lock the memory bus
and have cache issues.

Since I don't have an R4000+ cpu to compare the performance, I tested
on a DX4 (i386).  I compared three approaches:

1) kernel acquire and release systems calls using sysarch(2)

2) RAS using explicit registration with a system call using sysarch(2)

3) __cpu_simple_lock() from /usr/include/machine/lock.h.  This
   function is inlined.

Here are the run times (in seconds) for 1E6 iterations of locking and
unlocking a lock using the three approaches:

	1) 21.570237
	2) 1.046555
	3) 0.946119

Nothing really surprising here I suppose.  Don't use system calls.
Actually, I did trim an instruction off the lock code which improves
the RAS result here.  You can see the improvement below.

I compared the last two approaches using inlined and non-inlined
versions of __cpu_simple_lock() over 10E6 iterations:

	inlined:
	2) 3.808963
	3) 3.167649

	non-inlined:
	2) 3.808029
	3) 9.456359

So, the problem with RAS using explicit registration is the cost
of the function call; the same conclusion as in the paper.  Hardly
critical, all things considered.

The only other interesting fact is that during the 3.8 seconds
and 10E6 locks, 52 restarts were performed.  Not much statistical
variation either; suprisingly determinstic in fact.

Anyway, if there aren't any MI uses for this, I'll just stick it
in the MD mips trap handler and use sysarch(2) for registration.

	-- Gregory McGarry <g.mcgarry@ieee.org>