Subject: lock performance on i386
To: None <>
From: David Laight <>
List: tech-smp
Date: 09/19/2002 19:30:05
The spin lock code is currently:

static __inline void
__cpu_simple_lock(__cpu_simple_lock_t *alp)
{
	int __val = __SIMPLELOCK_LOCKED;

	do {
		__asm __volatile("xchgl %0, %2"
			: "=r" (__val)
			: "0" (__val), "m" (*alp));
	} while (__val != __SIMPLELOCK_UNLOCKED);
}

This means that while the lock is contended the cpu does
continuous, expensive, locked operations on the bus -
which, IIRC, force a cache snoop cycle on all the cpus.

It would be much more efficient to use the following:
(in asm, but not __asm...)

	movl	4(%esp),%edx
	movl	$1,%eax		# __SIMPLELOCK_LOCKED
1:	xchgl	(%edx),%eax
	testl	%eax,%eax
	je	3f		# got the lock
2:	pause			# for P4
	cmpl	(%edx),%eax	# just read until lock available
	jne	1b		# lock now free, retry the xchg
	jmp	2b
3:	ret

(Inlined you wouldn't want the 'ret' at the end...)
Also, under debug, the lower loop could maybe be made a
function call - that could do some checks for lock
contention if it spins for too long.


David Laight: