Subject: Re: lock-free data structures
To: None <zvrba@globalnet.hr>
From: None <kpneal@pobox.com>
List: tech-kern
Date: 01/03/2006 21:22:43
On Mon, Jan 02, 2006 at 07:56:14PM +0100, zvrba@globalnet.hr wrote:
> Recently I've come across many papers describing the construction of
> lock-free data structures using compare&exchange instructions. Such
> implementations do not require explicit locks (mutexes) in an MP
> environment. The advantages are more (no explicit locking needed) or
> less (performance gains) obvious, so the question arises: why aren't
> such data structures used more widely (in general and in the kernel
> code?)

> Bottom line: what's the "catch" with lock-free data structures?

A guy I know at work did some experimenting with atomic locks vs the more
common blocking mutexes. This was in a threaded user-level application
and not inside an OS (if it matters).

It turns out that atomic locks are faster until the number of threads
reaches a certain point. After that the application gets slower and slower.
The number of threads varied depending on the platform, and it probably
varied on the test case as well.

Here's one example of an issue: Atomic operations spin. With many threads
contending this results in lots of CPU being burned with little actual
work going on. Contrast this with lock types that forfeit the CPU when
they block.

If blocking on a lock causes the loss of the CPU then you end up with
roughly two CPUs operating on the lock at once: one to do an unlock and
another to do a lock. That's a small window compared to having all your
CPUs spinning.

AND, if you have a server handling requests then speeding up individual
requests with atomic locks may delay other requests. CPUs spinning waiting
in the service of one request are CPUs not serving other requests. You
have an overall loss of performance.

"There is no silver bullet."

> An immediate drawback is that not all architectures support the cmpxchg
> instruction. What are others? Is the possible performance gain negligible
> compared to the required effort of recoding a large volume of code?

I think the Amiga documentation forbids use of the m68k CAS instruction...
-- 
Kevin P. Neal                                http://www.pobox.com/~kpn/

"It sounded pretty good, but it's hard to tell how it will work out
in practice." -- Dennis Ritchie, ~1977, "Summary of a DEC 32-bit machine"