Subject: Re: lock-free data structures
To: kpneal@pobox.com
From: Garrett D'Amore <garrett_damore@tadpole.com>
List: tech-kern
Date: 01/03/2006 20:41:44
Solaris handles this (spinning vs. blocking) quite nicely with what it
calls "adaptive mutexes".  These mutexes spin when contention is low
and the lock owner is currently running on another CPU; otherwise they
block.  Works really, really well.
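
To make the idea concrete, here is a rough userland sketch of the
spin-then-block logic using C11 atomics.  This is my own illustration,
not the actual Solaris code; the owner_running flag merely stands in
for the kernel's knowledge of whether the lock holder is on a CPU
right now:

    #include <stdatomic.h>
    #include <stdbool.h>
    #include <sched.h>

    typedef struct {
            atomic_flag locked;
            atomic_bool owner_running;  /* set while holder is on CPU */
    } adaptive_mutex_t;

    void
    adaptive_lock(adaptive_mutex_t *m)
    {
            while (atomic_flag_test_and_set_explicit(&m->locked,
                memory_order_acquire)) {
                    if (atomic_load(&m->owner_running))
                            continue;       /* owner on CPU: spin */
                    /*
                     * Owner is off CPU, so spinning is futile; yield
                     * instead.  A real kernel would put the thread to
                     * sleep on a turnstile here rather than yield.
                     */
                    sched_yield();
            }
            atomic_store(&m->owner_running, true);
    }

    void
    adaptive_unlock(adaptive_mutex_t *m)
    {
            atomic_store(&m->owner_running, false);
            atomic_flag_clear_explicit(&m->locked,
                memory_order_release);
    }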

I'm not trying to do a "my OS is better" sort of thing or start a
flamefest; I merely want to point developers at that implementation as
a source of alternatives if anyone is considering a new implementation
or changing the current one.

I'm still pretty new to NetBSD kernel stuff, but it looks to me like
there really isn't much in the way of locking in the kernel right now --
a lot more focus seems to be given to simply masking interrupts.  I'm
not sure we want to do more locking, but it would certainly be good for
(scalable) SMP support.  Then again, the locking philosophy is harder
for some developers to get their heads around, tends to be more error
prone, and ultimately may represent too much change.  Plus, on
uniprocessor systems (e.g. the embedded platforms NetBSD seems to be so
focused on), blocking interrupts is probably simply faster.
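
For anyone who hasn't looked yet, that interrupt-masking style is the
spl(9) interface, and a typical critical section looks roughly like
this (the shared structure being protected here is made up for
illustration):

    int s;

    s = splbio();   /* mask interrupts that could touch this data */
    /* ... modify the shared structure ... */
    splx(s);        /* drop back to the previous interrupt level */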

    -- Garrett

kpneal@pobox.com wrote:

>On Mon, Jan 02, 2006 at 07:56:14PM +0100, zvrba@globalnet.hr wrote:
>
>>Recently I've come across many papers describing the construction of
>>lock-free data structures using compare&exchange instructions. Such
>>implementations do not require explicit locks (mutexes) in an MP
>>environment. The advantages are more (no explicit locking needed) or
>>less (performance gains) obvious, so the question arises: why aren't
>>such data structures used more widely, in general and in kernel code?
>
>>Bottom line: what's the "catch" with lock-free data structures?
>
>A guy I know at work did some experimenting with atomic locks vs the more
>common blocking mutexes. This was in a threaded user-level application
>and not inside an OS (if it matters).
>
>It turns out that atomic locks are faster until the number of threads
>reaches a certain point. After that the application gets slower and
>slower. That threshold varied depending on the platform, and it
>probably varied with the test case as well.
>
>Here's one example of an issue: atomic locks spin. With many threads
>contending, this results in lots of CPU time being burned with little
>actual work getting done. Contrast this with lock types that forfeit
>the CPU when they block.
>
>If blocking on a lock causes the loss of the CPU then you end up with
>roughly two CPUs operating on the lock at once: one to do an unlock and
>another to do a lock. That's a small window compared to having all your
>CPUs spinning.
>
>AND, if you have a server handling requests, then speeding up
>individual requests with atomic locks may delay other requests. CPUs
>spinning on behalf of one request are CPUs not serving other requests.
>The result is an overall loss of performance.
>
>"There is no silver bullet."
>
>>An immediate drawback is that not all architectures support the cmpxchg
>>instruction. What are the others? Is the possible performance gain
>>negligible compared to the required effort of recoding a large volume
>>of code?
>
>I think the Amiga documentation forbids use of the m68k CAS instruction...
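
For anyone who hasn't seen one of these structures, the basic shape is
something like the following -- a userland C11 sketch of a lock-free
stack push.  (Push is the easy half; a pop also has to solve the ABA
problem, e.g. with tagged pointers or hazard pointers, which this
sketch conveniently dodges.)

    #include <stdatomic.h>

    struct node {
            struct node *next;
            int value;
    };

    static _Atomic(struct node *) top;

    void
    push(struct node *n)
    {
            struct node *old = atomic_load(&top);

            do {
                    n->next = old;
                    /*
                     * If another thread changed top meanwhile, the
                     * CAS fails, reloads "old" with the current top,
                     * and we retry.
                     */
            } while (!atomic_compare_exchange_weak(&top, &old, n));
    }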


-- 
Garrett D'Amore                          http://www.tadpolecomputer.com/
Sr. Staff Engineer          Extending the Power of 64-bit UNIX Computing
Tadpole Computer, Inc.                             Phone: (951) 325-2134