Subject: Re: lock-free data structures
To: None <zvrba@globalnet.hr>
From: Jonathan Stone <jonathan@Pescadero.dsg.stanford.edu>
List: tech-kern
Date: 01/04/2006 13:57:57
In message <20060104182605.GB5540@zax.ifi.uio.no>,
zvrba@globalnet.hr writes:

>On Wed, Jan 04, 2006 at 09:38:20AM -0800, jonathan@dsg.stanford.edu wrote:
>> 
>> For practical kernel use, one will soon find one *needs* a DCAS/CAS2,
>> or some suitable approximation. 
>> 
>Newer 32-bit Intel processors offer the 8-byte cmpxchg, and the 64-bit
>AMDs offer the 16-byte cmpxchg. Would these be sufficient? 

One can do that. It's a special-case of using a ptrdiff_t-sized atomic
operation and treating the two halves of the pointer as an index.

(I originally intended to subsume that case, but I concede that, for
clarity, I ended up using wording which hid that intent.)


>In one of
>the papers I remember seeing that DCAS si defined to take two distinct
>memory locations but which are not neccessarily adjacent, unlike the
>working of CMPXCHG8B]

Yes, exactly.



>> Oh, one more thing: to suggest using ras(9) is proof-positive that the
>> suggester simply doesn't understand the problem domain.  Bershad's
>>
>Hm, I admit that I don't understand the NetBSD kernel part. I have never
>even read the manpage thoroughly. What I read in the paper reminded me
>on ras(9) on which I have browsed through the manpage some time ago.. I
>assumed that it was intended for in-kernel use since it is in the 9th
>section of the manual.. :) It turns out that it is (I guess) just the
>kernel support for rasctl(2)?

Huh? To quote ras(9):

DESCRIPTION
     Restartable atomic sequences are user code sequences which are guaranteed
                                      ^^^^^^^^^^^^^^^^^^^
     to execute without preemption.  

Emphasis added, not in original.


>Bottom line: disregarding portability, would it be beneficial to use
>waitfree data structures in the kernel instead of explicit locks (on
>architectures that properly and efficiently support DCAS)? Or most of
>such structures are logically tied to some other resource so locks are
>needed anyway (as someone has already pointed out)?

That's a hard question.  Unfortunately, from a systems perspective,
one sometimes has to take some of the work in the PODC (principles of
distributed computing) with not just a grain, but a sackful of salt:
theoretical results about O(1) wait-free system sometime end up
assuming synchronization primitives that are O(n^3) in hardware, if
implmemented so as to avoid starvation. If you ask Michael Greenwald
(at UPenn last I heard), he can probably still cite you examples.



The other gotcha with wait-free algorithms is that (not infrquently)
even PhD candidates in sytems (OS arena) strain to follow the details
of wait-free algorithms.  Getting the details right is _hard_.  That
poses a high bar to development and maintenance of such algorithms.

For one example, work through all the details in Michael Greenwald's
NBS paper or dissertation (cited previoiusly).


The gripping hand, though, is that we're still at the big-lock stage.
Before NBS becomes an issue, we need to first move code out from
under the biglock.

And, just for my own interests, networking, an even bigger win there
is not need synchronization *at all* for operations like allocating or
freeing mbufs.  Instead, just add or remove on a per-CPU freelist
dedicated to the desired interrupt priority.

And where synchronization is necessary, amortize the cost over large
chunks (e.g., an entire interrupt's worth of packets), not just one.