tech-kern archive


Re: pserialize(9) vs. TAILQ



On 23 Nov, 2014, at 01:52 , Taylor R Campbell <campbell+netbsd-tech-kern%mumble.net@localhost> wrote:
> [*] The x86 architecture happens to guarantee that if whoever inserted
> the entry issues a store barrier (membar_producer) after initializing
> e->key and before setting eq = e, this situation won't happen.  But
> that is not guaranteed on every architecture, and if the code had a
> control-dependent load instead of a data-dependent load, say
> 
> 	int ok[128];
> 	int value[128];
> 
> 	for (i = 0; i < 128; i++) {
> 		if (ok[i])
> 			return value[i];
> 	},
> 
> then membar_consumer/lfence between ok[i] and value[i] would be
> necessary on x86 in practice -- even if we qualified ok and value with
> volatile.

While this horse is certainly dead enough, the above isn't true for the
x86.  The x86 (Intel, and anything else recent enough to support
the 64 bit instruction set, at least) guarantees that if you start
off with

    volatile int x = 0;
    volatile int y = 0;

and then a writer does

     x = 1;
     y = 1;

while a reader does

     int my_x, my_y;

     my_y = y;
     my_x = x;

it is guaranteed that (my_y == 1 && my_x == 0) will never be true
even in the absence of explicit barrier instructions (this is
from section 7.2.3.2 of volume 3 of the Intel 64 and IA-32
Architectures Software Developer's Manual).  lfence and sfence
instructions are never needed for ordinary loads and stores; I think
they're required only for certain odd-ball SSE load and store
instructions.

Of course these are better written with membar_producer()
and membar_consumer(), rather than volatile, but on an x86
both functions can be

    asm volatile("" ::: "memory")

(as their Linux equivalents are defined) since only a
compiler barrier is needed.
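
To make that concrete, here is a minimal sketch of the x/y example
rewritten with barriers instead of volatile.  The x86_membar_*() names
are hypothetical x86-only stand-ins, not NetBSD's actual definitions,
and the asm syntax assumes GCC or Clang.  The plain int accesses mirror
the example above; this is an illustration of where the barriers go,
not a portable implementation.

```c
/* Compiler-only barrier: emits no instruction, but the compiler may not
 * move memory accesses across it.  GCC/Clang syntax is assumed. */
#define compiler_barrier()	__asm__ __volatile__("" ::: "memory")

/* Hypothetical x86-only stand-ins for membar_producer()/membar_consumer(). */
static inline void x86_membar_producer(void) { compiler_barrier(); }
static inline void x86_membar_consumer(void) { compiler_barrier(); }

static int x = 0;
static int y = 0;

/* Writer: store x, then y; the barrier keeps the compiler from swapping them. */
void
writer(void)
{
	x = 1;
	x86_membar_producer();
	y = 1;
}

/* Reader: load y, then x.  Returns nonzero only if it saw the new y
 * together with the old x, the outcome x86 rules out. */
int
reader_saw_reordering(void)
{
	int my_x, my_y;

	my_y = y;
	x86_membar_consumer();
	my_x = x;
	return (my_y == 1 && my_x == 0);
}
```

On any other architecture the two stand-ins would have to expand to
whatever real barrier instruction the machine needs, which is the
point of hiding them behind a function.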

The reason to (almost) always prefer membar_*() to volatile
is that the barrier functions do the right thing on all machines,
rather than producing buggy code that happens to test okay
on an x86, and the barriers also tend to impose fewer
constraints on the compiler's optimizer so it can do a
better job on the code on either side of the barrier.
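
As a rough illustration of the portable form, here is the quoted
ok[]/value[] loop written so the ordering does not depend on x86
behaviour.  It is only a sketch: it uses C11 <stdatomic.h> fences as
stand-ins for membar_producer() and membar_consumer() (an assumption
made so the example compiles outside the kernel; the C11 fences are
somewhat stronger than the membar pair), and publish() and lookup()
are made-up names.

```c
#include <stdatomic.h>

#define NSLOTS	128

/* Static atomics are zero-initialized, so every slot starts unpublished. */
static atomic_int ok[NSLOTS];
static atomic_int value[NSLOTS];

/* Writer: fill in value[i] first, then publish it by setting ok[i]. */
void
publish(int i, int v)
{
	atomic_store_explicit(&value[i], v, memory_order_relaxed);
	/* Plays the role of membar_producer(): value before ok. */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&ok[i], 1, memory_order_relaxed);
}

/* Reader: return the first published value, or -1 if there is none. */
int
lookup(void)
{
	int i;

	for (i = 0; i < NSLOTS; i++) {
		if (atomic_load_explicit(&ok[i], memory_order_relaxed)) {
			/* Plays the role of membar_consumer(): ok before value. */
			atomic_thread_fence(memory_order_acquire);
			return atomic_load_explicit(&value[i],
			    memory_order_relaxed);
		}
	}
	return -1;
}
```

On an x86 both fences should cost no instruction, only a restriction
on the compiler, while on a weakly ordered machine they turn into
whatever barrier the hardware requires.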

Dennis Ferguson

