Port-amd64 archive


Re: Why does membar_consumer() do anything on x86_64?

On 11 Jul 2010, at 18:55, Jean-Yves Migeon wrote:
> Agreed, although notice that membar_enter() is aliased to
> membar_consumer() in atomic.S, for amd64.

Yes, that was done at revision 1.8, when a bug in
membar_enter() was fixed.  I think it was done solely
because the code was the same in both functions (after the
fix).  I don't think the code should be the same in both
functions, though, so fixing that would require separating
them again like they were before revision 1.8.

> - for i386, Solaris adds a lock for all memory barriers, while Linux
> only does that for its rmb() (it seems to provide less membar_ops than
> Solaris or NetBSD). The wmb is ~ a noop, except for x86 OOSTORE.
> http://cvs.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/common/atomic/i386/atomic.s
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob;f=include/asm-i386/system.h;h=d69ba937e09251769e2f00d54c0c91562a4127e8;hb=4827bbb06e4b59922c2b9bfb13ad1bf936bdebe5

I don't know about Solaris, but note that for Linux the equivalents
of membar_producer() and membar_consumer() are not wmb() and rmb()
(those are used for order enforcement with I/O devices, even on
uniprocessors), they are smp_wmb() and smp_rmb().

Also, the kernel you are looking at is too old; Intel and AMD only published
their memory ordering models in 2007.  Here's their current version of that


Note that smp_rmb() is defined to nothing (barrier() just prevents compiler
reordering, it generates no instructions) in all cases except when
CONFIG_X86_PPRO_FENCE is defined.  That refers to the P6, a 15 year old
32 bit CPU.  smp_wmb() is similarly defined to nothing except when
CONFIG_X86_OOSTORE is defined, and I'm pretty sure that the latter is
only defined when the kernel needs to run on some non-Intel, non-AMD
32 bit CPUs.

This does suggest the i386 port may need membar_producer() and membar_consumer()
to do something for kernels that need to run SMP on certain CPUs, but I
think for the amd64 port there is no CPU on which they need to have
code in them.  The function call by itself (which is a more expensive way
to do what Linux barrier() does) should be sufficient.
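
For reference, a compiler-only barrier of the kind described above can be
sketched as below.  The names compiler_barrier(), publish() and
try_consume() are made up for illustration; they are not NetBSD or Linux
interfaces.  The empty asm with a "memory" clobber is GCC/Clang syntax,
and the sketch assumes a CPU that keeps stores in order with stores and
loads in order with loads.

```c
#include <assert.h>

/* Hypothetical name.  Same idea as Linux's barrier(): an empty asm
 * statement with a "memory" clobber.  It generates no instructions,
 * but the compiler may not move memory accesses across it. */
#define compiler_barrier() __asm__ __volatile__("" ::: "memory")

int payload;            /* ordinary data, not volatile */
volatile int ready;     /* flag announcing that payload is valid */

/* Store the payload, then set the flag, in program order. */
void publish(int value)
{
    assert(value >= 0);     /* -1 is reserved for "nothing published" */
    payload = value;
    compiler_barrier();     /* payload store stays before the flag store */
    ready = 1;
}

/* Read the flag, then the payload, in program order.
 * Returns -1 if nothing has been published yet. */
int try_consume(void)
{
    if (!ready)
        return -1;
    compiler_barrier();     /* payload load stays after the flag load */
    return payload;
}
```

On a CPU that can reorder these accesses by itself, the macro would also
need to emit a real fence instruction; this sketch does not cover that.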

> - According to anandtech, P6 (and P-M) may reorder loads, via the ROB
> (ReOrder Buffer):
> http://www.anandtech.com/show/1998/5

Yes, I think that's what CONFIG_X86_PPRO_FENCE in Linux is for.  That's
not a 64 bit CPU, though.

Even if, despite all this, we were to decide that membar_producer() and
membar_consumer() were doing what they were intended to do, however, that would
just mean that NetBSD is missing some synchronization primitives since
those are clearly doing too much for my purposes (which I'm sure aren't
unusual).  The thing I need is, if you have

    volatile int a = 0;
    volatile int b = 0;

and CPU 1 does

    a = 1;
    b = 1;

while CPU 2 does

    cpu2_b = b;
    cpu2_a = a;

in that order, then CPU 2 should never end up with (cpu2_b == 1 && cpu2_a == 0).
That's all I need, and that's also exactly how Intel (and AMD) says their CPUs
work.  While I can no longer find the Intel 2007 white paper, see this


at about 12 minutes in.  So there should be functions that look like

    a = 1;
    membar_x();
    b = 1;

and

    cpu2_b = b;
    membar_y();
    cpu2_a = a;

which do nothing on an amd64 processor other than keep the compiler
from reordering those operations (but which may do something on other
architectures).  If membar_x() isn't membar_producer(), and membar_y()
isn't membar_consumer(), then what should they be?
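
As a rough illustration of the semantics wanted from membar_x() and
membar_y(), here is the same pattern written with C11 fences.  This is
only a sketch under the assumption that a release fence stands in for
membar_x() and an acquire fence for membar_y(); it is not how NetBSD
implements anything, and the function names are invented.  As far as I
know, GCC and Clang emit no instruction for either fence on x86-64 and
only restrict compiler reordering.

```c
#include <assert.h>
#include <stdatomic.h>

static atomic_int a;    /* static storage, so both start at 0 */
static atomic_int b;

/* CPU 1: store a, then b.  The release fence plays membar_x(). */
void cpu1_store(void)
{
    atomic_store_explicit(&a, 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&b, 1, memory_order_relaxed);
}

/* CPU 2: load b, then a.  The acquire fence plays membar_y().
 * Returns 1 if the forbidden outcome (b == 1 && a == 0) was seen. */
int cpu2_saw_forbidden(void)
{
    int cpu2_b = atomic_load_explicit(&b, memory_order_relaxed);
    atomic_thread_fence(memory_order_acquire);
    int cpu2_a = atomic_load_explicit(&a, memory_order_relaxed);
    return cpu2_b == 1 && cpu2_a == 0;
}

/* Single-threaded smoke test only: it exercises the calls but cannot
 * demonstrate any cross-CPU ordering. */
void smoke_test(void)
{
    assert(!cpu2_saw_forbidden());  /* before the stores: b == 0 */
    cpu1_store();
    assert(!cpu2_saw_forbidden());  /* after the stores: a == 1, b == 1 */
}
```

In real use cpu1_store() and cpu2_saw_forbidden() would run concurrently
on different CPUs, and the second should return 0 however the two
interleave.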

Dennis Ferguson
