Port-amd64 archive


Re: Why does membar_consumer() do anything on x86_64?

On Wed, Jul 21, 2010 at 05:48:43PM -0700, Dennis Ferguson wrote:
> > Actually, that's not what I observed while working on Xen rings.
> > loads *can* be reordered (because of speculative loads). I noticed this
> > on various, post-ppro CPUs.
> I don't get that.  Just because the load was speculative doesn't
> mean it was out-of-order, and doesn't mean you need memory barriers
> for regular memory (you probably will need them if the reads are
> to I/O registers).
> If you don't use the result of the speculative
> load it doesn't matter what order it was read in, but if you do use the
> result the processor can still ensure that the speculative load occurred
> in program order.

Actually, I had to use memory barriers in the Xen code to get correct
results, and this was on CPUs up to Core 2.

> [...]
> > 7.2 "multiprocessor memory access ordering" says more or less the same
> > thing ("loads may pass store"). The first point ("all load, store and I/O
> > operations from a single processor appear in program order") is confusing,
> > but it means nothing more but "the code running on a CPU sees its own
> > data in order". When accessing shared memory, access may appear reordered to
> > another CPU (this is coherent with what is said in 7.1.1) as shown in
> > examples following in 7.2
> For this part you should look at a newer manual.  The current one (Rev. 3.17,
> June 2010) says:

exactly the same thing as the 2007 one

>     From the point of view of a program, in ascending order of priority:
>     *  All loads, stores and I/O operations from a single processor appear
>        to occur in program order to the code running on that processor and all
>        instructions appear to execute in program order.

"to the code running on that processor". These are the important words.
Code running on another processor (or an I/O device) may not
see it that way.

>        In this context:
>        -  Loads do not pass previous loads (loads are not re-ordered). Stores
>           do not pass previous stores (stores are not re-ordered)
>           In the examples below all memory values are initialized to zero.
>           Processor 0         Processor 1
>           Store A <- 1       Load B
>           Store B <- 1       Load A
>           Load A cannot read 0 when Load B reads 1.

You stopped at the first example. Later there is, for example:
- Non-overlapping Loads may pass stores.
                   Processor 0                      Processor 1
                   Store A <- 1                     Store B <- 1
                   Load B                           Load A
All combinations of Load A and Load B values are allowed. Where sequential
consistency is needed (for example in Dekker's algorithm for mutual exclusion),
an MFENCE instruction should be used between the store and the subsequent
load, or a locked access, such as LOCK XCHG, should be used for the store.
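The store-then-load case above can be sketched in portable C11. This is only an illustration, not code from Xen or NetBSD: the variables `A` and `B` and both function names are hypothetical, and `atomic_thread_fence(memory_order_seq_cst)` stands in for the MFENCE (or locked access) the manual recommends. Each function is meant to run on a different processor.

```c
#include <stdatomic.h>

/* Hypothetical shared flags; both start at zero, as in the manual's example. */
static atomic_int A;
static atomic_int B;

/* Intended for processor 0: Store A <- 1, then Load B. */
static int
cpu0_store_a_load_b(void)
{
	atomic_store_explicit(&A, 1, memory_order_relaxed);
	/* Full fence between the store and the later load (store-load ordering). */
	atomic_thread_fence(memory_order_seq_cst);
	return atomic_load_explicit(&B, memory_order_relaxed);
}

/* Intended for processor 1: Store B <- 1, then Load A. */
static int
cpu1_store_b_load_a(void)
{
	atomic_store_explicit(&B, 1, memory_order_relaxed);
	/* Same full fence on this side; both sides need it. */
	atomic_thread_fence(memory_order_seq_cst);
	return atomic_load_explicit(&A, memory_order_relaxed);
}
```

With both fences in place, the C11 memory model rules out the outcome where both functions return 0 when they run concurrently. Without them, that outcome is allowed, which matches the "all combinations" wording above. Called one after the other on a single thread, the first call returns 0 and the second returns 1; seeing the reordering itself requires two threads on two CPUs.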

> Both AMD and Intel gave this revelation about how their processors
> actually worked in 2007, and fixed up their manuals sometime after that.
> If the example above is always true without memory barriers then
> for SMP programming membar_producer() and membar_consumer() can be
> nops.  Linux has run with their equivalents of the above doing nothing
> for 2.5 years now, and I've been running my NetBSD amd64 and i386
> development machines with kernels with membar_producer() and
> membar_consumer() nop'd for a couple of weeks now with no ill effects
> that I've seen so far.

The Xen issues with missing barriers were very hard to reproduce: they
showed up only on heavily loaded systems, and only once in a while.
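For reference, the producer/consumer pattern this thread is arguing about can be sketched in portable C11. This is a minimal illustration under stated assumptions, not the Xen ring code or NetBSD's implementation: `payload`, `ready`, `publish()` and `try_consume()` are hypothetical names, and the release and acquire fences only roughly play the roles of membar_producer() and membar_consumer().

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical shared state: plain data plus a flag that publishes it. */
static int payload;
static atomic_int ready;

/* Producer side: write the data, then set the flag. */
static void
publish(int value)
{
	payload = value;
	/* Roughly the role of membar_producer(): earlier stores before the flag store. */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&ready, 1, memory_order_relaxed);
}

/* Consumer side: check the flag, then read the data. */
static bool
try_consume(int *out)
{
	if (atomic_load_explicit(&ready, memory_order_relaxed) == 0)
		return false;
	/* Roughly the role of membar_consumer(): flag load before the data load. */
	atomic_thread_fence(memory_order_acquire);
	*out = payload;
	return true;
}
```

On x86, compilers typically emit no instruction for these two fences, but they still stop the compiler itself from reordering the accesses. Whether the hardware needs anything more is the question under discussion here.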

Manuel Bouyer <bouyer%antioche.eu.org@localhost>
     NetBSD: 26 years of experience will always make the difference
