Port-amd64 archive
Re: Why does membar_consumer() do anything on x86_64?
On Wed, Jul 21, 2010 at 05:48:43PM -0700, Dennis Ferguson wrote:
> > Actually, that's not what I observed while working on Xen rings.
> > loads *can* be reordered (because of speculative loads). I noticed this
> > on various, post-ppro CPUs.
>
> I don't get that. Just because the load was speculative doesn't
> mean it was out-of-order, and doesn't mean you need memory barriers
> for regular memory (you probably will need them if the reads are
> to I/O registers).
> If you don't use the result of the speculative
> load it doesn't matter what order it was read in, but if you do use the
> result the processor can still ensure that the speculative load occurred
> in program order.
Actually, I had to use a memory barrier in the Xen code to get correct
results. And this was on CPUs up to Core 2.
> [...]
> > 7.2 "multiprocessor memory access ordering" says more or less the same
> > thing ("loads may pass stores"). The first point ("all load, store and I/O
> > operations from a single processor appear in program order") is confusing,
> > but it means nothing more than "the code running on a CPU sees its own
> > data in order". When accessing shared memory, accesses may appear reordered
> > to another CPU (this is consistent with what is said in 7.1.1), as shown in
> > the examples that follow in 7.2.
>
> For this part you should look at a newer manual. The current one (Rev. 3.17,
> June 2010) says:
exactly the same thing as the 2007 one
>
> From the point of view of a program, in ascending order of priority:
>
> - All loads, stores and I/O operations from a single processor appear
> to occur in program order to the code running on that processor and all
> instructions appear to execute in program order.
"to the code running on that processor". These are the important words.
Code running on another processor (or an I/O device) may not
see it that way.
>
> In this context:
>
> - Loads do not pass previous loads (loads are not re-ordered). Stores
> do not pass previous stores (stores are not re-ordered)
>
> In the examples below all memory values are initialized to zero.
>
> Processor 0        Processor 1
> Store A <- 1       Load B
> Store B <- 1       Load A
>
> Load A cannot read 0 when Load B reads 1.
You stopped at the first example. Later there is, for example:
- Non-overlapping Loads may pass stores.

  Processor 0        Processor 1
  Store A <- 1       Store B <- 1
  Load B             Load A

All combinations of Load A and Load B values are allowed. Where sequential
consistency is needed (for example in Dekker's algorithm for mutual exclusion),
an MFENCE instruction should be used between the store and the subsequent
load, or a locked access, such as LOCK XCHG, should be used for the store.
>
> Both AMD and Intel gave this revelation about how their processors
> actually worked in 2007, and fixed up their manuals sometime after that.
> If the example above is always true without memory barriers then
> for SMP programming membar_producer() and membar_consumer() can be
> nops. Linux has run with their equivalents of the above doing nothing
> for 2.5 years now, and I've been running my NetBSD amd64 and i386
> development machines with kernels with membar_producer() and
> membar_consumer() nop'd for a couple of weeks now with no ill effects
> that I've seen so far.
The Xen issues with missing barriers were very hard to reproduce,
and showed up only on heavily loaded systems, and only once in a while.
--
Manuel Bouyer <bouyer%antioche.eu.org@localhost>
NetBSD: 26 years of experience will always make the difference
--