Port-amd64 archive


Re: Why does membar_consumer() do anything on x86_64?

On 21 Jul 2010, at 12:18, Manuel Bouyer wrote:

> On Mon, Jul 12, 2010 at 09:14:10AM -0700, Dennis Ferguson wrote:
>> That's out of date, both Intel and AMD clarified this and changed their
>> manuals in 2007.  Here's what "Intel 64 and IA-32 Architectures Software
>> Developer's Manual" volume 3A (order number 253668) now says:
>>   Neither Loads Nor Stores Are Reordered with Like Operations
>>    The Intel-64 memory-ordering model allows neither loads nor stores to be
>>    reordered with the same kind of operation. That is, it ensures that loads
>>    are seen in program order and that stores are seen in program order.
> Actually, that's not what I observed while working on Xen rings.
> loads *can* be reordered (because of speculative loads). I noticed this
> on various, post-ppro CPUs.

I don't get that.  A load being speculative doesn't make it
out of order, and doesn't mean you need memory barriers for regular
memory (you probably will need them if the reads are to I/O
registers).  If the result of a speculative load is discarded, it
doesn't matter in what order the load was performed; if the result is
used, the processor can still ensure that the load appears to have
occurred in program order.

> Also, this doesn't match the amd64 manual I have (revision 3.13, July 2007):
> in volume 2 ("system programming"), 7.1 ("single-processor memory access
> ordering"), 7.1.1 ("read ordering"):
> - out of order reads are allowed ...
> - speculative reads are allowed ...
> - reads can be reordered ahead of writes
> - a read cannot be reordered ahead of a prior write if the read is from
>  the same location as the prior write

Section 7.1 is "Single-Processor Memory Access Ordering".  This might
be an issue for programming I/O devices, but has nothing to do with
the need for memory barriers to protect SMP data structures.

> 7.2 "multiprocessor memory access ordering" says more or less the same
> thing ("loads may pass store"). The first point ("all load, store and I/O
> operations from a single processor appear in program order") is confusing,
> but it means nothing more but "the code running on a CPU sees its own
> data in order". When accessing shared memory, access may appear reordered to
> another CPU (this is coherent with what is said in 7.1.1) as shown in
> examples following in 7.2

For this part you should look at a newer manual.  The current one (Rev. 3.17,
June 2010) says:

    From the point of view of a program, in ascending order of priority:

    •  All loads, stores and I/O operations from a single processor appear
       to occur in program order to the code running on that processor and all
       instructions appear to execute in program order.

       In this context:

       -  Loads do not pass previous loads (loads are not re-ordered). Stores
          do not pass previous stores (stores are not re-ordered)

          In the examples below all memory values are initialized to zero.

          Processor 0         Processor 1
          Store A ← 1         Load B
          Store B ← 1         Load A

          Load A cannot read 0 when Load B reads 1.

Both AMD and Intel clarified in 2007 how their processors actually
behave, and fixed up their manuals sometime after that.  If the
example above always holds without memory barriers, then for SMP
programming membar_producer() and membar_consumer() can be no-ops.
Linux has run with its equivalents of these doing nothing for 2.5
years now, and I've been running my NetBSD amd64 and i386 development
machines with kernels with membar_producer() and membar_consumer()
nop'd for a couple of weeks now, with no ill effects that I've seen
so far.
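
To make the pattern concrete, here is a minimal sketch of the
Store A / Store B versus Load B / Load A example, written with C11
atomics rather than the kernel's membar_*() calls.  The mailbox
structure and the function names are hypothetical, made up for
illustration only.  The release store plays the role of
membar_producer() before the flag store, and the acquire load the role
of membar_consumer() after the flag load.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical one-slot mailbox; all names here are illustrative only. */
struct mailbox {
	int        payload;	/* "A" in the manual's example */
	atomic_int ready;	/* "B" in the manual's example */
};

/* Producer side: Store A, then Store B. */
static void
mailbox_publish(struct mailbox *m, int value)
{
	m->payload = value;
	/* Release ordering stands in for membar_producer() before this store. */
	atomic_store_explicit(&m->ready, 1, memory_order_release);
}

/* Consumer side: Load B, then Load A. */
static bool
mailbox_try_consume(struct mailbox *m, int *out)
{
	/* Acquire ordering stands in for membar_consumer() after this load. */
	if (atomic_load_explicit(&m->ready, memory_order_acquire) == 0)
		return false;
	*out = m->payload;
	return true;
}

/* Single-threaded walk-through of the protocol; not a concurrency test. */
static int
mailbox_selftest(void)
{
	struct mailbox m = { .payload = 0, .ready = 0 };
	int v = -1;

	assert(!mailbox_try_consume(&m, &v));	/* nothing published yet */
	mailbox_publish(&m, 42);
	assert(mailbox_try_consume(&m, &v));	/* flag set, payload visible */
	return v;
}
```

On x86 compilers generally emit plain MOV instructions for a release
store and an acquire load, with no fence instruction.  That matches
the claim above: the ordering still has to be expressed so the
compiler doesn't reorder the accesses, but the hardware needs nothing
extra for this load-load and store-store case.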

Dennis Ferguson
