Port-amd64 archive


Re: Why does membar_consumer() do anything on x86_64?



hi,

what's the status of this topic?

YAMAMOTO Takashi

> On 21 Jul 2010, at 12:18, Manuel Bouyer wrote:
> 
>> On Mon, Jul 12, 2010 at 09:14:10AM -0700, Dennis Ferguson wrote:
>>> That's out of date, both Intel and AMD clarified this and changed their
>>> manuals in 2007.  Here's what "Intel 64 and IA-32 Architectures Software
>>> Developer's Manual" volume 3A (order number 253668) now says:
>>> 
>>>    8.2.3.2  Neither Loads Nor Stores Are Reordered with Like Operations
>>> 
>>>    The Intel-64 memory-ordering model allows neither loads nor stores to be
>>>    reordered with the same kind of operation. That is, it ensures that loads
>>>    are seen in program order and that stores are seen in program order.
>> 
>> Actually, that's not what I observed while working on Xen rings:
>> loads *can* be reordered (because of speculative loads). I noticed this
>> on various post-PPro CPUs.
> 
> I don't get that.  Just because the load was speculative doesn't
> mean it was out-of-order, and doesn't mean you need memory barriers
> for regular memory (you probably will need them if the reads are
> to I/O registers).  If you don't use the result of the speculative
> load it doesn't matter what order it was read in, but if you do use the
> result the processor can still ensure that the speculative load occurred
> in program order.
> 
>> Also, this doesn't match the amd64 manual I have (revision 3.13, July 2007):
>> in volume 2 ("system programming"), 7.1 ("single-processor memory access
>> ordering"), 7.1.1 ("read ordering"):
>> - out of order reads are allowed ...
>> - speculative reads are allowed ...
>> - reads can be reordered ahead of writes
>> - a read cannot be reordered ahead of a prior write if the read is from
>>  the same location as the prior write
> 
> Section 7.1 is "Single-Processor Memory Access Ordering".  This might
> be an issue for programming I/O devices, but has nothing to do with
> the need for memory barriers to protect SMP data structures.
> 
>> 7.2 ("multiprocessor memory access ordering") says more or less the same
>> thing ("loads may pass store"). The first point ("all load, store and I/O
>> operations from a single processor appear in program order") is confusing,
>> but it means nothing more than "the code running on a CPU sees its own
>> data in order". When accessing shared memory, accesses may appear reordered
>> to another CPU (this is consistent with what is said in 7.1.1), as shown in
>> the examples that follow in 7.2.
> 
> For this part you should look at a newer manual.  The current one (Rev. 3.17,
> June 2010) says:
> 
>     From the point of view of a program, in ascending order of priority:
> 
>     7  All loads, stores and I/O operations from a single processor appear
>        to occur in program order to the code running on that processor and all
>        instructions appear to execute in program order.
> 
>        In this context:
> 
>        -  Loads do not pass previous loads (loads are not re-ordered). Stores
>           do not pass previous stores (stores are not re-ordered)
> 
>           In the examples below all memory values are initialized to zero.
> 
>           Processor 0         Processor 1
>           Store A ← 1         Load B
>           Store B ← 1         Load A
> 
>           Load A cannot read 0 when Load B reads 1.
> 
> Both AMD and Intel gave this revelation about how their processors
> actually worked in 2007, and fixed up their manuals sometime after that.
> If the guarantee in the example above holds without memory barriers,
> then membar_producer() and membar_consumer() can be nops for SMP
> programming.  Linux has run with its equivalents of these two doing
> nothing for 2.5 years now, and I've been running my NetBSD amd64 and i386
> development machines with kernels with membar_producer() and
> membar_consumer() nop'd for a couple of weeks now with no ill effects
> that I've seen so far.
> 
> Dennis Ferguson

