Port-amd64 archive
Re: Why does membar_consumer() do anything on x86_64?
hi,
what's the status of this topic?
YAMAMOTO Takashi
> On 21 Jul 2010, at 12:18 , Manuel Bouyer wrote:
>
>> On Mon, Jul 12, 2010 at 09:14:10AM -0700, Dennis Ferguson wrote:
>>> That's out of date, both Intel and AMD clarified this and changed their
>>> manuals in 2007. Here's what "Intel 64 and IA-32 Architectures Software
>>> Developer's Manual" volume 3A (order number 253668) now says:
>>>
>>> 8.2.3.2 Neither Loads Nor Stores Are Reordered with Like Operations
>>>
>>> The Intel-64 memory-ordering model allows neither loads nor stores to be
>>> reordered with the same kind of operation. That is, it ensures that loads
>>> are seen in program order and that stores are seen in program order.
>>
>> Actually, that's not what I observed while working on Xen rings.
>> Loads *can* be reordered (because of speculative loads). I noticed this
>> on various post-PPro CPUs.
>
> I don't get that. Just because the load was speculative doesn't
> mean it was out-of-order, and doesn't mean you need memory barriers
> for regular memory (you probably will need them if the reads are
> to I/O registers). If you don't use the result of the speculative
> load it doesn't matter what order it was read in, but if you do use the
> result the processor can still ensure that the speculative load occurred
> in program order.
>
>> Also, this doesn't match the amd64 manual I have (revision 3.13, July 2007):
>> in volume 2 ("system programming"), 7.1 ("single-processor memory access
>> ordering"), 7.1.1 ("read ordering"):
>> - out of order reads are allowed ...
>> - speculative reads are allowed ...
>> - reads can be reordered ahead of writes
>> - a read cannot be reordered ahead of a prior write if the read is from
>> the same location as the prior write
>
> Section 7.1 is "Single-Processor Memory Access Ordering". This might
> be an issue for programming I/O devices, but has nothing to do with
> the need for memory barriers to protect SMP data structures.
>
>> 7.2 "multiprocessor memory access ordering" says more or less the same
>> thing ("loads may pass store"). The first point ("all load, store and I/O
>> operations from a single processor appear in program order") is confusing,
>> but it means nothing more than "the code running on a CPU sees its own
>> data in order". When accessing shared memory, accesses may appear reordered
>> to another CPU (this is consistent with what is said in 7.1.1), as shown in
>> the examples that follow in 7.2.
>
> For this part you should look at a newer manual. The current one (Rev. 3.17,
> June 2010) says:
>
> From the point of view of a program, in ascending order of priority:
>
> - All loads, stores and I/O operations from a single processor appear
> to occur in program order to the code running on that processor and all
> instructions appear to execute in program order.
>
> In this context:
>
> - Loads do not pass previous loads (loads are not re-ordered). Stores
> do not pass previous stores (stores are not re-ordered)
>
> In the examples below all memory values are initialized to zero.
>
>     Processor 0        Processor 1
>     Store A ← 1        Load B
>     Store B ← 1        Load A
>
> Load A cannot read 0 when Load B reads 1.
>
> In 2007 both AMD and Intel disclosed how their processors actually
> behave, and they fixed up their manuals sometime after that.
> If the example above always holds without memory barriers, then
> for SMP programming membar_producer() and membar_consumer() can be
> nops. Linux has run with its equivalents of the above doing nothing
> for 2.5 years now, and I've been running my NetBSD amd64 and i386
> development machines with kernels with membar_producer() and
> membar_consumer() nop'd for a couple of weeks now with no ill effects
> that I've seen so far.
>
> Dennis Ferguson
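
For reference, the two-processor example quoted above can be written as a small userland litmus test. This is only a sketch under stated assumptions: it uses C11 atomics and POSIX threads rather than the kernel's membar_producer()/membar_consumer(), and the names writer, reader and run_litmus are hypothetical. The release store and acquire load stand in for the two barriers; on x86-64, compilers typically emit plain MOV instructions for both, which fits the argument that no fence instruction is needed there, while the compiler is still prevented from reordering the accesses itself.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Shared locations from the quoted example; both start at zero. */
static atomic_int A, B;

/* Processor 0: Store A <- 1, then Store B <- 1. */
static void *writer(void *arg)
{
    (void)arg;
    atomic_store_explicit(&A, 1, memory_order_relaxed);
    /* Release ordering stands in for membar_producer() (assumption). */
    atomic_store_explicit(&B, 1, memory_order_release);
    return NULL;
}

/* Processor 1: Load B, then Load A; record the forbidden outcome. */
static void *reader(void *arg)
{
    int *violation = arg;
    /* Acquire ordering stands in for membar_consumer() (assumption). */
    int b = atomic_load_explicit(&B, memory_order_acquire);
    int a = atomic_load_explicit(&A, memory_order_relaxed);
    *violation = (b == 1 && a == 0);
    return NULL;
}

/* Hypothetical helper: repeat the test and count forbidden outcomes. */
int run_litmus(int iterations)
{
    int violations = 0;

    for (int i = 0; i < iterations; i++) {
        pthread_t w, r;
        int v = 0, rc;

        /* Reset both locations before each round. */
        atomic_store(&A, 0);
        atomic_store(&B, 0);

        rc = pthread_create(&r, NULL, reader, &v);
        assert(rc == 0);
        rc = pthread_create(&w, NULL, writer, NULL);
        assert(rc == 0);
        (void)rc;

        pthread_join(w, NULL);
        pthread_join(r, NULL);
        violations += v;
    }
    return violations;
}
```

Calling run_litmus() with any iteration count should return 0, because the acquire/release pairing forbids "Load B reads 1, Load A reads 0" on every platform. A clean run does not by itself prove anything about the hardware's ordering; the interesting part is inspecting the generated x86-64 code and seeing no fence instructions.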