Port-amd64 archive


Re: Why does membar_consumer() do anything on x86_64?

On 12 Jul 2010, at 06:36 , Andrew Doran wrote:

> On Sat, Jul 10, 2010 at 10:59:58PM -0700, Dennis Ferguson wrote:
>> to use.  Since load-only and store-only order are also things that Intel
>> processors guarantee
> Not true, see the large comment block in this file:
> http://nxr.netbsd.org/xref/src/sys/arch/x86/include/lock.h

That's out of date.  Both Intel and AMD clarified this and changed their
manuals in 2007.  Here's what the "Intel 64 and IA-32 Architectures Software
Developer's Manual", volume 3A (order number 253668), now says:

    Neither Loads Nor Stores Are Reordered with Like Operations

    The Intel-64 memory-ordering model allows neither loads nor stores to be
    reordered with the same kind of operation. That is, it ensures that loads
    are seen in program order and that stores are seen in program order.

Apparently some early Pentium Pros didn't behave like that, but Intel
thinks everything else does.  That sounds like a guarantee to me.

>> membar_consumer() and membar_producer() in particular would need to do
>> anything at all on Intel CPUs.
> There is a lot of misleading information on the Internet about this
> topic.  Sometimes even the Intel/AMD manuals are confused about it!

If the manuals are confused, they confused Linux too: its equivalents of
membar_consumer() and membar_producer() (smp_rmb() and smp_wmb()) were turned
into nops not long after Intel (and AMD) published this.  Linux kernels still
seem to run, though.
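
To make "turned into nops" concrete, here is a minimal sketch of a barrier
that emits no instruction and only constrains the compiler.  It is not taken
from either kernel: compiler_barrier(), publish() and try_consume() are
hypothetical names, the asm syntax assumes GCC or Clang, and the ordering
argument in the comments holds only on x86 under the rule quoted above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical name: a compiler-only barrier (GCC/Clang syntax).
 * It emits no instruction; it only stops the compiler from moving
 * memory accesses across this point. */
#define compiler_barrier() __asm__ __volatile__("" ::: "memory")

/* Shared between one producer and one consumer (illustrative only). */
static volatile uint32_t payload;
static volatile int ready;

/* Producer: store the data, then store the flag. */
static void
publish(uint32_t value)
{
	payload = value;
	compiler_barrier();	/* store-store order is kept by x86 hardware */
	ready = 1;
}

/* Consumer: load the flag, then load the data.
 * Returns 1 and fills *out if the payload was published, else 0. */
static int
try_consume(uint32_t *out)
{
	assert(out != NULL);
	if (!ready)
		return 0;
	compiler_barrier();	/* load-load order is kept by x86 hardware */
	*out = payload;
	return 1;
}
```

On an architecture with a weaker memory model the same two barriers would
have to emit real fence instructions, which is why this is a per-port
decision rather than a change to the MI interface.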

membar_consumer() is a particular problem.  If you have a read-mostly
data structure designed to allow lookups to continue without locking while
the data structure is being modified (which is by far the best way to build
a good SMP networking stack in the kernel), then having to call
membar_consumer() in its current form before every load seems to triple the
cost of forwarding lookups.  In the current kernel those lookups are already
considered so expensive that someone built a forwarding cache in front of
them to try to avoid doing them; for an SMP stack the cache has to go and the
regular forwarding lookup has to get fast.  If things have changed and the
code in membar_consumer() has become unnecessary for this architecture, then
I think it has to be worth pulling it out of there.
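
The kind of lock-free lookup described above might look roughly like the
sketch below.  It uses C11 atomics rather than the membar_*() calls, and
struct route, route_publish() and route_lookup() are hypothetical names, not
NetBSD code.  The release store stands in for membar_producer() and the
acquire load for membar_consumer(); with typical compilers on x86-64 both
become ordinary mov instructions, which is the cost the reader path wants.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical forwarding entry; not a NetBSD structure. */
struct route {
	uint32_t dst;		/* destination address */
	uint32_t nexthop;	/* next-hop address */
};

/* Single published entry that readers follow without taking a lock. */
static _Atomic(struct route *) current_route;

/* Writer: fill in the entry completely, then publish the pointer.
 * The release store plays the role of membar_producer(). */
static void
route_publish(struct route *rt, uint32_t dst, uint32_t nexthop)
{
	assert(rt != NULL);
	rt->dst = dst;
	rt->nexthop = nexthop;
	atomic_store_explicit(&current_route, rt, memory_order_release);
}

/* Reader: load the pointer, then read through it.
 * The acquire load plays the role of membar_consumer().
 * Returns 1 and sets *nexthop on a match, else 0. */
static int
route_lookup(uint32_t dst, uint32_t *nexthop)
{
	struct route *rt;

	rt = atomic_load_explicit(&current_route, memory_order_acquire);
	if (rt == NULL || rt->dst != dst)
		return 0;
	*nexthop = rt->nexthop;
	return 1;
}
```

A real table would also need a safe way to reclaim old entries once no
reader can still hold a pointer to them; that part is left out here.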

Dennis Ferguson
