tech-kern archive
Re: membar_enter semantics
Ping? Is your objection still standing? If yes, can you address my
responses?
Recap:
- Proposal is to change one word in the membar_enter man page from
    Any store preceding membar_enter() will happen before all memory
    operations following it.
to
    Any load preceding membar_enter() will happen before all memory
    operations following it.
In other words, document membar_enter as load-before-load/store,
i.e., as load-before-load and load-before-store -- not as
store-before-load/store.
This will secondarily allow us to remove a lot of confusing verbiage
in the man page about membar_ops and load-acquire operations.
- Every use of membar_enter in tree needs load-before-load/store, not
store-before-load/store. So we're already relying on the proposed
semantics, not the documented semantics. I'd like to add some more
load-before-load/store uses, in places where atomic_load_acquire
doesn't quite work.
- Store-before-load is a _weird_ ordering that generally occurs only
in exotic protocols like Dekker's algorithm, which we should not be
encouraging in tree.
- The one-word difference is immaterial for ordering atomic-r/m/w and
then load/store (or, equivalently, ll/sc and then load/store) -- so
the change doesn't affect mutex_enter-type operations implemented
with, e.g., atomic_cas.
- Our implementation of membar_enter on all CPUs (except riscv, which
has never been released) already implements the proposed semantics,
but _does not_ implement the documented semantics on
    amd64
    i386
    powerpc
    sparc
    sparc64
and it's been this way for the fifteen years since it was introduced.
- Store-before-load is often much more expensive than
load-before-load/store or load/store-before-store:
. On x86 and SPARC TSO, store-before-load needs the most expensive
memory fence instruction (MFENCE on x86, membar #StoreLoad on
SPARC), whereas load-before-load/store and load/store-before-store
don't require any fence at all.
. On Armv8, store-before-load needs DMB ISH, but
load-before-load/store needs only the cheaper DMB ISHLD.
. On powerpc, store-before-load needs SYNC, but
load-before-load/store and load/store-before-store only need the
cheaper LWSYNC. (Load-before-load/store might actually only need
the even cheaper ISYNC, not 100% sure.)
So even for ordering atomic r/m/w, where there's no semantic
difference, it's cheaper to use load-before-load/store than to use
store-before-load/store.
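
For concreteness, here is a rough sketch of the two orderings in
portable C11 rather than with membar_ops(3). The mapping is an
assumption made for illustration, not something the tree defines:
atomic_thread_fence(memory_order_acquire) stands in for membar_enter
under the proposed load-before-load/store semantics,
memory_order_release for membar_exit, and memory_order_seq_cst for a
store-before-load fence. The names struct lock, lock_tryenter,
lock_exit, dekker_tryenter, and dekker_exit are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock; C11 fences stand in for membar_ops(3). */
struct lock {
	atomic_uint	locked;		/* 0 = free, 1 = held */
	unsigned	protected_data;	/* guarded by the lock */
};

/* mutex_enter-type operation: atomic r/m/w, then load/store. */
static bool
lock_tryenter(struct lock *l)
{
	unsigned expected = 0;

	if (!atomic_compare_exchange_strong_explicit(&l->locked, &expected,
	    1, memory_order_relaxed, memory_order_relaxed))
		return false;
	/* Load-before-load/store: the role proposed for membar_enter. */
	atomic_thread_fence(memory_order_acquire);
	return true;
}

static void
lock_exit(struct lock *l)
{
	assert(atomic_load_explicit(&l->locked, memory_order_relaxed) == 1);
	/* Load/store-before-store: the role of membar_exit. */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&l->locked, 0, memory_order_relaxed);
}

/*
 * Dekker-style flag handshake, simplified (no turn variable, so both
 * sides can fail at once): store my flag, then load the other's.
 */
static atomic_bool flag[2];

static bool
dekker_tryenter(int self)
{
	atomic_store_explicit(&flag[self], true, memory_order_relaxed);
	/* Store-before-load: nothing weaker than a full fence will do. */
	atomic_thread_fence(memory_order_seq_cst);
	if (atomic_load_explicit(&flag[1 - self], memory_order_relaxed)) {
		/* Other side is in or entering: back off. */
		atomic_store_explicit(&flag[self], false,
		    memory_order_relaxed);
		return false;
	}
	return true;
}

static void
dekker_exit(int self)
{
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&flag[self], false, memory_order_relaxed);
}
```

Only dekker_tryenter needs the store-before-load fence. In
lock_tryenter the access that must be ordered first is the load half
of the compare-and-swap, so the acquire fence suffices, and on x86 a
compiler will typically emit no fence instruction for it at all.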