tech-kern archive
Re: membar_enter semantics
Ping?  Is your objection still standing?  If yes, can you address my
responses?
Recap:
- Proposal is to change one word in the membar_enter man page from
           Any store preceding membar_enter() will happen before all memory
           operations following it.
  to
           Any load preceding membar_enter() will happen before all memory
           operations following it.
  In other words, document membar_enter as load-before-load/store,
  i.e., as load-before-load and load-before-store -- not as
  store-before-load/store.
  This will secondarily allow us to remove a lot of confusing verbiage
  in the man page about membar_ops and load-acquire operations.
- Every use of membar_enter in tree needs load-before-load/store, not
  store-before-load/store.  So we're already relying on the proposed
  semantics, not the documented semantics.  I'd like to add some more
  load-before-load/store uses, in places where atomic_load_acquire
  doesn't quite work.
- Store-before-load is a _weird_ ordering that generally occurs only
  in exotic protocols like Dekker's algorithm, which we should not be
  encouraging in tree.
- The one-word difference is immaterial for ordering atomic-r/m/w and
  then load/store (or, equivalently, ll/sc and then load/store) -- so
  the change doesn't affect mutex_enter-type operations implemented
  with, e.g., atomic_cas.
- Our implementation of membar_enter on all CPUs (except riscv, which
  has never been released) already implements the proposed semantics,
  but _does not_ implement the documented semantics on
        amd64
        i386
        powerpc
        sparc
        sparc64
  and it has been this way for the fifteen years since membar_enter
  was introduced.
- Store-before-load is often much more expensive than
  load-before-load/store or load/store-before-store:
  . On x86 and SPARC TSO, store-before-load needs the most expensive
    memory fence instruction (MFENCE on x86, membar #StoreLoad on
    SPARC), whereas load-before-load/store and load/store-before-store
    don't require any fence at all.
  . On Armv8, store-before-load needs DMB ISH, but
    load-before-load/store needs only the cheaper DMB ISHLD.
  . On powerpc, store-before-load needs SYNC, but
    load-before-load/store and load/store-before-store only need the
    cheaper LWSYNC.  (Load-before-load/store might actually need only
    the even cheaper ISYNC, but I'm not 100% sure.)
  So even for ordering atomic r/m/w, where there's no semantic
  difference, it's cheaper to use load-before-load/store than to use
  store-before-load/store.
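
To make the mutex_enter-type case above concrete, here is a minimal
userland sketch in portable C11, not kernel code.  Assumptions: the
names demo_lock, demo_lock_enter, demo_lock_exit, demo_worker, and
demo_run are hypothetical, and atomic_thread_fence(memory_order_acquire)
is used only as a rough stand-in for membar_enter under the proposed
load-before-load/store semantics (likewise the release fence for
membar_exit).  It is not how mutex_enter is implemented in tree.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define DEMO_ITERS 100000

/* Hypothetical toy spin lock, for illustration only. */
struct demo_lock {
	atomic_uint locked;	/* 0 = free, 1 = held */
};

static struct demo_lock demo_lock;	/* zero-initialized: free */
static unsigned long demo_counter;	/* protected by demo_lock */

static void
demo_lock_enter(struct demo_lock *l)
{
	unsigned expected;

	/* Atomic r/m/w with no ordering of its own. */
	do {
		expected = 0;
	} while (!atomic_compare_exchange_weak_explicit(&l->locked,
	    &expected, 1u, memory_order_relaxed, memory_order_relaxed));

	/*
	 * Load-before-load/store: the load half of the CAS is ordered
	 * before every load and store in the critical section.
	 * Stand-in for membar_enter() under the proposed semantics.
	 */
	atomic_thread_fence(memory_order_acquire);
}

static void
demo_lock_exit(struct demo_lock *l)
{
	/*
	 * Load/store-before-store: the critical section is ordered
	 * before the store that releases the lock.
	 * Stand-in for membar_exit().
	 */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&l->locked, 0u, memory_order_relaxed);
}

static void *
demo_worker(void *arg)
{
	(void)arg;
	for (int i = 0; i < DEMO_ITERS; i++) {
		demo_lock_enter(&demo_lock);
		demo_counter++;		/* plain access under the lock */
		demo_lock_exit(&demo_lock);
	}
	return NULL;
}

/* Run two workers and return the final counter value. */
unsigned long
demo_run(void)
{
	pthread_t t[2];
	int error;

	demo_counter = 0;
	for (int i = 0; i < 2; i++) {
		error = pthread_create(&t[i], NULL, demo_worker, NULL);
		assert(error == 0);
	}
	for (int i = 0; i < 2; i++) {
		error = pthread_join(t[i], NULL);
		assert(error == 0);
	}
	(void)error;
	return demo_counter;
}
```

With both fences in place, demo_run() should return 2 * DEMO_ITERS.
Nothing in this pattern needs store-before-load; a Dekker-style
protocol would instead need atomic_thread_fence(memory_order_seq_cst)
in C11 terms, which is the expensive case described above.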