tech-kern archive

Re: membar_enter semantics



Ping?  Is your objection still standing?  If yes, can you address my
responses?


Recap:

- Proposal is to change one word in the membar_enter man page from

           Any store preceding membar_enter() will happen before all memory
           operations following it.

  to

           Any load preceding membar_enter() will happen before all memory
           operations following it.

  In other words, document membar_enter as load-before-load/store,
  i.e., as load-before-load and load-before-store -- not as
  store-before-load/store.

  This will secondarily allow us to remove a lot of confusing verbiage
  in the man page about membar_ops and load-acquire operations.

- Every use of membar_enter in tree needs load-before-load/store, not
  store-before-load/store.  So we're already relying on the proposed
  semantics, not the documented semantics.  I'd like to add some more
  load-before-load/store uses, in places where atomic_load_acquire
  doesn't quite work.

- Store-before-load is a _weird_ ordering that generally occurs only
  in exotic protocols like Dekker's algorithm, which we should not be
  encouraging in tree.

- The one-word difference is immaterial for ordering atomic-r/m/w and
  then load/store (or, equivalently, ll/sc and then load/store) -- so
  the change doesn't affect mutex_enter-type operations implemented
  with, e.g., atomic_cas.

- Our implementation of membar_enter on all CPUs (except riscv, which
  has never been released) already implements the proposed semantics,
  but _does not_ implement the documented semantics on

        amd64
        i386
        powerpc
        sparc
        sparc64

  and it's been this way for fifteen years since it was introduced.

- Store-before-load is often much more expensive than
  load-before-load/store or load/store-before-store:

  . On x86 and SPARC TSO, store-before-load needs the most expensive
    memory fence instruction (MFENCE on x86, MEMBAR #StoreLoad on
    SPARC), whereas load-before-load/store and load/store-before-store
    don't require any fence at all.

  . On Armv8, store-before-load needs DMB ISH, but
    load-before-load/store needs only the cheaper DMB ISHLD.

  . On powerpc, store-before-load needs SYNC, but
    load-before-load/store and load/store-before-store only need the
    cheaper LWSYNC.  (Load-before-load/store might actually only need
    the even cheaper ISYNC, not 100% sure.)

  So even for ordering atomic r/m/w, where there's no semantic
  difference, it's cheaper to use load-before-load/store than to use
  store-before-load/store.
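
To make the atomic-r/m/w case above concrete, here is a minimal sketch
in portable C11 rather than with membar_ops.  It assumes that
atomic_thread_fence(memory_order_acquire) is a fair stand-in for
membar_enter under the proposed load-before-load/store semantics; the
toy_lock type and its functions are hypothetical and are not how
mutex_enter is actually implemented.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical toy lock for illustration only. */
struct toy_lock {
	atomic_uint held;	/* 0 = free, 1 = held */
};

static bool
toy_lock_try(struct toy_lock *l)
{
	unsigned expected = 0;

	/* Atomic r/m/w that imposes no ordering of its own. */
	if (!atomic_compare_exchange_strong_explicit(&l->held, &expected, 1,
	    memory_order_relaxed, memory_order_relaxed))
		return false;

	/*
	 * Assumed analogue of membar_enter (proposed semantics):
	 * the load half of the CAS above is ordered before every
	 * load and store in the critical section that follows.
	 */
	atomic_thread_fence(memory_order_acquire);
	return true;
}

static void
toy_lock_exit(struct toy_lock *l)
{
	assert(atomic_load_explicit(&l->held, memory_order_relaxed) == 1);

	/* Load/store-before-store: critical section before the release. */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&l->held, 0, memory_order_relaxed);
}
```

Nothing in this pattern needs store-before-load: the only thing the
critical section must not be reordered ahead of is the load half of
the CAS that observed the lock as free.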

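For contrast, the kind of protocol that genuinely needs
store-before-load looks roughly like the following Dekker-style
handshake.  This is only a sketch in C11, with the hypothetical names
want, dekker_try_enter, and dekker_exit, and with
atomic_thread_fence(memory_order_seq_cst) assumed as the
store-before-load barrier; it is not code from the tree.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* One "I want in" flag per side; sides are numbered 0 and 1. */
static atomic_bool want[2];

/*
 * Announce intent, then check the other side.  Returns true if we
 * may proceed, false if the other side is contending (we back off).
 */
static bool
dekker_try_enter(int self)
{
	atomic_store_explicit(&want[self], true, memory_order_relaxed);

	/*
	 * Store-before-load: our store to want[self] must be visible
	 * before we load want[1 - self].  An acquire or release fence
	 * is not enough here; this needs the full fence.
	 */
	atomic_thread_fence(memory_order_seq_cst);

	if (atomic_load_explicit(&want[1 - self], memory_order_acquire)) {
		/* Other side got there first: withdraw and report failure. */
		atomic_store_explicit(&want[self], false,
		    memory_order_relaxed);
		return false;
	}
	return true;
}

static void
dekker_exit(int self)
{
	/* Publish everything done while entered before clearing the flag. */
	atomic_store_explicit(&want[self], false, memory_order_release);
}
```

With typical compilers on x86, the seq_cst fence here turns into
MFENCE or a locked instruction, while an acquire fence emits no
instruction at all, which is the cost difference described above.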
