tech-kern archive


Re: membar_enter semantics



> On Feb 11, 2022, at 10:27 AM, Taylor R Campbell <riastradh%NetBSD.org@localhost> wrote:
> 
> So x86, powerpc, and sparc64 all implement what I suggest membar_enter
> _should_ be (and what all current users use it for!), and _fail_ to
> implement what membar_enter is documented to be in the man page.
> 
> And of all the membar_enter definitions, only the riscv one fails to
> guarantee the load-before-load/store ordering I suggest it should be
> documented to have.

My beef with the membar_enter definitional change is with the word "acquire".  I.e., you want to give it what is today called "acquire" semantics.  My beef is with how "acquire" is defined: as load-before-load/store.  I think it's a non-intuitive word to use for that kind of barrier, especially in the context of the Solaris description of membar_enter(), as you quoted earlier:

    The membar_enter() function is a generic memory barrier used
    during lock entry.  It is placed after the memory operation that
    acquires the lock to guarantee that the lock protects its data.
    No stores from after the memory barrier will reach visibility and
    no loads from after the barrier will be resolved before the lock
    acquisition reaches global visibility.
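The placement that description calls for can be sketched with C11 atomics (an analogy only, not NetBSD's actual implementation; spinlock_t, spin_lock(), and spin_unlock() are hypothetical names):

```c
#include <stdatomic.h>

/* Hypothetical minimal spinlock, showing where a membar_enter()-style
 * barrier sits: after the atomic operation that takes the lock,
 * before the critical section's loads and stores. */
typedef struct {
	atomic_int locked;
} spinlock_t;

static void
spin_lock(spinlock_t *l)
{
	/* Spin until the test-and-set takes the lock (relaxed:
	 * the fence below supplies all the ordering). */
	while (atomic_exchange_explicit(&l->locked, 1,
	    memory_order_relaxed) != 0)
		continue;
	/* membar_enter() analogue: no load or store from the
	 * critical section may move above the lock acquisition. */
	atomic_thread_fence(memory_order_acquire);
}

static void
spin_unlock(spinlock_t *l)
{
	/* membar_exit() analogue: critical-section accesses must
	 * be visible before the lock release. */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&l->locked, 0, memory_order_relaxed);
}
```

Which barrier instruction that fence turns into on a given CPU is exactly the question being argued below.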

So, let's again take a look at the rules for PSO from the SPARC v9 architecture manual:

"""
The rules for PSO are:

* Loads are blocking and ordered with respect to earlier loads.
* Atomic load-stores are ordered with respect to loads.

Thus, PSO ensures that:
* Each load and atomic load-store instruction behaves as if it were followed by a MEMBAR with a mask value of 05.
* Explicit MEMBAR instructions are required to order store and atomic load-store instructions with respect to each other.
"""

My gripe hinges on what "lock acquisition" means.  In SPARC-land, the above Solaris description can be broken down into three cases:

v7 -- Locks are acquired with "ldstub", and memory is always strongly-ordered; membar_enter() is a nop.

v8 -- Locks are acquired with "ldstub", and the CPU can use Partial Store Order; membar_enter() is "stbar" (equivalent to "membar #StoreStore" in v9-speak).

v9 -- Locks are acquired with either "ldstub" or "casx", and the CPU can run in PSO or RMO mode.  "ldstub" and "casx" behave identically with respect to how they interact with barriers.  But PSO and RMO have two different requirements:

v9-PSO -- Because Atomic load-stores ("ldstub" and "casx") are not ordered with respect to stores, you would need "membar #StoreStore" (in PSO mode, Atomic load-stores are already strongly ordered with respect to other loads).

v9-RMO -- In **this case**, because an Atomic load-store is a load, you can use "membar #LoadLoad | #LoadStore" (your proposed semantics: load-before-load/store).  But because an Atomic load-store is also a store, you can also use "membar #StoreLoad | #StoreStore" (the current definition in our man page: store-before-load/store).


Now, because in PSO mode Atomic load-stores are not ordered with respect to stores, in order for the following code to work:

	mutex_enter(&lock);
	*foo = 0;
	result = *bar;
	mutex_exit(&lock);

...then you need to issue a "membar #StoreStore" because the ordering of the lock acquisition and the store through *foo is not guaranteed without it.  But you can also issue a "membar #StoreLoad | #StoreStore", which also works in RMO mode.
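Made explicit, the barrier that the PSO case needs sits between the lock acquisition and the protected accesses (a sketch; mutex_enter() normally hides this inside itself):

```
	mutex_enter(&lock);	! ldstub/casx stores into the lock cell
	membar #StoreStore	! order that store before the ones below
				! (or #StoreLoad | #StoreStore, which
				! also satisfies RMO)
	*foo = 0;
	result = *bar;
	mutex_exit(&lock);
```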

In other words, it's the **store into the lock cell** that actually performs the acquisition of the lock.  In addition to being true on platforms that have Atomic load-store (like SPARC), it is also true on platforms that have LL/SC semantics (the load in that case doesn't mean jack-squat, and the ordering guarantees that the LL has are specifically with respect to the paired SC).  Until that store is globally visible, the lock does not protect its data, on all platforms that NetBSD supports.  And this is why I described membar_enter() the way I did.
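The same shape on an LL/SC machine, in rough pseudocode (hypothetical mnemonics, not any one machine's syntax):

```
retry:	ll	t0, lock	! load-linked: grants nothing by itself
	bnez	t0, retry	! lock already held, spin
	sc	t0, 1, lock	! store-conditional: THIS store acquires
	beqz	t0, retry	! SC lost its reservation, try again
	membar			! membar_enter(): the data is protected
				! only once the SC's store is visible
```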

Now, if we want to loosen the definition to be "acquisition of the lock", which would allow a load-before-load/store barrier after an Atomic load-store, then that's fine, but I would object to defining membar_enter() as "load-before-load/store" ... rather, I would word it more fuzzily: load-before-load/store is allowed if such a barrier suffices on that architecture, but store-before-load/store is also allowed if some other architecture needs those semantics for whatever reason.

And that's why I'd also prefer to have a new name for an additional operation if you want to have one that is explicitly "load-before-load/store".

> (sparc64 membar_sync also fails to order store-before-load, and that's
> clearly just an implementation bug because membar_sync has always been
> unequivocally intended and used as a full load/store-before-load/store
> sequential consistency barrier.  Easy fix: change membar_sync to do
> `membar #StoreLoad' for kernel running in TSO.  If userland is really
> running in RMO, then all the membars need to be fixed on sparc64.  It
> looks like sparc (i.e., `sparc32'/`sparcv8') should get the same
> treatment.)

Sync is also a little funny, because "membar #Sync" in SPARC-land means "load/store-before-load/store-and-also-resolve-any-faults", but I don't recall whether membar_sync() on Solaris does "membar #Sync" or "membar #0xf".

-- thorpej


