tech-pkg archive


Re: rust and rust-bin confusion



> Date: Sun, 11 May 2025 12:01:29 +0000
> From: Taylor R Campbell <riastradh%NetBSD.org@localhost>
> 
> The code you quoted in std::sys::pal::unix::os::getenv::{{closure}} is
> a very weird thing to do.
> 
> Normally you would do
> 
> 	release barriers
> 1:	ldrex
> 	arithmetic
> 	strex
> 	branch if failed to 1b
> 	acquire or dekker barriers
> 
> Where is this code coming from that inserts a barrier _between_ the
> ldrex and strex?  That's a recipe for failure.  Does llvm generate
> that?  Is it coming from the Rust compiler, or compiler-rt?  Or is it
> inline asm in this Rust module?

Here's my guess about where this code is coming from, assuming you're
using Rust 1.86 or something close to it (the code looks different in
Rust 1.76, and I didn't track all the changes between 1.76 and 1.86):

std::sys::pal::unix::os::getenv calls env_read_lock():

    661 pub fn getenv(k: &OsStr) -> Option<OsString> {
    662     // environment variables with a nul byte can't be set, so their value is
    663     // always None as well
    664     run_with_cstr(k.as_bytes(), &|k| {
 => 665         let _guard = env_read_lock();

https://github.com/rust-lang/rust/blob/1.86.0/library/std/src/sys/pal/unix/os.rs#L661-L665

env_read_lock() takes a RwLock reader lock, which the caller getenv
will drop eventually:

    621 pub fn env_read_lock() -> impl Drop {
 => 622     ENV_LOCK.read().unwrap_or_else(PoisonError::into_inner)
    623 }

https://github.com/rust-lang/rust/blob/1.86.0/library/std/src/sys/pal/unix/os.rs#L621-L623

RwLock<T> has several implementations -- I think we are using the
queue.rs one, selected by the `unix' target family:

    15     } else if #[cfg(any(
    16         target_family = "unix",
    17         all(target_os = "windows", target_vendor = "win7"),
    18         all(target_vendor = "fortanix", target_env = "sgx"),
    19         target_os = "xous",
    20     ))] {
    21         mod queue;
    22         pub use queue::RwLock;

https://github.com/rust-lang/rust/blob/1.86.0/library/std/src/sys/sync/rwlock/mod.rs#L15-L22

The unlock path of this RwLock implementation involves
compare_exchange_weak(..., /*success*/Release, /*failure*/Relaxed) or
compare_exchange_weak(..., /*success*/AcqRel, /*failure*/Relaxed):

    525     unsafe fn unlock_contended(&self, state: State) {
    ...
    530         // We want to atomically release the lock and try to acquire the queue lock.
    531         loop {
    ...
 => 536                 match self.state.compare_exchange_weak(current, next, Release, Relaxed) {
    537                     Ok(_) => return,
    538                     Err(new) => {
    539                         current = new;
    540                         continue;
    541                     }
    542                 }
...
 => 547             match self.state.compare_exchange_weak(current, next, AcqRel, Relaxed) {
    ...
    552                 Err(new) => current = new,
    553             }
    554         }
    555     }

https://github.com/rust-lang/rust/blob/1.86.0/library/std/src/sys/sync/rwlock/queue.rs#L525-L558
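In C11 terms, that unlock path is a weak-CAS retry loop.  Here's a
minimal sketch of the shape (names and types are mine, not Rust
std's):

```c
#include <stdatomic.h>

/* C11 sketch of the unlock_contended retry loop above: swap the
 * expected state for the next state with a weak CAS, using
 * success=release / failure=relaxed, and retry with the freshly
 * observed value when the CAS fails. */
static _Atomic unsigned state;

static void
release_state(unsigned expected, unsigned next)
{
	unsigned current = expected;

	for (;;) {
		/* success=release publishes prior writes; failure=relaxed
		 * is enough because a failed weak CAS stores nothing. */
		if (atomic_compare_exchange_weak_explicit(&state,
			&current, next,
			memory_order_release, memory_order_relaxed))
			return;
		/* `current' now holds the value actually observed;
		 * retry with it. */
	}
}
```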

I tested a similar use of C11 atomic_compare_exchange_weak with clang
20.1.0 -march=armv6 in godbolt, and sure enough, it generated the
sequence with a memory barrier _between_ the ldrex and strex when I
used memory_order_release or memory_order_acq_rel as the success
ordering:

        ldrex   lr, [r0]
        mov     r4, #0
        cmp     lr, r1
        bne     .LBB0_1
        mcr     p15, #0, r12, c7, c10, #5
        strex   r4, r2, [r0]

https://godbolt.org/z/KWME698f7
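A C11 test case along these lines reproduces it (my reconstruction;
the exact source behind the godbolt link may differ):

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Reconstruction of the kind of test case described above: a single
 * weak CAS with success=memory_order_release, failure=relaxed.
 * Compiling this with clang -march=armv6 -O2 is what emits the mcr
 * barrier between the ldrex and the strex. */
bool
cas_release(_Atomic unsigned *p, unsigned expected, unsigned desired)
{

	return atomic_compare_exchange_weak_explicit(p,
	    &expected, desired,
	    memory_order_release, memory_order_relaxed);
}
```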

So I think llvm's atomic_compare_exchange_weak code generation on
armv6 is broken.  The armv7 ARM (DDI 0406C.d, ID040418, 2018,
Sec. A3.4.5 `Load-Exclusive and Store-Exclusive usage restrictions',
p. A3-120) says:

	An implementation might clear an exclusive monitor between the
	LDREX and STREX, without any application-related cause.  For
	example, this might happen because of cache evictions.
	Software for such an implementation must, in any single thread
	of execution, avoid having any explicit memory accesses,
	_System control register updates_, or cache maintenance
	operations between the LDREX instruction and the associated
	STREX instruction.  [emphasis added]

The MCR instruction (Move to Coprocessor from Register) constitutes a
`System control register update' (DDI 0406C.d, ID040418, Sec. B3.15
`About the system control registers for VMSA', p. B3-1440):

	On an ARMv7-A or ARMv7-R implementation, the system control
	registers comprise:

	o the registers accessed using the System Control Coprocessor,
	  CP15
	...

The armv6 ARM (DDI 0100I, 2005) is not as clear about this -- it
doesn't prohibit `System control register updates' between ldrex and
strex for progress guarantees, but it does prohibit `explicit memory
transactions and cache maintenance operations' (Sec. A2.9
`Synchronization primitives', subsec. A2.9.4 `Summary of operation',
subsubsec. `Usage restrictions', p. A2-52, clause 4).  And it
describes the v6-style DMB under Sec. B6.6.5 `Register 7: cache
management functions', listed among others in Table B6-6 on pp. B6-19
through B6-21, about which the manual says: `The level 1 cache
maintenance operations shown in Table B6-6 are invoked using the
following instruction format: MCR p15,0,<Rd>,c7,<CRm>,<opcode2>'.

So, even though the armv6 manual is a little vague about this, I think
we can safely call this an llvm bug.  Its atomic_compare_exchange_weak
sequence with success=Release should really be

	mcr	p15, #0, r12, c7, c10, #5
	ldrex	...
	...
	strex	...

instead of

	ldrex	...
	...
	mcr	p15, #0, r12, c7, c10, #5
	strex	...

(even though that looks vaguely suboptimal, because it issues the
barrier even when the compare fails).

