tech-pkg archive
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]
Re: rust and rust-bin confusion
> Date: Sun, 11 May 2025 12:01:29 +0000
> From: Taylor R Campbell <riastradh%NetBSD.org@localhost>
>
> The code you quoted in std::sys::pal::unix::os::getenv::{{closure}} is
> a very weird thing to do.
>
> Normally you would do
>
> release barriers
> 1: ldrex
> arithmetic
> strex
> branch if failed to 1b
> acquire or dekker barriers
>
> Where is this code coming from that inserts a barrier _between_ the
> ldrex and strex? That's a recipe for failure. Does llvm generate
> that? Is it coming from the Rust compiler, or compiler-rt? Or is it
> inline asm in this Rust module?
Here's my guess about where this code is coming from, assuming you're
using Rust 1.86 or something close to it (the code looks different in
Rust 1.76 and I didn't care to track all the changes backwards from
1.86 back to 1.76):
std::sys::pal::unix::os::getenv calls env_read_lock():
661 pub fn getenv(k: &OsStr) -> Option<OsString> {
662 // environment variables with a nul byte can't be set, so their value is
663 // always None as well
664 run_with_cstr(k.as_bytes(), &|k| {
=> 665 let _guard = env_read_lock();
https://github.com/rust-lang/rust/blob/1.86.0/library/std/src/sys/pal/unix/os.rs#L661-L665
env_read_lock() takes a RwLock reader lock, which the caller getenv
will drop eventually:
621 pub fn env_read_lock() -> impl Drop {
=> 622 ENV_LOCK.read().unwrap_or_else(PoisonError::into_inner)
623 }
https://github.com/rust-lang/rust/blob/1.86.0/library/std/src/sys/pal/unix/os.rs#L621-L623
RwLock<T> has several implementations -- I think we are using the
queue.rs one as `unix':
15 } else if #[cfg(any(
16 target_family = "unix",
17 all(target_os = "windows", target_vendor = "win7"),
18 all(target_vendor = "fortanix", target_env = "sgx"),
19 target_os = "xous",
20 ))] {
21 mod queue;
22 pub use queue::RwLock;
https://github.com/rust-lang/rust/blob/1.86.0/library/std/src/sys/sync/rwlock/mod.rs#L15-L22
The unlock path of this RwLock implementation involves
compare_exchange_weak(..., /*success*/Release, /*failure*/Relaxed) or
compare_exchange_weak(..., /*success*/AcqRel, /*failure*/Relaxed):
525 unsafe fn unlock_contended(&self, state: State) {
...
530 // We want to atomically release the lock and try to acquire the queue lock.
531 loop {
...
=> 536 match self.state.compare_exchange_weak(current, next, Release, Relaxed) {
537 Ok(_) => return,
538 Err(new) => {
539 current = new;
540 continue;
541 }
542 }
...
=> 547 match self.state.compare_exchange_weak(current, next, AcqRel, Relaxed) {
...
552 Err(new) => current = new,
553 }
554 }
555 }
https://github.com/rust-lang/rust/blob/1.86.0/library/std/src/sys/sync/rwlock/queue.rs#L525-L558
I tested a similar use of C11 atomic_compare_exchange_weak with clang
20.1.0 -march=armv6 in godbolt, and sure enough, it generated the
sequence with a memory barrier _between_ the ldrex and strex, when I
use memory_order_release or memory_order_acq_rel as the success
ordering:
ldrex lr, [r0]
mov r4, #0
cmp lr, r1
bne .LBB0_1
mcr p15, #0, r12, c7, c10, #5
strex r4, r2, [r0]
https://godbolt.org/z/KWME698f7
So I think llvm's atomic_compare_exchange_weak code generation on
armv6 is broken. The armv7 ARM (DDI 0406C.d, ID040418, 2018,
Sec. A3.4.5 `Load-Exclusive and Store-Exclusive usage restrictions',
p. A3-120) says:
An implementation might clear an exclusive monitor between the
LDREX and STREX, without any application-related cause. For
example, this might happen because of cache evictions.
Software for such an implementation must, in any single thread
of execution, avoid having any explicit memory accesses,
_System control register updates_, or cache maintenance
operations between the LDREX instruction and the associated
STREX instruction. [emphasis added]
The MCR instruction (Move to Coprocessor from Register) constitutes a
`System control register update' (DDI 0406C.d, ID040418, Sec. B3.15
`About the system control registers for VMSA', p. B3-1440):
On an ARMv7-A or ARMv7-R implementation, the system control
registers comprise:
o the registers accessed using the System Control Coprocessor,
CP15
...
The armv6 ARM (DDI 01001, 2005) is not as clear about this -- it
doesn't prohibit `System control register updates' between ldrex and
strex for progress guarantees; it does prohibits `explicit memory
transactions and cache maintenance operations' (Sec. A2.9
`Synchronization primitives', subsec. A2.9.4 `Summary of operation',
subsubsec. `Usage restrictions', p. A2-52, clause 4), and describes
the v6-style DMB under Sec. B6.6.5 `Register 7: cache management
functions', listed among others in Table B6-6 on pp. B6-19 through
B6-21, about which the manual says: `The level 1 cache maintenance
operations shown in Table B6-6 are invoked using the following
instruction format: MCR p15,0,<Rd>,c7,<CRm>,<opcode2>'.
So, even though the armv6 manual is a little vague about this, I think
we can safely call this an llvm bug. Its atomic_compare_exchange_weak
sequence with success=Release should really be
mcr p15, #0, r12, c7, c10, #5
ldrex ...
...
strex ...
instead of
ldrex ...
...
mcr p15, #0, r12, c7, c10, #5
strex ...
(even though that technically looks vaguely suboptimal because it
issues a barrier even if it will fail).
Home |
Main Index |
Thread Index |
Old Index