NetBSD-Bugs archive
port-arm/59622: aarch64: compat32 armv6 dmb emulation can't satisfy llvm atomic r/m/w codegen
>Number: 59622
>Category: port-arm
>Synopsis: aarch64: compat32 armv6 dmb emulation can't satisfy llvm atomic r/m/w codegen
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: port-arm-maintainer
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Sun Aug 31 18:45:00 +0000 2025
>Originator: Taylor R Campbell
>Release: current, 11, 10, 9, ...
>Organization:
The NetBSDv6 Broke Reservation
>Environment:
aarch64 running armv6 binaries under compat_netbsd32
>Description:
The llvm implementation of atomic_compare_exchange with
memory_order_release or memory_order_acq_rel on the armv6 target
yields a loop containing the sequence:
    ldrex lr, [r0]
    mov r4, #0
    cmp lr, r1
    bne .LBB0_1
    mcr p15, #0, r12, c7, c10, #5
    strex r4, r2, [r0]
https://godbolt.org/z/KWME698f7
Unfortunately, on NetBSD/aarch64 under compat_netbsd32, this
leads to an infinite loop.
Why?
The curious instruction
    mcr p15, #0, r12, c7, c10, #5
is an armv6ism [ARMV6-ARM, Sec. B2.6.1 `DataMemoryBarrier (DMB)
CP15 register 7', p. B2-18 (pdf page 674)] with the same effect
as what modern Arm calls `dmb sy', but with a different encoding,
sometimes called CP15DMB, that was deprecated in armv7
[ARMV7AR-ARM, Sec. B4.2.5 `Data and instruction barrier
operations, VMSA', p. B4-1744].
In armv8 and armv9, the system control register bit
SCTLR_EL1.CP15BEN (bit 5) enables or disables the CP15
barrier instructions at EL0 and EL1 [ARMV8.5A-ARM,
Sec. D13.2.113 `SCTLR_EL1, System Control Register (EL1)',
p. D13-3405] [ARMV9A-ARM, Sec. D19.2.124 `SCTLR_EL1, System
Control Register (EL1)', p. D19-6989]. If the bit is clear, the
instruction traps to the supervisor.
We leave SCTLR_EL1.CP15BEN off, and instead emulate the
instruction in the kernel:
    case 0x0e070fba:
        if (arm_cond_match(insn, tf->tf_spsr)) {
            /*
             * mcr p15, 0, <Rd>, c7, c10, 5
             * (data memory barrier)
             */
            dmb(sy);
        }
        goto emulated;
https://nxr.NetBSD.org/xref/src/sys/arch/aarch64/aarch64/trap.c?r=1.53#798
But this doesn't work inside an ldrex/strex loop: the trap to
the kernel breaks the reservation taken by ldrex, so the
following strex always fails, the loop retries, traps again,
and never makes progress.
On the one hand, I don't think the armv6 architecture ever
guaranteed progress in a ldrex/cp15dmb/strex loop in the first
place, so I think we are justified in calling this an llvm bug.
In fact there was already an upstream bug report, to which I
added some analysis and references when we discovered this back
in May:
https://web.archive.org/web/20250831183355/https://github.com/llvm/llvm-project/issues/41201
It's not just me -- Will Deacon (formerly(?) of Arm) agreed
with my assessment that this is an llvm bug, as reported in a
corresponding Rust issue:
https://web.archive.org/web/20250325040643/https://github.com/rust-lang/rust/issues/60605
On the other hand, this is how llvm actually generates code
today, and it has gotten in the way of Rust on armv6 on NetBSD:
https://mail-index.NetBSD.org/tech-pkg/2025/05/11/msg031145.html
So I think we would _also_ be justified in setting
SCTLR_EL1.CP15BEN=1, or at least creating a sysctl knob for it
like Linux has at abi.cp15_barrier [LINUXARMLEGACYINSTR].
References:
[ARMV6-ARM] ARM Architecture Reference Manual, DDI 0100I, 2005
https://developer.arm.com/documentation/ddi0100/i/
[ARMV7AR-ARM] ARM Architecture Reference Manual: ARMv7-A and
ARMv7-R edition, DDI 0406C.d (ID040418), 2018
https://developer.arm.com/documentation/ddi0406/cd/
[ARMV8.5A-ARM] Arm Architecture Reference Manual: Armv8, for
Armv8-A architecture profile, DDI 0487F.b (ID040120), 2020
https://developer.arm.com/documentation/ddi0487/fb/
[ARMV9A-ARM] Arm Architecture Reference Manual: for A-profile
architecture, DDI 0487J.a (ID042523), 2023
https://developer.arm.com/documentation/ddi0487/ja/
[LINUXARMLEGACYINSTR] Supported legacy instructions, Linux
arm64 documentation, 2017
https://web.archive.org/web/20250325082321/https://www.kernel.org/doc/Documentation/arm64/legacy_instructions.txt
>How-To-Repeat:
Compile a program using atomic_* for armv6 with llvm, e.g. with
the options: -O2 -Wall -Werror -march=armv6
https://godbolt.org/z/KWME698f7
Program:
    #include <stdatomic.h>

    int
    cas_release(atomic_int *p, int o, int n)
    {
        for (;;) {
            int e = o;
            if (atomic_compare_exchange_weak_explicit(p, &e, n,
                    memory_order_release, memory_order_relaxed))
                return 0;
            if (e != o)
                return -1;
        }
    }

    int
    cas_acq_rel(atomic_int *p, int o, int n)
    {
        for (;;) {
            int e = o;
            if (atomic_compare_exchange_weak_explicit(p, &e, n,
                    memory_order_acq_rel, memory_order_relaxed))
                return 0;
            if (e != o)
                return -1;
        }
    }
Assembly:
    cas_release:
        push {r4, lr}
        mov r12, #0
        b .LBB0_2
    .LBB0_1:
        cmp lr, r1
        mvnne r3, #0
        tst r4, #1
        movne r3, #0
        cmpeq lr, r1
        bne .LBB0_4
    .LBB0_2:
        ldrex lr, [r0]
        mov r4, #0
        cmp lr, r1
        bne .LBB0_1
        mcr p15, #0, r12, c7, c10, #5
        strex r4, r2, [r0]
        cmp r4, #0
        mov r4, #0
        mvneq r4, #0
        b .LBB0_1
    .LBB0_4:
        mov r0, r3
        pop {r4, pc}
    cas_acq_rel:
        push {r4, r5, r11, lr}
        mov r12, #0
        b .LBB1_2
    .LBB1_1:
        cmp lr, r1
        mvnne r3, #0
        cmp r4, #0
        movne r3, #0
        cmpeq lr, r1
        bne .LBB1_4
    .LBB1_2:
        ldrex lr, [r0]
        mov r4, #0
        cmp lr, r1
        bne .LBB1_1
        mcr p15, #0, r4, c7, c10, #5
        strex r5, r2, [r0]
        cmp r5, #0
        moveq r4, #1
        mcreq p15, #0, r12, c7, c10, #5
        b .LBB1_1
    .LBB1_4:
        mov r0, r3
        pop {r4, r5, r11, pc}
>Fix:
Set SCTLR_EL1.CP15BEN=1 for compat_netbsd32 processes, and/or
create a sysctl knob to control it.