NetBSD-Bugs archive


port-arm/59622: aarch64: compat32 armv6 dmb emulation can't satisfy llvm atomic r/m/w codegen



>Number:         59622
>Category:       port-arm
>Synopsis:       aarch64: compat32 armv6 dmb emulation can't satisfy llvm atomic r/m/w codegen
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    port-arm-maintainer
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sun Aug 31 18:45:00 +0000 2025
>Originator:     Taylor R Campbell
>Release:        current, 11, 10, 9, ...
>Organization:
The NetBSDv6 Broke Reservation
>Environment:
aarch64 running armv6 binaries under compat_netbsd32
>Description:

	The llvm implementation of atomic_compare_exchange with
	memory_order_release or memory_order_acq_rel on armv6 target
	yields the sequence:

		ldrex	lr, [r0]
		mov	r4, #0
		cmp	lr, r1
		bne	.LBB0_1
		mcr	p15, #0, r12, c7, c10, #5
		strex	r4, r2, [r0]

	https://godbolt.org/z/KWME698f7

	Unfortunately, on NetBSD/aarch64 under compat_netbsd32, this
	leads to an infinite loop.

	Why?

	The curious instruction

		mcr	p15, #0, r12, c7, c10, #5

	is an armv6ism [ARMV6-ARM, Sec. B2.6.1 `DataMemoryBarrier (DMB)
	CP15 register 7', p. B2-18 (pdf page 674)] that has the effect
	of what we now call `dmb sy' in modern Arm -- but with a
	different encoding, sometimes called CP15DMB, which was
	deprecated in armv7 [ARMV7AR-ARM, Sec. B4.2.5 `Data and
	instruction barrier operations, VMSA', p. B4-1744].

	In armv8 and armv9, there is a system control register bit
	SCTLR_EL1.CP15BEN (bit 5) that enables or disables the CP15
	barrier instructions at EL0 and EL1 [ARMV8.5A-ARM,
	Sec. D13.2.113 `SCTLR_EL1, System Control Register (EL1)',
	p. D13-3405] [ARMV9A-ARM, Sec. D19.2.124 `SCTLR_EL1, System
	Control Register (EL1)', p. D19-6989].  If the bit is clear,
	the instruction traps to the supervisor.

	We leave SCTLR_EL1.CP15BEN off, and instead emulate the
	instruction in the kernel:

    798 	case 0x0e070fba:
    799 		if (arm_cond_match(insn, tf->tf_spsr)) {
    800 			/*
    801 			 * mcr p15, 0, <Rd>, c7, c10, 5
    802 			 * (data memory barrier)
    803 			 */
    804 			dmb(sy);
    805 		}
    806 		goto emulated;

	https://nxr.NetBSD.org/xref/src/sys/arch/aarch64/aarch64/trap.c?r=1.53#798
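
	To make the case label concrete, here is a minimal sketch of
	how an A32 mcr encoding reduces to 0x0e070fba.  The mask
	0x0fff0fff (dropping the condition and Rt fields) is an
	assumption, not quoted from trap.c, and a32_mcr and is_cp15dmb
	are hypothetical names used only for illustration:

```c
#include <stdint.h>

/* Assumed mask: ignore the condition (bits 31:28) and Rt (bits 15:12). */
#define CP15DMB_MASK	0x0fff0fffU
#define CP15DMB_MATCH	0x0e070fbaU	/* case label quoted above */

/* Assemble an A32 MCR encoding from its fields (hypothetical helper). */
static uint32_t
a32_mcr(unsigned cond, unsigned coproc, unsigned opc1, unsigned rt,
    unsigned crn, unsigned crm, unsigned opc2)
{

	return (uint32_t)cond << 28 | 0xeU << 24 | (uint32_t)opc1 << 21 |
	    (uint32_t)crn << 16 | (uint32_t)rt << 12 |
	    (uint32_t)coproc << 8 | (uint32_t)opc2 << 5 | 1U << 4 |
	    (uint32_t)crm;
}

/* True if insn is mcr p15, 0, <Rt>, c7, c10, 5 under any condition. */
static int
is_cp15dmb(uint32_t insn)
{

	return (insn & CP15DMB_MASK) == CP15DMB_MATCH;
}
```

	With these definitions, a32_mcr(0xe, 15, 0, 12, 7, 10, 5)
	encodes `mcr p15, #0, r12, c7, c10, #5' as 0xee07cfba, and the
	conditional mcreq form (condition 0) as 0x0e07cfba.  Both
	reduce to the same case label, which is why the emulation
	still has to check the condition with arm_cond_match.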

	But this doesn't work inside a ldrex/strex loop, because the
	trap to the kernel breaks the reservation, so the strex always
	fails and the loop never makes progress.
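
	The failure mode can be sketched with a toy model in C.  This
	is not real hardware behaviour or kernel code: struct cpu and
	the model_* functions are hypothetical, and the only rule
	taken from this report is that a trapped CP15DMB loses the
	reservation.  The loop is bounded so that the model terminates
	where the real code would spin:

```c
#include <stdbool.h>

/* Toy model of one CPU's local exclusive monitor; not real hardware. */
struct cpu {
	bool	reserved;	/* set by ldrex, cleared by strex or a trap */
	bool	cp15ben;	/* models SCTLR_EL1.CP15BEN */
};

static int
model_ldrex(struct cpu *c, const int *p)
{

	c->reserved = true;
	return *p;
}

/* Returns 0 on success and 1 on failure, like the strex status value. */
static int
model_strex(struct cpu *c, int *p, int v)
{
	bool ok = c->reserved;

	c->reserved = false;
	if (ok)
		*p = v;
	return ok ? 0 : 1;
}

/* CP15DMB: executes natively if enabled, otherwise traps to the kernel. */
static void
model_cp15dmb(struct cpu *c)
{

	if (!c->cp15ben)
		c->reserved = false;	/* trap, emulate, return to user */
}

/*
 * Bounded ldrex/cp15dmb/strex loop.  Returns the number of tries used
 * on success, 0 if *p != o, or -1 if no try succeeded.
 */
static int
model_cas_release(struct cpu *c, int *p, int o, int n, int maxtries)
{

	for (int i = 1; i <= maxtries; i++) {
		if (model_ldrex(c, p) != o)
			return 0;
		model_cp15dmb(c);
		if (model_strex(c, p, n) == 0)
			return i;
	}
	return -1;
}
```

	In this model, with cp15ben false every strex fails no matter
	how large maxtries is, and with cp15ben true the first try
	succeeds.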

	On the one hand, I don't think the armv6 architecture ever
	guaranteed progress in a ldrex/cp15dmb/strex loop in the first
	place, so I think we are justified in calling this an llvm bug.
	In fact there was already an upstream bug report, to which I
	added some analysis and references when we discovered this back
	in May:

	https://web.archive.org/web/20250831183355/https://github.com/llvm/llvm-project/issues/41201

	It's not just me -- Will Deacon (formerly(?) of Arm) agreed
	with my assessment that this is an llvm bug, as reported in a
	corresponding Rust issue:

	https://web.archive.org/web/20250325040643/https://github.com/rust-lang/rust/issues/60605

	On the other hand, this is how llvm actually generates code
	today and it has gotten in the way of rust on armv6 on NetBSD:

	https://mail-index.NetBSD.org/tech-pkg/2025/05/11/msg031145.html

	So I think we would _also_ be justified in setting
	SCTLR_EL1.CP15BEN=1, or at least creating a sysctl knob for it
	like Linux has at abi.cp15_barrier [LINUXARMLEGACYINSTR].
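
	As a rough sketch of the bit manipulation involved, assuming
	only the bit position (5) cited earlier in this report: the
	macro and function names below are hypothetical and are not
	taken from the NetBSD or Linux sources, and the code operates
	on a plain integer rather than the real system register:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical name; bit position 5 as cited in this report. */
#define SCTLR_EL1_CP15BEN	(UINT64_C(1) << 5)

/* True if the CP15 barrier instructions are enabled in this value. */
static bool
cp15_barriers_enabled(uint64_t sctlr)
{

	return (sctlr & SCTLR_EL1_CP15BEN) != 0;
}

/* Return sctlr with CP15BEN set or cleared; other bits are untouched. */
static uint64_t
sctlr_set_cp15ben(uint64_t sctlr, bool enable)
{

	return enable ? (sctlr | SCTLR_EL1_CP15BEN)
	    : (sctlr & ~SCTLR_EL1_CP15BEN);
}
```

	A sysctl knob would then amount to choosing the value of
	enable; where and when the register itself gets written is
	left open here.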


	References:

	[ARMV6-ARM] ARM Architecture Reference Manual, DDI 0100I, 2005
	https://developer.arm.com/documentation/ddi0100/i/

	[ARMV7AR-ARM] ARM Architecture Reference Manual: ARMv7-A and
	ARMv7-R edition, DDI 0406C.d (ID040418), 2018
	https://developer.arm.com/documentation/ddi0406/cd/

	[ARMV8.5A-ARM] Arm Architecture Reference Manual: Armv8, for
	Armv8-A architecture profile, DDI 0487F.b (ID040120), 2020
	https://developer.arm.com/documentation/ddi0487/fb/

	[ARMV9A-ARM] Arm Architecture Reference Manual: for A-profile
	architecture, DDI 0487J.a (ID042523), 2023
	https://developer.arm.com/documentation/ddi0487/ja/

	[LINUXARMLEGACYINSTR] Supported legacy instructions, Linux
	arm64 documentation, 2017
	https://web.archive.org/web/20250325082321/https://www.kernel.org/doc/Documentation/arm64/legacy_instructions.txt


>How-To-Repeat:

	Compile a program using atomic_* for armv6 with llvm, e.g. with
	the options: -O2 -Wall -Werror -march=armv6

	https://godbolt.org/z/KWME698f7

	Program:

#include <stdatomic.h>

int
cas_release(atomic_int *p, int o, int n)
{

	for (;;) {
		int e = o;

		if (atomic_compare_exchange_weak_explicit(p, &e, n,
			memory_order_release, memory_order_relaxed))
			return 0;
		if (e != o)
			return -1;
	}
}

int
cas_acq_rel(atomic_int *p, int o, int n)
{

	for (;;) {
		int e = o;

		if (atomic_compare_exchange_weak_explicit(p, &e, n,
			memory_order_acq_rel, memory_order_relaxed))
			return 0;
		if (e != o)
			return -1;
	}
}

	Assembly:

cas_release:
        push    {r4, lr}
        mov     r12, #0
        b       .LBB0_2
.LBB0_1:
        cmp     lr, r1
        mvnne   r3, #0
        tst     r4, #1
        movne   r3, #0
        cmpeq   lr, r1
        bne     .LBB0_4
.LBB0_2:
        ldrex   lr, [r0]
        mov     r4, #0
        cmp     lr, r1
        bne     .LBB0_1
        mcr     p15, #0, r12, c7, c10, #5
        strex   r4, r2, [r0]
        cmp     r4, #0
        mov     r4, #0
        mvneq   r4, #0
        b       .LBB0_1
.LBB0_4:
        mov     r0, r3
        pop     {r4, pc}

cas_acq_rel:
        push    {r4, r5, r11, lr}
        mov     r12, #0
        b       .LBB1_2
.LBB1_1:
        cmp     lr, r1
        mvnne   r3, #0
        cmp     r4, #0
        movne   r3, #0
        cmpeq   lr, r1
        bne     .LBB1_4
.LBB1_2:
        ldrex   lr, [r0]
        mov     r4, #0
        cmp     lr, r1
        bne     .LBB1_1
        mcr     p15, #0, r4, c7, c10, #5
        strex   r5, r2, [r0]
        cmp     r5, #0
        moveq   r4, #1
        mcreq   p15, #0, r12, c7, c10, #5
        b       .LBB1_1
.LBB1_4:
        mov     r0, r3
        pop     {r4, r5, r11, pc}


>Fix:

	Set SCTLR_EL1.CP15BEN=1 for compat_netbsd32 processes, and/or
	create a sysctl knob to control it.



