NetBSD-Bugs archive

port-arm/55704: multi-threaded applications for earmv[45]{,hf} freeze on COMPAT_NETBSD32 of aarch64



>Number:         55704
>Category:       port-arm
>Synopsis:       multi-threaded applications for earmv[45]{,hf} freeze on COMPAT_NETBSD32 of aarch64
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    port-arm-maintainer
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Thu Oct 08 09:55:00 +0000 2020
>Originator:     Rin Okuyama
>Release:        9.99.73
>Organization:
Department of Physics, Meiji University
>Environment:
NetBSD rpi 9.99.73 NetBSD 9.99.73 (GENERIC64) #38: Wed Oct  7 17:28:52 JST 2020  rin@latipes:/sys/arch/evbarm/compile/GENERIC64 evbarm aarch64
>Description:
Multi-threaded userland applications for earmv[45]{,hf} freeze
indefinitely under COMPAT_NETBSD32 on aarch64 if more than one CPU
core is online. For example, ctfmerge(1) freezes almost every time
during the build of pkgsrc/pkgtools/cwrappers:

----
# uname -p
aarch64
# file /emul/netbsd32/bin/sh
/bin/sh: ELF 32-bit LSB pie executable, ARM, EABI5 version 1 (SYSV), dynamically linked, interpreter /libexec/ld.elf_so, for NetBSD 9.99.73, compiled for: earmv5, not stripped
# chroot /emul/netbsd32 su -
# cd /usr/pkgsrc/pkgtools/cwrappers && make MAKE_JOBS=1
...
ctfmerge -t -g -L VERSION -o c++-wrapper alloc.o cleanup-cc.o common.o reorder-cc.o generic-transform-cc.o normalise-cc.o c++-wrapper.o transform-cc.o
(then stalls here eternally)
----

GDB shows that the stalled ctfmerge(1) is sleeping in lwp_park(2):

----
# fg
make MAKE_JOBS=1
^Z[1] + Suspended               make MAKE_JOBS=1
# bg
[1] make MAKE_JOBS=1
# gdb -p `pgrep ctfmerge`
...
Thread 1 "" received signal SIGCONT, Continued.
[Switching to LWP 3419 of process 3245]
0xf3a3c4c4 in ___lwp_park60 () from /usr/libexec/ld.elf_so
(gdb) bt
#0  0xf3a3c4c4 in ___lwp_park60 () from /usr/libexec/ld.elf_so
#1  0xf3a31e6c in _rtld_exclusive_enter (mask=mask@entry=0xf73fff90)
    at /usr/src/libexec/ld.elf_so/rtld.c:1766
#2  0xf3a39e60 in _rtld_tls_get_addr (tls=0xf796f000, idx=2, offset=0)
    at /usr/src/libexec/ld.elf_so/tls.c:68
#3  0xf7ac9e48 in __cxa_thread_run_atexit ()
    at /usr/src/lib/libc/stdlib/cxa_thread_atexit.c:55
#4  0xf7c1bc1c in pthread_exit (retval=0x0)
    at /usr/src/lib/libpthread/pthread.c:629
#5  0xf7c1bd18 in pthread__create_tramp (cookie=0xf7b79000)
    at /usr/src/lib/libpthread/pthread.c:562
#6  0xf7af99f4 in __mknod50 () from /usr/lib/libc.so.12
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb)
----

If only one CPU core is left online (the others taken offline with
cpuctl(8)), ctfmerge(1) works without problems under COMPAT_NETBSD32.
This strongly suggests a problem specific to running earmv[45]{,hf}
userland on multi-processor machines.
>How-To-Repeat:
Described above.
>Fix:
I'm not sure whether we can fix this problem without modifying the
userland binaries for earmv[45]{,hf}. While arm variants prior to v6
implement atomic_ops(3) with the swp instruction (which we emulate for
COMPAT_NETBSD32), they have no real membar_ops(3), since they were
never intended for multi-processor machines. Indeed, our membar_ops(3)
are no-ops for arm processors prior to v6:

https://nxr.netbsd.org/xref/src/common/lib/libc/arch/arm/atomic/membar_ops.S#33
