NetBSD-Bugs archive


Re: port-mips/59064 (jemalloc switch to 5.3 broke userland)



> Date: Sun, 13 Apr 2025 13:39:54 +0900
> From: Rin Okuyama <rokuyama.rk%gmail.com@localhost>
> 
> On 2025/04/12 23:51, Taylor R Campbell wrote:
> > Can you try the attached patch?  Will require a clean build of
> > anything that uses bsd.lib.mk.  (Will also need something to wash the
> > embarrassment off my face if this turns out to be the culprit!)
> 
> Thank you very much for finding it out!

The clue that tipped me off was the t_tls_static test failure in the
pmax releng testbed, which started after a few changes to bsd.*.mk and
to make(1):

https://releng.NetBSD.org/b5reports/pmax/commits-2025.01.html#build-2025.01.14.16.46.38

> Statically-linked binaries (specifically, /rescue/*) on n{64,32}
> userland work just fine on ERLite-3 if the "initial-exec"
> attribute is removed at the same time.
> 
> Also, the libc/tls and ld.elf_so tests work again
> (except for t_rtld_r_debug).

Nice!  I added some extra diagnostics to t_rtld_r_debug -- maybe they
will help to figure out what's going on.
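
For anyone following along: the "initial-exec" attribute above is presumably GCC/Clang's tls_model attribute on a thread-local variable. Under that model the compiler assumes the variable lives in the static TLS block and reads it at an offset from the thread pointer, with the offset filled in at load time, instead of calling __tls_get_addr(). The minimal sketch below only illustrates the syntax. The variable and function names are hypothetical and are not taken from jemalloc or from the patch.

```c
#include <assert.h>

/* Hypothetical per-thread counter that explicitly requests the
 * initial-exec TLS model (GCC/Clang attribute syntax). */
static __thread int ie_counter __attribute__((tls_model("initial-exec")));

/* Same kind of variable with no explicit model: the compiler picks
 * one based on options like -fPIC/-fPIE and on symbol visibility. */
static __thread int default_counter;

/* Add delta to both per-thread counters and return their sum.
 * Every thread sees its own copies of the two variables. */
int bump_counters(int delta)
{
    assert(delta >= 0); /* keep the sketch simple: counters only grow */
    ie_counter += delta;
    default_counter += delta;
    return ie_counter + default_counter;
}
```

The first call bump_counters(1) in a given thread returns 2, since both per-thread copies start at zero.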

> I forgot to mention, but somehow userland works even with the
> "initial-exec" TLS model on QEMU and GXemul for mips. Perhaps the
> emulation is not precise enough, or our TLS handling relies on some
> undefined H/W behavior?

Could this come down to how they emulate the  RDHWR $3,$29  instruction, 0x7c03e83b?

There's a funny comment in sys/arch/mips/include/lwp_private.h (which
was originally added by matt@ to sys/arch/mips/include/mcontext.h
rev. 1.21 back in 2015):

     57 		// For some reason the syscall is much faster than
     58 		// emulating rdhwr $3,$29 on a CN50xx

https://nxr.NetBSD.org/xref/src/sys/arch/mips/include/lwp_private.h?r=1.1#57

I wonder if that's related -- gcc emits the RDHWR instruction itself,
rather than going through the __lwp_gettcb_fast function.
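
For reference, 0x7c03e83b is simply the standard encoding of RDHWR with rt = 3 and rd = 29. Hardware register 29 is UserLocal, which the MIPS TLS ABI uses to hold the thread pointer; CPUs that do not implement it trap on the instruction, and the kernel then has to emulate it. A small sketch that assembles the word from its fields (mips_rdhwr is a hypothetical helper, not a NetBSD function):

```c
#include <assert.h>
#include <stdint.h>

/* Field layout of RDHWR rt, rd (MIPS32/MIPS64 Release 2):
 *   bits 31..26  SPECIAL3 major opcode, 0x1f
 *   bits 25..21  zero
 *   bits 20..16  rt, destination GPR
 *   bits 15..11  rd, hardware register number
 *   bits 10..6   zero
 *   bits  5..0   function code, 0x3b
 */
uint32_t mips_rdhwr(unsigned rt, unsigned rd)
{
    assert(rt < 32 && rd < 32); /* both are 5-bit register fields */
    return (UINT32_C(0x1f) << 26) | ((uint32_t)rt << 16) |
           ((uint32_t)rd << 11) | UINT32_C(0x3b);
}
```

mips_rdhwr(3, 29) yields 0x7c03e83b, matching the word above.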

> By examining the `VMFAULT_TRACE` code in mips/trap.c, I found that
> __BIT(40) is set in the fault addresses, e.g., 0x1fff0a25050 (in
> most cases?). This is odd, as our user address space on mips64 is
> only 40 bits.
> 
> I've not figured out what is going on for ERLite-3...

Curious...  Is it different on other MIPS?  Does the other information
in the print confirm that this is supposed to be a user address?  Can
you find what userland was doing to provoke this?
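
To spell out the arithmetic: 2^40 is 0x10000000000, so a 40-bit user address space ends at 0xffffffffff, and 0x1fff0a25050 is exactly 0xfff0a25050 with bit 40 set, i.e. an address near the top of the user range plus one stray bit. The sketch below checks that. BIT64 is a simplified stand-in for NetBSD's __BIT() macro from <sys/cdefs.h>, and the 40-bit limit is taken from the report above rather than from the kernel headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for NetBSD's __BIT(n); only valid for n < 64. */
#define BIT64(n) (UINT64_C(1) << (n))

/* Assumption taken from the report: mips64 user VAs are 40 bits wide. */
#define USER_VA_BITS 40

/* True if va lies in [0, 2^USER_VA_BITS). */
bool in_user_va(uint64_t va)
{
    return va < BIT64(USER_VA_BITS);
}

/* Return va with bit n cleared. */
uint64_t clear_bit(uint64_t va, unsigned n)
{
    assert(n < 64); /* shifting by 64 or more is undefined */
    return va & ~BIT64(n);
}
```

With these, in_user_va(0x1fff0a25050) is false, while clear_bit(0x1fff0a25050, 40) gives 0xfff0a25050, which is in range.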

> PS
> Also, your patch fixes recent ATF regressions for arm:
> 
> - lib/libc/tls/t_tls_static:t_tls_static
> - usr.bin/c++/t_cxxruntime:cxxruntime_static
> - usr.bin/c++/t_static_destructor:static_destructor_static
> 
> I've just noticed that these tests abort by calling the libc stub of
> _tls_get_addr(), just as on mips.

Excellent!  I have committed the fix (and added a note to UPDATING
that libraries require a clean build).

