NetBSD-Bugs archive


lib/58154: aarch64: firefox-124.0.2 crashes very frequently on NetBSD/aarch64 10.0, likely related to thread-local storage



>Number:         58154
>Category:       lib
>Synopsis:       aarch64: firefox-124.0.2 crashes very frequently on NetBSD/aarch64 10.0, likely related to thread-local storage
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    lib-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Mon Apr 15 08:35:00 +0000 2024
>Originator:     PHO
>Release:        10.0
>Organization:
TNF
>Environment:
NetBSD yukari.cielonegro.org 10.0 NetBSD 10.0 (GENERIC64) #0: Thu Mar 28 08:33:33 UTC 2024  mkrepro%mkrepro.NetBSD.org@localhost:/usr/src/sys/arch/evbarm/compile/GENERIC64 evbarm
>Description:
https://mail-index.netbsd.org/tech-userlevel/2024/04/14/msg014295.html

Firefox tab processes crash very frequently on NetBSD/aarch64 10.0. Building Firefox with PKG_OPTIONS.firefox+=debug-info revealed that each crash is a segfault at one of the following two places, chosen non-deterministically:

third_party/rlbox/include/rlbox_noop_sandbox.hpp:

> rlbox_noop_sandbox_thread_data* get_rlbox_noop_sandbox_thread_data();
> #  define RLBOX_NOOP_SANDBOX_STATIC_VARIABLES()                                \
>     thread_local rlbox::rlbox_noop_sandbox_thread_data                         \
>       rlbox_noop_sandbox_thread_info{ 0, 0 };                                  \
>     namespace rlbox {                                                          \
>       rlbox_noop_sandbox_thread_data* get_rlbox_noop_sandbox_thread_data()     \
>       {                                                                        \
>         return &rlbox_noop_sandbox_thread_info;                                \
>       }                                                                        \
>     }                                                                          \
>     static_assert(true, "Enforce semi-colon")
> ...
>   template<typename T, typename T_Converted, typename... T_Args>
>   auto impl_invoke_with_func_ptr(T_Converted* func_ptr, T_Args&&... params)
>   {
> #ifdef RLBOX_EMBEDDER_PROVIDES_TLS_STATIC_VARIABLES
>     auto& thread_data = *get_rlbox_noop_sandbox_thread_data();
> #endif
>     auto old_sandbox = thread_data.sandbox; // <-- CRASHES HERE!
>     thread_data.sandbox = this;
>     auto on_exit =
>       detail::make_scope_exit([&] { thread_data.sandbox = old_sandbox; });
>     return (*func_ptr)(params...);
>   }

media/libjpeg/simd/arm/aarch64/jsimd.c:

> static THREAD_LOCAL unsigned int simd_support = ~0;
> ...
> LOCAL(void)
> init_simd(void)
> {
> #ifndef NO_GETENV
>   char env[2] = { 0 };
> #endif
> #if defined(__linux__) || defined(ANDROID) || defined(__ANDROID__)
>   int bufsize = 1024; /* an initial guess for the line buffer size limit */
> #endif
>
>   if (simd_support != ~0U) // <-- CRASHES HERE!
>     return;
>
>   simd_support = 0;

Both of these cases involve TLS: tab processes segfault while attempting to access thread-local variables. At run time these functions reside in libxul.so, which is dlopen'ed by the main process. I recall there were a few issues with TLS handling in the past, but riastradh@ fixed them before we branched 10.0, right?

"readelf -r libxul.so" shows no R_AARCH64_TLS_TPR in its relocation table, only R_AARCH64_TLSDESC, so I believe these variables use the local-dynamic model. I tried to create a minimal reproducer but it didn't crash:
https://gist.github.com/depressed-pho/b6894fdaef94a1b9aa5459b1a2f65590
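For orientation, the access pattern at both crash sites can be sketched in a few lines of C. This is not the gist above, and the names tls_flag, init_once and count_fresh_threads are made up for illustration. Each thread reads a thread-local sentinel, initialises its own copy, and reads it again, much like init_simd() does with simd_support. Built into a plain executable this uses static TLS and is not expected to crash; to approximate the failing configuration the code would presumably have to be compiled with -fPIC into a shared object that is then dlopen'ed, which this sketch does not do.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_THREADS 16

/* Same shape as simd_support in jsimd.c: ~0U means "not initialised yet". */
static __thread unsigned int tls_flag = ~0U;

/* Hypothetical stand-in for init_simd(): returns 1 if this call initialised
 * the calling thread's copy, 0 if it had already been initialised. */
static int init_once(void)
{
    if (tls_flag != ~0U)    /* the first TLS read; the report faults on a read like this */
        return 0;
    tls_flag = 0;
    return 1;
}

static void *worker(void *arg)
{
    int *fresh = arg;
    int first = init_once();    /* expected 1: a new thread sees the sentinel */
    int second = init_once();   /* expected 0: the copy is now initialised */
    *fresh = (first == 1 && second == 0);
    return NULL;
}

/* Starts up to nthreads workers and returns how many of them observed a
 * fresh, private copy of tls_flag. */
int count_fresh_threads(int nthreads)
{
    pthread_t tids[MAX_THREADS];
    int fresh[MAX_THREADS] = { 0 };
    int started = 0, total = 0;

    assert(nthreads >= 0);
    if (nthreads > MAX_THREADS)
        nthreads = MAX_THREADS;
    for (int i = 0; i < nthreads; i++) {
        if (pthread_create(&tids[i], NULL, worker, &fresh[i]) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++) {
        pthread_join(tids[i], NULL);
        total += fresh[i];
    }
    return total;
}
```

Calling count_fresh_threads(4) from a main() linked with -pthread should return 4 when TLS works as intended, since every thread gets its own initial copy of the variable.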

This is the preprocessor output for rlbox_noop_sandbox.hpp, which is instantiated by config/external/rlbox/rlbox_thread_locals.cpp. The thread_local isn't replaced by a macro, so it's the C++ keyword:

> thread_local rlbox::rlbox_noop_sandbox_thread_data rlbox_noop_sandbox_thread_info{ 0, 0 }; namespace rlbox { rlbox_noop_sandbox_thread_data* get_rlbox_noop_sandbox_thread_data() { return &rlbox_noop_sandbox_thread_info; } } static_assert(true, "Enforce semi-colon");

This is the command for compiling rlbox_thread_locals.cpp. It's compiled with -fPIC and no explicit -ftls-model, so it defaults to global-dynamic:

> /usr/pkgsrc/www/firefox/work/.cwrapper/bin/c++ -std=gnu++17 -o rlbox_thread_locals.o -c  -I/usr/pkgsrc/www/firefox/work/build/dist/stl_wrappers -I/usr/pkgsrc/www/firefox/work/build/dist/system_wrappers -include /usr/pkgsrc/www/firefox/work/firefox-124.0.2/config/gcc_hidden.h -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 -fstack-protector-strong -DNDEBUG=1 -DTRIMMED=1 -DBUILDING_SOUNDTOUCH=1 -DST_NO_EXCEPTION_HANDLING=1 -DMOZ_HAS_MOZGLUE -I/usr/pkgsrc/www/firefox/work/firefox-124.0.2/media/libsoundtouch/src -I/usr/pkgsrc/www/firefox/work/build/media/libsoundtouch/src -I/usr/pkgsrc/www/firefox/work/build/security/rlbox -I/usr/pkgsrc/www/firefox/work/firefox-124.0.2/third_party/simde -I/usr/pkgsrc/www/firefox/work/firefox-124.0.2/third_party/wasm2c/wasm2c -I/usr/pkgsrc/www/firefox/work/build/dist/include -I/usr/pkg/include/nspr -I/usr/pkg/include/nss -I/usr/pkg/include/nspr -I/usr/pkgsrc/www/firefox/work/build/dist/include/nss -I/usr/pkg/include/pixman-1 -I/usr/pkg/include -DMOZILLA_CLIENT -include /usr/pkgsrc/www/firefox/work/build/mozilla-config.h -DPNG_NO_ASSEMBLER_CODE -I/usr/include -I/usr/pkg/include -I/usr/pkg/include/nspr -I/usr/pkg/include/libdrm -I/usr/pkg/include/glib-2.0 -I/usr/pkg/include/gio-unix-2.0 -I/usr/pkg/lib/glib-2.0/include -I/usr/pkg/include/ffmpeg6 -I/usr/pkg/include/freetype2 -I/usr/pkg/include/harfbuzz -fno-sized-deallocation -fno-aligned-new -O2 -D_GLIBCXX_INCLUDE_NEXT_C_HEADERS -D__LOCALE_C_ONLY -I/usr/include -I/usr/pkg/include -I/usr/pkg/include/nspr -I/usr/pkg/include/libdrm -I/usr/pkg/include/glib-2.0 -I/usr/pkg/include/gio-unix-2.0 -I/usr/pkg/lib/glib-2.0/include -I/usr/pkg/include/ffmpeg6 -I/usr/pkg/include/freetype2 -I/usr/pkg/include/harfbuzz -fno-exceptions -Dunix -fPIC -DPIC -fno-rtti -ffunction-sections -fdata-sections -fno-exceptions -fno-math-errno -pthread -gdwarf-4 -O2 -fomit-frame-pointer -funwind-tables -Wall -Wempty-body -Wignored-qualifiers -Wpointer-arith -Wsign-compare -Wtype-limits -Wunreachable-code -Wno-invalid-offsetof -Wc++2a-compat -Wcomma-subscript -Wvolatile -Wno-error=deprecated -Wduplicated-cond -Wimplicit-fallthrough -Wlogical-op -Wno-error=maybe-uninitialized -Wno-error=deprecated-declarations -Wno-error=array-bounds -Wno-error=free-nonheap-object -Wno-multistatement-macros -Wno-error=class-memaccess -Wformat -Wformat-overflow=2 -Wno-psabi -Wno-error=builtin-macro-redefined -include /usr/pkgsrc/www/firefox/work/firefox-124.0.2/media/libsoundtouch/src/soundtouch_perms.h -fno-strict-aliasing -ffp-contract=off  -MD -MP -MF .deps/rlbox_thread_locals.o.pp   /usr/pkgsrc/www/firefox/work/firefox-124.0.2/config/external/rlbox/rlbox_thread_locals.cpp

And this is the preprocessor output for jsimd.c. THREAD_LOCAL is defined as __thread:

> # 33 "/usr/pkgsrc/www/firefox/work/firefox-124.0.2/media/libjpeg/simd/arm/aarch64/jsimd.c"
> static __thread unsigned int simd_support = ~0;

This is the command for compiling jsimd.c. It's compiled with -fPIC and no explicit -ftls-model, so it also defaults to global-dynamic:

> /usr/pkgsrc/www/firefox/work/.cwrapper/bin/cc -std=gnu99 -o jsimd.o -c  -I/usr/pkgsrc/www/firefox/work/build/dist/system_wrappers -include /usr/pkgsrc/www/firefox/work/firefox-124.0.2/config/gcc_hidden.h -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 -fstack-protector-strong -DNDEBUG=1 -DTRIMMED=1 -DHAVE_VLD1_S16_X3 -DHAVE_VLD1_U16_X2 -DHAVE_VLD1Q_U8_X4 -DNEON_INTRINSICS -DMOZ_HAS_MOZGLUE -DMOZILLA_INTERNAL_API -DIMPL_LIBXUL -DSTATIC_EXPORTABLE_JS_API -I/usr/pkgsrc/www/firefox/work/firefox-124.0.2/media/libjpeg -I/usr/pkgsrc/www/firefox/work/build/media/libjpeg -I/usr/pkgsrc/www/firefox/work/firefox-124.0.2/media/libjpeg/simd/arm -I/usr/pkgsrc/www/firefox/work/firefox-124.0.2/media/libjpeg/simd/arm/aarch64 -I/usr/pkgsrc/www/firefox/work/build/dist/include -I/usr/pkg/include/nspr -I/usr/pkg/include/nss -I/usr/pkg/include/nspr -I/usr/pkgsrc/www/firefox/work/build/dist/include/nss -I/usr/pkg/include/pixman-1 -I/usr/pkg/include -include /usr/pkgsrc/www/firefox/work/build/mozilla-config.h -DMOZILLA_CLIENT -DPNG_NO_ASSEMBLER_CODE -I/usr/include -I/usr/pkg/include -I/usr/pkg/include/nspr -I/usr/pkg/include/libdrm -I/usr/pkg/include/glib-2.0 -I/usr/pkg/include/gio-unix-2.0 -I/usr/pkg/lib/glib-2.0/include -I/usr/pkg/include/ffmpeg6 -I/usr/pkg/include/freetype2 -I/usr/pkg/include/harfbuzz -O2 -D_GLIBCXX_INCLUDE_NEXT_C_HEADERS -D__LOCALE_C_ONLY -I/usr/include -I/usr/pkg/include -I/usr/pkg/include/nspr -I/usr/pkg/include/libdrm -I/usr/pkg/include/glib-2.0 -I/usr/pkg/include/gio-unix-2.0 -I/usr/pkg/lib/glib-2.0/include -I/usr/pkg/include/ffmpeg6 -I/usr/pkg/include/freetype2 -I/usr/pkg/include/harfbuzz -Dunix -fPIC -DPIC -ffunction-sections -fdata-sections -fno-math-errno -pthread -gdwarf-4 -O2 -fomit-frame-pointer -funwind-tables -Wall -Wempty-body -Wignored-qualifiers -Wpointer-arith -Wsign-compare -Wtype-limits -Wunreachable-code -Wduplicated-cond -Wlogical-op -Wno-error=maybe-uninitialized -Wno-error=deprecated-declarations -Wno-error=array-bounds -Wno-error=free-nonheap-object -Wno-multistatement-macros -Wno-error=class-memaccess -Wformat -Wformat-overflow=2 -Werror=implicit-function-declaration -Wno-psabi -Wno-error=builtin-macro-redefined -fno-strict-aliasing -ffp-contract=off  -MD -MP -MF .deps/jsimd.o.pp   /usr/pkgsrc/www/firefox/work/firefox-124.0.2/media/libjpeg/simd/arm/aarch64/jsimd.c

I put my set of binary packages for Firefox and its transitive dependencies here. Since the set is huge, I will probably delete it once the cause of the problem is tracked down and the PR is closed:
https://akari.cielonegro.org/tmp/firefox-bin-pkgs/

This is one of the coredumps I got:
https://akari.cielonegro.org/tmp/firefox-bin-pkgs/firefox.core.gz

And this is the output of "LD_DEBUG=1 firefox". I launched firefox, created a new profile, and ran it until a tab crashed. I couldn't find anything obviously wrong though:
https://akari.cielonegro.org/tmp/firefox-bin-pkgs/firefox.ld-debug.log

These are the libraries that Firefox dlopen'ed, either directly or indirectly, each followed by the mode flags passed to dlopen(3):
> dlopen of (null) 0x1
> dlopen of (null) 0x101
> dlopen of /usr/lib/i18n/libEUC.so.5.0 0x1
> dlopen of /usr/lib/i18n/libUTF8.so.5.0 0x1
> dlopen of /usr/lib/i18n/libiconv_std.so.5.0 0x1
> dlopen of /usr/lib/i18n/libmapper_646.so.5.0 0x1
> dlopen of /usr/lib/i18n/libmapper_parallel.so.5.0 0x1
> dlopen of /usr/lib/i18n/libmapper_serial.so.5.0 0x1
> dlopen of /usr/lib/i18n/libmapper_std.so.5.0 0x1
> dlopen of /usr/lib/i18n/libmapper_zone.so.5.0 0x1
> dlopen of /usr/pkg/lib/dri/swrast_dri.so 0x102
> dlopen of /usr/pkg/lib/firefox/libgkcodecs.so 0x101
> dlopen of /usr/pkg/lib/firefox/libipcclientcerts.so 0x1
> dlopen of /usr/pkg/lib/firefox/liblgpllibs.so 0x101
> dlopen of /usr/pkg/lib/firefox/libmozgtk.so 0x101
> dlopen of /usr/pkg/lib/firefox/libmozsqlite3.so 0x101
> dlopen of /usr/pkg/lib/firefox/libmozwayland.so 0x101
> dlopen of /usr/pkg/lib/firefox/libosclientcerts.so 0x1
> dlopen of /usr/pkg/lib/firefox/libxul.so 0x101
> dlopen of /usr/pkg/lib/gdk-pixbuf-2.0/2.10.0/loaders/libpixbufloader-svg.so 0x1
> dlopen of /usr/pkg/lib/gio/modules/libdconfsettings.so 0x1
> dlopen of /usr/pkg/lib/gtk-3.0/3.0.0/immodules/im-uim.so 0x1
> dlopen of /usr/pkg/lib/nss/libfreebl3.so 0x202
> dlopen of /usr/pkg/lib/nss/libnssckbi.so 0x1
> dlopen of /usr/pkg/lib/nss/libsoftokn3.so 0x202
> dlopen of /usr/pkg/lib/uim/plugin/libuim-skk.so 0x2
> dlopen of /usr/pkg/lib/uim/plugin/libuim-sqlite3.so 0x2
> dlopen of /usr/pkg/lib/uim/plugin/libuim-xkb.so 0x2
> dlopen of libEGL.so 0x1
> dlopen of libGL.so 0x1
> dlopen of libGL.so.1 0x102
> dlopen of libXcursor.so.1 0x1
> dlopen of libXss.so.1 0x1
> dlopen of libdrm.so.2 0x201
> dlopen of libgbm.so.1 0x201
> dlopen of libgio-2.0.so.0 0x1
> dlopen of libpulse.so.0 0x1
> dlopen of nss_compat.so.0 0x201
> dlopen of nss_dns.so.0 0x201
> dlopen of nss_files.so.0 0x201
> dlopen of nss_multicast_dns.so.0 0x201
> dlopen of nss_nis.so.0 0x201
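The hexadecimal number on each line is the mode argument to dlopen(3). A small helper like the one below can decode it when reading the list. It is a sketch that assumes the NetBSD <dlfcn.h> values RTLD_LAZY = 0x1, RTLD_NOW = 0x2, RTLD_GLOBAL = 0x100 and RTLD_LOCAL = 0x200. The constants are hard-coded under NB_ names because other systems use different values, and describe_dlopen_mode is a made-up name.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Assumed NetBSD <dlfcn.h> values; check the header on the target system. */
#define NB_RTLD_LAZY   0x001u
#define NB_RTLD_NOW    0x002u
#define NB_RTLD_GLOBAL 0x100u
#define NB_RTLD_LOCAL  0x200u

/* Writes a "BINDING|SCOPE" description of a dlopen(3) mode into buf and
 * returns buf. Hypothetical helper for reading the rtld debug log. */
const char *describe_dlopen_mode(unsigned int mode, char *buf, size_t len)
{
    const char *binding, *scope;

    assert(buf != NULL && len > 0);

    if (mode & NB_RTLD_NOW)
        binding = "RTLD_NOW";
    else if (mode & NB_RTLD_LAZY)
        binding = "RTLD_LAZY";
    else
        binding = "(no binding flag)";

    if (mode & NB_RTLD_GLOBAL)
        scope = "RTLD_GLOBAL";
    else if (mode & NB_RTLD_LOCAL)
        scope = "RTLD_LOCAL";
    else
        scope = "(default scope)";

    snprintf(buf, len, "%s|%s", binding, scope);
    return buf;
}
```

Under these assumptions, the 0x101 mode shown for libxul.so decodes to RTLD_LAZY|RTLD_GLOBAL, and the 0x202 mode shown for libsoftokn3.so decodes to RTLD_NOW|RTLD_LOCAL.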

>How-To-Repeat:
Launch Firefox, open any website, and watch the tab crash. On rare occasions it doesn't crash; in that case restart Firefox and try again.
>Fix:
I speculated that there was some kind of limit on the size of TLS blocks that dlopen(3) could sanely handle, and that libxul.so exceeded it. I modified /usr/pkg/bin/firefox based on this speculation:

> #!/bin/sh
> LD_PRELOAD=/usr/pkg/lib/firefox/libxul.so /usr/pkg/lib/firefox/firefox "$@"

To my surprise this actually worked! Firefox hasn't crashed even once since this modification. But of course this is only a workaround; the root cause has not been found yet.


