tech-userlevel archive


Thread-local storage issues arose again? Firefox frequently crashes on 10.0 aarch64



Hi,

As I mentioned in http://mail-index.netbsd.org/netbsd-users/2024/04/12/msg030915.html, Firefox tab processes crash very frequently on NetBSD/aarch64 10.0. Building it with PKG_OPTIONS.firefox+=debug-info revealed that every crash is a segfault at one of the following two places, and which one it hits is non-deterministic:

third_party/rlbox/include/rlbox_noop_sandbox.hpp:

rlbox_noop_sandbox_thread_data* get_rlbox_noop_sandbox_thread_data();
#  define RLBOX_NOOP_SANDBOX_STATIC_VARIABLES()                                \
    thread_local rlbox::rlbox_noop_sandbox_thread_data                         \
      rlbox_noop_sandbox_thread_info{ 0, 0 };                                  \
    namespace rlbox {                                                          \
      rlbox_noop_sandbox_thread_data* get_rlbox_noop_sandbox_thread_data()     \
      {                                                                        \
        return &rlbox_noop_sandbox_thread_info;                                \
      }                                                                        \
    }                                                                          \
    static_assert(true, "Enforce semi-colon")
> ...
  template<typename T, typename T_Converted, typename... T_Args>
  auto impl_invoke_with_func_ptr(T_Converted* func_ptr, T_Args&&... params)
  {
#ifdef RLBOX_EMBEDDER_PROVIDES_TLS_STATIC_VARIABLES
    auto& thread_data = *get_rlbox_noop_sandbox_thread_data();
#endif
    auto old_sandbox = thread_data.sandbox; // <-- CRASHES HERE!
    thread_data.sandbox = this;
    auto on_exit =
      detail::make_scope_exit([&] { thread_data.sandbox = old_sandbox; });
    return (*func_ptr)(params...);
  }

media/libjpeg/simd/arm/aarch64/jsimd.c:

static THREAD_LOCAL unsigned int simd_support = ~0;
                                                 JSIMD_FASTST3 | JSIMD_FASTTBL;
> ...
LOCAL(void)
init_simd(void)
{
#ifndef NO_GETENV
  char env[2] = { 0 };
#endif
#if defined(__linux__) || defined(ANDROID) || defined(__ANDROID__)
  int bufsize = 1024; /* an initial guess for the line buffer size limit */
#endif

  if (simd_support != ~0U) // <-- CRASHES HERE!
    return;

  simd_support = 0;

So both of these cases involve TLS: the tab processes segfault while attempting to access thread-local variables. At run time these functions reside in libxul.so, which is dlopen'ed by the main process. I recall there were a few issues in TLS handling in the past, but riastradh@ fixed them before we branched 10.0, right?

"readelf -r libxul.so" shows no R_AARCH64_TLS_TPR entries in its relocation table, only R_AARCH64_TLSDESC, so I believe these variables use the local-dynamic model. I tried to create a minimal reproducer but it didn't crash:
https://gist.github.com/depressed-pho/b6894fdaef94a1b9aa5459b1a2f65590
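
For anyone who wants to repeat the readelf check on another library (for example, to compare libxul.so with the reproducer), here is a rough sketch. The function names tally_tls_relocs and tls_memsiz are made up for illustration, and the field positions assume the usual column layout of GNU readelf's "-r" and "-lW" output:

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, and the awk field numbers
# assume GNU readelf's usual column layout.

# Tally TLS relocation types from `readelf -r` output on stdin.
# The relocation type is the third whitespace-separated field.
tally_tls_relocs() {
    awk '$3 ~ /^R_AARCH64_TLS/ { n[$3]++ }
         END { for (t in n) print t, n[t] }' | LC_ALL=C sort
}

# Print the MemSiz (in bytes) of the PT_TLS segment from `readelf -lW`
# output on stdin. In wide mode the fields are:
#   Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
tls_memsiz() {
    hex=$(awk '$1 == "TLS" { print $6; exit }')
    # Fails (non-zero status) if the object has no PT_TLS segment.
    [ -n "$hex" ] && printf '%d\n' "$hex"
}
```

It would be used as "readelf -r libxul.so | tally_tls_relocs" and "readelf -lW libxul.so | tls_memsiz". The second number gives a rough idea of how much thread-local data each object carries.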

So I speculated that there was some kind of limit on the size of TLS blocks that dlopen(3) could sanely handle, and that libxul.so exceeded it. As I mentioned in the previous mail, I modified /usr/pkg/bin/firefox based on this speculation:

#!/bin/sh
LD_PRELOAD=/usr/pkg/lib/firefox/libxul.so /usr/pkg/lib/firefox/firefox "$@"

To my surprise this actually worked! Firefox hasn't crashed even once since this modification! Help, riastradh@, TLS is convoluted and I have nearly zero knowledge about this monstrosity!
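
A slightly more defensive variant of the wrapper above might look like the following. It is only a sketch: the function names preload_list and launch_firefox are made up, and it assumes that ld.elf_so accepts a colon-separated list in LD_PRELOAD. Compared with the two-line script, it keeps any LD_PRELOAD the user already had set, and uses exec so that no extra shell process lingers:

```shell
#!/bin/sh
# Sketch only: function names are hypothetical; the paths are the ones
# used in the wrapper quoted above.
libxul=/usr/pkg/lib/firefox/libxul.so

# Print $1 followed by any existing LD_PRELOAD entries, colon-separated.
# Assumption: the run-time linker accepts a colon-separated list here.
preload_list() {
    printf '%s\n' "$1${LD_PRELOAD:+:$LD_PRELOAD}"
}

# Replace the shell with Firefox, with libxul.so preloaded first.
launch_firefox() {
    LD_PRELOAD=$(preload_list "$libxul")
    export LD_PRELOAD
    exec /usr/pkg/lib/firefox/firefox "$@"
}
```

The wrapper would then end with a single call: launch_firefox "$@". Note that this is still just a workaround. It says nothing about why TLS access fails when libxul.so is loaded through dlopen(3).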
