NetBSD-Bugs archive


lib/60215: ld.elf_so dlerror() state is not thread-local



>Number:         60215
>Category:       lib
>Synopsis:       ld.elf_so dlerror() state is not thread-local
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    lib-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Tue Apr 28 19:50:00 +0000 2026
>Originator:     Izumi Tsutsui
>Release:        NetBSD 11.0_RC3
>Organization:
>Environment:
System: NetBSD mirage 11.0_RC3 NetBSD 11.0_RC3 (GENERIC) #22: Sat Apr 25 03:39:02 JST 2026 tsutsui@mirage:/s/netbsd-11/src/sys/arch/i386/compile/GENERIC i386
Architecture: i386
Machine: i386
>Description:
On NetBSD/i386 10.1 and 11.0_RC3, pkgsrc/net/mikutter *sometimes* fails to
start with the following Ruby-GNOME/gobject-introspection error:

```
  GLib::Error Could not locate gtk_events_pending: 'gtk_events_pending':
  /usr/pkg/bin/ruby33: Shared object "nss_files.so.0" not found
```

This error is intermittent.  Running mikutter repeatedly eventually results
in a successful start.

ktrace shows that another LWP in the same process reads /etc/nsswitch.conf
and probes NSS modules such as:

```
  /usr/pkg/lib/nss_compat.so.0
  /usr/lib/nss_compat.so.0
  /usr/pkg/lib/nss_nis.so.0
  /usr/lib/nss_nis.so.0
  /usr/pkg/lib/nss_files.so.0
  /usr/lib/nss_files.so.0
  /usr/pkg/lib/nss_dns.so.0
  /usr/lib/nss_dns.so.0
```

These probes fail because these shared objects do not exist.  That failure
is not fatal by itself, because the corresponding NSS backends are provided
by libc on NetBSD.

However, the failure message from the NSS module probe appears to be observed
by an unrelated symbol lookup in another thread.  In this case, GLib/GModule
or gobject-introspection is trying to resolve gtk_events_pending, but receives
the stale or clobbered dlerror() message:

```
  Shared object "nss_files.so.0" not found
```

The gtk_events_pending symbol itself is present in libgtk-3.  The reported
symbol lookup failure therefore appears to be caused by dlerror() state being
shared between threads.

Looking at libexec/ld.elf_so/rtld.c, dlerror() appears to use a single
process-global static variable, which it returns and then clears:

```
/*
 * Data declarations.
 */
static char    *error_message;	/* Message for dlopen(), or NULL */

```

```
__strong_alias(__dlerror,dlerror)
char *
dlerror(void)
{
	char *msg = error_message;

	error_message = NULL;
	return msg;
}
```

This means that one thread can clear or observe an error produced by another
thread's dlopen()/dlsym() operation.  In a multi-threaded program, this can
make the usual sequence, such as the following, unreliable:

```
  dlerror();              /* clear old error */
  sym = dlsym(handle, name);
  error = dlerror();      /* check this lookup */
```

because another thread can call dlopen(), dlsym(), or dlerror()
between these operations.

This seems to explain the intermittent mikutter failure. If the NSS dlopen()
probe failure happens between GLib/GModule's dlerror-clear and dlerror-check
around dlsym("gtk_events_pending"), the unrelated NSS error is treated as a
symbol lookup failure.
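
As a possible caller-side mitigation (not a fix for rtld), a lookup could
trust the dlsym() return value first and consult dlerror() only when the
result is NULL.  The following is a minimal sketch under that assumption;
lookup_symbol() is a hypothetical helper, not GLib/GModule code, and it
treats a symbol whose value is legitimately NULL as a failure, which is
acceptable for function lookups such as gtk_events_pending:

```c
#include <dlfcn.h>
#include <stddef.h>

/*
 * Hypothetical helper: resolve `name` and report failure only when
 * dlsym() itself returned NULL.  On failure, *errp receives the
 * dlerror() text, which may still be missing or belong to another
 * thread while dlerror() state is process-global.
 */
void *
lookup_symbol(void *handle, const char *name, const char **errp)
{
	void *sym = dlsym(handle, name);

	if (sym != NULL) {
		*errp = NULL;	/* success: ignore any pending dlerror() */
		return sym;
	}
	*errp = dlerror();
	if (*errp == NULL)
		*errp = "symbol not found (no dlerror() message available)";
	return NULL;
}
```

With this shape, a stale message from an unrelated dlopen() failure can
still show up in the error text of a genuine failure, but it can no longer
turn a successful lookup into a reported failure.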


>How-To-Repeat:
One reproducer is pkgsrc/net/mikutter on NetBSD/i386 10.1 or 11.0_RC3 with
an /etc/nsswitch.conf that uses compat/nis/files/dns entries, for example:

```
  group:          compat
  group_compat:   nis
  hosts:          files dns
  netgroup:       nis files
  networks:       files nis
  passwd:         compat
  passwd_compat:  nis
  shells:         files
```

Run:

  ktrace -di -f /tmp/mikutter.ktrace mikutter

When it fails, the error dialog reports:

  GLib::Error Could not locate gtk_events_pending: 'gtk_events_pending':
  /usr/pkg/bin/ruby33: Shared object "nss_files.so.0" not found

and kdump shows NSS module probes such as nss_files.so.0, nss_dns.so.0,
nss_compat.so.0, and nss_nis.so.0 from another LWP in the same process.

The failure is intermittent.  Re-running the same command may succeed,
which is consistent with a race on process-global dlerror() state.
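
A smaller standalone reproducer might look like the following sketch.  It
assumes that a background thread repeatedly failing dlopen() is enough to
stand in for the NSS module probes; the object name and the function names
are hypothetical, and the sketch has not been confirmed to trigger the
problem on an affected system:

```c
#include <dlfcn.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

static atomic_int stop_flag;

/* Background thread: keep producing dlopen() failures, like the NSS probes. */
static void *
failing_dlopen_loop(void *arg)
{
	(void)arg;
	while (!atomic_load(&stop_flag)) {
		/* Hypothetical name, chosen so that the object never exists. */
		void *h = dlopen("hypothetical_missing_object.so.0", RTLD_LAZY);
		if (h != NULL)
			dlclose(h);
	}
	return NULL;
}

/*
 * Run the clear/lookup/check sequence `iterations` times while another
 * thread fails dlopen().  Returns how many times dlerror() reported an
 * error although dlsym() returned a valid address, or -1 on setup failure.
 */
long
count_foreign_errors(long iterations)
{
	pthread_t t;
	long hits = 0;
	void *self = dlopen(NULL, RTLD_LAZY);	/* handle for the main program */

	if (self == NULL)
		return -1;
	atomic_store(&stop_flag, 0);
	if (pthread_create(&t, NULL, failing_dlopen_loop, NULL) != 0) {
		dlclose(self);
		return -1;
	}

	for (long i = 0; i < iterations; i++) {
		(void)dlerror();			/* clear old error */
		void *sym = dlsym(self, "printf");	/* provided by libc */
		const char *err = dlerror();		/* check this lookup */
		if (sym != NULL && err != NULL)
			hits++;
	}

	atomic_store(&stop_flag, 1);
	pthread_join(t, NULL);
	dlclose(self);
	return hits;
}
```

Built with `cc -pthread` and called from a small main() with a large
iteration count, a non-zero result would be consistent with process-global
dlerror() state, while an implementation with per-thread state is expected
to return 0.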

>Fix:
If my understanding is correct, error_message in libexec/ld.elf_so/rtld.c
needs to be per-thread rather than process-global.  I would appreciate
comments from people familiar with rtld and libpthread interactions.
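
To illustrate the intended semantics only, here is a user-space sketch that
keeps the same return-and-clear behaviour but stores the pointer per thread
using C11 _Thread_local.  This is not a patch for rtld.c: whether
ld.elf_so can use compiler-level TLS for its own state is exactly the open
question above, and sketch_set_error(), sketch_dlerror(), and
other_thread_sees_error() are hypothetical names:

```c
#include <pthread.h>
#include <stddef.h>

/* One pointer per thread instead of one per process. */
static _Thread_local char *error_message;

/* Stand-in for rtld's internal error setter (hypothetical name). */
void
sketch_set_error(char *msg)
{
	error_message = msg;
}

/* Same return-and-clear behaviour as dlerror(), but per thread. */
char *
sketch_dlerror(void)
{
	char *msg = error_message;

	error_message = NULL;
	return msg;
}

/* Thread body: record whether this thread sees a pending error. */
static void *
probe_thread(void *arg)
{
	*(int *)arg = (sketch_dlerror() != NULL);
	return NULL;
}

/*
 * Returns 1 if a freshly created thread observes a pending error,
 * 0 if it does not, and -1 if the thread could not be created.
 */
int
other_thread_sees_error(void)
{
	pthread_t t;
	int seen = 0;

	if (pthread_create(&t, NULL, probe_thread, &seen) != 0)
		return -1;
	pthread_join(t, NULL);
	return seen;
}
```

With this layout, an error set in one thread is neither visible to nor
cleared by a sketch_dlerror() call in another thread, which is the
property the dlerror-clear/dlsym/dlerror-check sequence relies on.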

---
Izumi Tsutsui



