NetBSD-Bugs archive
lib/59784: dlopening and dlclosing libpthread is broken
>Number: 59784
>Category: lib
>Synopsis: dlopening and dlclosing libpthread is broken
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: lib-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Sat Nov 22 16:20:00 +0000 2025
>Originator: Taylor R Campbell
>Release: current, 11, 10, 9, ...
>Organization:
Locked and Unloaded LLC
>Environment:
>Description:
A program that dlopens (a library linked against) libpthread
and then dlcloses it can find itself in a pretty pickle with
mysterious symptoms like this:
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x000079bbe310cccc in ?? ()
#0 0x000079bbe310cccc in ?? ()
#1 0x000079bbe2e9847c in __deregister_frame_info_bases () from /usr/lib/libgcc_s.so.1
#2 0x000079bbe2e86365 in __do_global_dtors_aux () from /usr/lib/libgcc_s.so.1
#3 0x000079bbe311ac00 in ?? ()
#4 0x000079bbe2e99a79 in _fini () from /usr/lib/libgcc_s.so.1
#5 0x000079bbe3585120 in atexit_handler_stack () from /usr/lib/libc.so.12
#6 0x00007f7ff709fbe1 in _rtld_call_initfini_function (mask=0x7f7fff539130, func=0x79bbe2e99a70 <_fini>) at /home/riastradh/netbsd/11/src/libexec/ld.elf_so/rtld.c:152
#7 _rtld_call_fini_function (obj=0x79bbe2e9ddf0, mask=0x7f7fff539130, cur_objgen=4) at /home/riastradh/netbsd/11/src/libexec/ld.elf_so/rtld.c:167
#8 0x00007f7ff70a06a6 in _rtld_call_fini_functions (force=1, mask=0x7f7fff539130) at /home/riastradh/netbsd/11/src/libexec/ld.elf_so/rtld.c:213
#9 _rtld_exit () at /home/riastradh/netbsd/11/src/libexec/ld.elf_so/rtld.c:431
#10 0x000079bbe32c895f in __cxa_finalize (dso=dso@entry=0x0) at /home/riastradh/netbsd/11/src/lib/libc/stdlib/atexit.c:222
#11 0x000079bbe32c853b in exit (status=status@entry=0) at /home/riastradh/netbsd/11/src/lib/libc/stdlib/exit.c:60
#12 0x000079bbe3592b90 in pass (ctx=0x79bbe359e860 <Current>) at /home/riastradh/netbsd/11/src/external/bsd/atf/dist/atf-c/tc.c:337
#13 0x000079bbe35931d5 in atf_tc_run (tc=0x792168 <atfu_dlopen_tc>, resfile=<optimized out>) at /home/riastradh/netbsd/11/src/external/bsd/atf/dist/atf-c/tc.c:1041
#14 0x000079bbe359000e in atf_tp_run (tp=tp@entry=0x7f7fff5392c0, tcname=<optimized out>, resfile=<optimized out>) at /home/riastradh/netbsd/11/src/external/bsd/atf/dist/atf-c/tp.c:205
#15 0x000079bbe358fb95 in run_tc (exitcode=<synthetic pointer>, p=0x7f7fff5392e0, tp=0x7f7fff5392c0) at /home/riastradh/netbsd/11/src/external/bsd/atf/dist/atf-c/detail/tp_main.c:510
#16 controlled_main (exitcode=<synthetic pointer>, add_tcs_hook=0x78fad8 <atfu_tp_add_tcs>, argv=<optimized out>, argc=<optimized out>) at /home/riastradh/netbsd/11/src/external/bsd/atf/dist/atf-c/detail/tp_main.c:580
#17 atf_tp_main (argc=<optimized out>, argv=<optimized out>, add_tcs_hook=add_tcs_hook@entry=0x78fad8 <atfu_tp_add_tcs>) at /home/riastradh/netbsd/11/src/external/bsd/atf/dist/atf-c/detail/tp_main.c:610
#18 0x000000000078fcb6 in main (argc=<optimized out>, argv=<optimized out>) at /home/riastradh/netbsd/11/src/tests/lib/libpthread/dlopen/t_dlopen.c:163
#19 0x000000000078f4eb in ___start (cleanup=<optimized out>, ps_strings=0x7f7fff539fe0) at /home/riastradh/netbsd/11/src/lib/csu/common/crt0-common.c:375
#20 0x00007f7ff70a68d0 in ?? () from /usr/libexec/ld.elf_so
#21 0x0000000000000005 in ?? ()
#22 0x00007f7fff539968 in ?? ()
#23 0x00007f7fff539971 in ?? ()
#24 0x00007f7fff53998b in ?? ()
#25 0x00007f7fff5399ae in ?? ()
#26 0x00007f7fff5399c9 in ?? ()
#27 0x0000000000000000 in ?? ()
Setting a breakpoint on __deregister_frame_info_bases and
single-stepping through it reveals that the crash happens when
it calls __libc_mutex_lock via the PLT and jumps into code in
libpthread.so that is no longer mapped after dlclose. Why is it
trying to jump there?
What happened is:
1. The program dlopened (a library linked against) libpthread.
2. The program called pthread_mutex_lock -- or rather,
__libc_mutex_lock, renamed via #define in <pthread.h>.
3. The symbol __libc_mutex_lock has two definitions:
(a) A weak definition in libc.so -- the no-op thread stub.
(b) A strong definition in libpthread.so -- the real one.
Lazy binding of the symbol chooses the strong one, so the
entry for __libc_mutex_lock in the .got.plt is bound to
libpthread.so's definition, as shown by `info proc mappings'
and single-stepping in gdb:
(gdb) info proc mappings
...
0x7ee838cfb000 0x7ee838d03000 0x8000 0x7000 r-x CNPD /lib/libpthread.so.1.5
...
(gdb) display/i $pc
1: x/i $pc
=> 0x7ee838a8a402 <__deregister_frame_info_bases+4>: push %r12
(gdb) si
...
(gdb) si
0x00007ee838a8a477 in __deregister_frame_info_bases ()
from /usr/lib/libgcc_s.so.1
1: x/i $pc
=> 0x7ee838a8a477 <__deregister_frame_info_bases+121>:
call 0x7ee838a78150 <__libc_mutex_lock@plt>
(gdb) si
0x00007ee838a78150 in __libc_mutex_lock@plt () from /usr/lib/libgcc_s.so.1
1: x/i $pc
=> 0x7ee838a78150 <__libc_mutex_lock@plt>:
jmp *0x17f42(%rip) # 0x7ee838a90098 <__libc_mutex_lock%got.plt@localhost>
(gdb) x/xg $rip + 6 + 0x17f42
0x7ee838a90098 <__libc_mutex_lock%got.plt@localhost>: 0x00007ee838cfeccc
(gdb) si
pthread_mutex_lock (ptm=0x7ee838a90400 <object_mutex>)
at /home/riastradh/netbsd/11/src/lib/libpthread/pthread_mutex.c:204
1: x/i $pc
=> 0x7ee838cfeccc <pthread_mutex_lock>:
mov 0x92b5(%rip),%rax # 0x7ee838d07f88
Note that 0x7ee838cfeccc lies in the interval
[0x7ee838cfb000,0x7ee838d03000) where libpthread.so is
mapped.
4. dlclose unmapped everything in libpthread.so -- including the
pages of instructions that the .got.plt entry for
__libc_mutex_lock now points to, and dlclose has no
mechanism to _unbind_ this.
5. The next thing that tried to call __libc_mutex_lock jumped
into oblivion where libpthread.so used to be. In the test
case above, that happened to be in some mysterious code path
at program exit, but it could just as well have been, say,
one of the stdio(3) functions taking a FILE lock.
(gdb) si
0x00007ee838a8a477 in __deregister_frame_info_bases ()
from /usr/lib/libgcc_s.so.1
1: x/i $pc
=> 0x7ee838a8a477 <__deregister_frame_info_bases+121>:
call 0x7ee838a78150 <__libc_mutex_lock@plt>
(gdb) si
0x00007ee838a78150 in __libc_mutex_lock@plt () from /usr/lib/libgcc_s.so.1
1: x/i $pc
=> 0x7ee838a78150 <__libc_mutex_lock@plt>:
jmp *0x17f42(%rip) # 0x7ee838a90098 <__libc_mutex_lock%got.plt@localhost>
(gdb) si
0x00007ee838cfeccc in ?? ()
1: x/i $pc
=> 0x7ee838cfeccc: <error: Cannot access memory at address 0x7ee838cfeccc>
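The five steps above can be modelled in a few lines of C. This is
only a sketch of the failure mode, not rtld code: the model_* types
and functions are hypothetical names made up for illustration, a
.got.plt slot is reduced to a pointer to the object it was bound
to, and "unmapping" is reduced to clearing a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a loaded object; not rtld's real data structure. */
struct model_object {
	const char *name;
	bool mapped;	/* false once model_dlclose has "unmapped" it */
	bool nodelete;	/* models linking with -Wl,-z,nodelete */
};

/* Model of one .got.plt slot: the object whose code it points into. */
struct model_got_slot {
	const struct model_object *target;	/* NULL until first call */
};

/* Lazy binding on first call: prefer the strong definition if present. */
void
model_bind(struct model_got_slot *slot, const struct model_object *strong,
    const struct model_object *weak)
{
	if (slot->target != NULL)
		return;		/* already bound; never revisited */
	slot->target = (strong != NULL && strong->mapped) ? strong : weak;
}

/* dlclose unmaps the object unless it is nodelete; slots are untouched. */
void
model_dlclose(struct model_object *obj)
{
	if (!obj->nodelete)
		obj->mapped = false;
}

/* A call through the slot faults if its target is no longer mapped. */
bool
model_call_faults(const struct model_got_slot *slot)
{
	assert(slot->target != NULL);
	return !slot->target->mapped;
}

/* Steps 1-5: returns true if the call after dlclose would fault. */
bool
model_scenario_faults(bool libpthread_nodelete)
{
	struct model_object libc = { "libc.so", true, false };
	struct model_object libpthread =
	    { "libpthread.so", true, libpthread_nodelete };
	struct model_got_slot libgcc_slot = { NULL };

	model_bind(&libgcc_slot, &libpthread, &libc);	/* steps 2-3 */
	model_dlclose(&libpthread);			/* step 4 */
	return model_call_faults(&libgcc_slot);		/* step 5 */
}
```

In this model, model_scenario_faults(false) reports a fault because
nothing revisits the slot after the unload, while
model_scenario_faults(true) does not, since a nodelete libpthread.so
is never unmapped.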
Why doesn't RTLD_LOCAL limit the scope of libpthread.so's
__libc_mutex_lock definition so only those .got.plt entries for
objects that dlclose is unloading will point to the
libpthread.so one, and any .got.plt entries for objects in the
global namespace will get the libc.so weak one?
=> Because the library that the test dlopens, which is linked
against libpthread.so, is _also_ linked against libgcc_s.so,
which is already marked with -Wl,-z,nodelete -- and
libgcc_s.so's .got.plt entry for __libc_mutex_lock is
resolved in the RTLD_LOCAL scope and bound to
libpthread.so's __libc_mutex_lock. So libgcc_s.so stays
mapped after dlclose, with that entry still pointing into
the unmapped libpthread.so. If we remove libgcc_s.so (by
not using LIBISCXX=yes in the test library -- not sure why
we're using that anyway), the symptom goes away.
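The scope question can be illustrated with a toy lookup. This is a
sketch under simplifying assumptions, not rtld's actual search: a
scope is just an array of objects, each object either lacks
__libc_mutex_lock or defines it weakly or strongly, and a strong
definition anywhere in the scope beats a weak one, as described
above. All model_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* How (and whether) an object defines __libc_mutex_lock in this model. */
enum model_def { MODEL_UNDEF, MODEL_WEAK, MODEL_STRONG };

struct model_object {
	const char *name;
	enum model_def mutex_lock;
};

const struct model_object model_libc = { "libc.so", MODEL_WEAK };
const struct model_object model_libpthread = { "libpthread.so", MODEL_STRONG };
const struct model_object model_libgcc_s = { "libgcc_s.so", MODEL_UNDEF };

/* Return the first strong definition in scope, else the first weak one. */
const struct model_object *
model_lookup(const struct model_object *const *scope, size_t n)
{
	const struct model_object *weak = NULL;
	size_t i;

	assert(scope != NULL);
	for (i = 0; i < n; i++) {
		if (scope[i]->mutex_lock == MODEL_STRONG)
			return scope[i];
		if (scope[i]->mutex_lock == MODEL_WEAK && weak == NULL)
			weak = scope[i];
	}
	return weak;
}

/* An object in the global namespace sees only libc.so's weak stub. */
const struct model_object *
model_global_binding(void)
{
	const struct model_object *const scope[] = { &model_libc };

	return model_lookup(scope, 1);
}

/* libgcc_s.so, loaded with the dlopened library, also sees libpthread.so. */
const struct model_object *
model_local_binding(void)
{
	const struct model_object *const scope[] =
	    { &model_libc, &model_libgcc_s, &model_libpthread };

	return model_lookup(scope, 3);
}
```

In this model the global lookup yields libc.so and the local lookup
yields libpthread.so, which matches the binding that libgcc_s.so
ends up with in the report.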
>How-To-Repeat:
cd /usr/tests/lib/libpthread/dlopen
atf-run | atf-report
Caveat: This no longer works as a test case for this particular
bug in HEAD, because __deregister_frame_info_bases has changed
to avoid taking a lock with __libc_mutex_lock. Need to
construct a test case that still works in HEAD in spite of
those changes.
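A new test case would presumably start from a dlopen/dlclose cycle
like the following sketch. The function name dlopen_dlclose_cycle
and whatever path is passed to it are hypothetical; this shows only
the load/unload half, and which call afterwards still reaches
__libc_mutex_lock through a stale .got.plt entry in HEAD is exactly
the open question above.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stdio.h>

/*
 * Load and unload one shared object with lazy binding in a local
 * scope.  Returns 0 on success, -1 if dlopen or dlclose fails.
 */
int
dlopen_dlclose_cycle(const char *path)
{
	void *handle;

	assert(path != NULL);
	handle = dlopen(path, RTLD_LAZY | RTLD_LOCAL);
	if (handle == NULL) {
		fprintf(stderr, "dlopen: %s\n", dlerror());
		return -1;
	}
	/* A real test would call into the library here to bind symbols. */
	if (dlclose(handle) != 0) {
		fprintf(stderr, "dlclose: %s\n", dlerror());
		return -1;
	}
	return 0;
}
```

The caller would pass the path of a helper library linked against
libpthread.so and then exercise a code path that takes a lock.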
>Fix:
Add to lib/libpthread/Makefile:
LDADD+= -Wl,-z,nodelete
This prevents rtld from actually unloading libpthread.
The same is probably needed for any library that provides
strong definitions of a symbol that is still used when the
library isn't loaded, via a weak definition from some other
source -- like __libc_mutex_lock.
It's a dark corner of ELF wizardry that we probably don't use
much outside of libpthread.so but I can't rule out the
possibility that someone has dabbled in such nefarious magic
elsewhere.
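For reference, -z nodelete is recorded in the linked object as the
DF_1_NODELETE bit of the DT_FLAGS_1 entry in its dynamic section.
The sketch below scans a dynamic array for that bit. To stay
self-contained it uses its own struct model_dyn and MODEL_*
constants (hypothetical names, with the standard ELF values)
instead of the system's ELF headers, and it assumes the array is
terminated by a DT_NULL entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Standard ELF values, under hypothetical MODEL_ names. */
#define MODEL_DT_NULL		0
#define MODEL_DT_FLAGS_1	0x6ffffffb
#define MODEL_DF_1_NODELETE	0x00000008

/* Same shape as a 64-bit ELF dynamic entry. */
struct model_dyn {
	int64_t d_tag;
	uint64_t d_val;
};

/* True if the DT_NULL-terminated dynamic array has DF_1_NODELETE set. */
bool
model_has_nodelete(const struct model_dyn *dyn)
{
	assert(dyn != NULL);
	for (; dyn->d_tag != MODEL_DT_NULL; dyn++) {
		if (dyn->d_tag == MODEL_DT_FLAGS_1)
			return (dyn->d_val & MODEL_DF_1_NODELETE) != 0;
	}
	return false;	/* no DT_FLAGS_1 entry at all */
}
```

A check along these lines could be used to audit other libraries
that provide strong definitions over weak stubs.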