NetBSD-Bugs archive
Re: kern/56979: fork(2) fails to be signal safe
The following reply was made to PR lib/56979; it has been noted by GNATS.
From: Taylor R Campbell <riastradh%NetBSD.org@localhost>
To: Tom Lane <tgl%sss.pgh.pa.us@localhost>
Cc: gnats-bugs%NetBSD.org@localhost
Subject: Re: kern/56979: fork(2) fails to be signal safe
Date: Sun, 16 Oct 2022 00:24:38 +0000
> Date: Sun, 28 Aug 2022 13:03:46 -0400
> From: Tom Lane <tgl%sss.pgh.pa.us@localhost>
>
> Didn't take long to find out that there's still a problem. With
> this patch, it gets past the fork() all right, but there's still
> a risk of the child process getting stuck on the RTLD lock later:
>
> #0 0xfdeede4c in ___lwp_park60 () from /usr/libexec/ld.elf_so
> #1 0xfdee3e08 in _rtld_exclusive_enter () from /usr/libexec/ld.elf_so
> #2 0xfdee59e4 in dlopen () from /usr/libexec/ld.elf_so
> [...]
> #11 <signal handler called>
> #12 0xfdee195c in _rtld_bind () from /usr/libexec/ld.elf_so
> #13 0xfdee1dc0 in _rtld_bind_secureplt_start () from /usr/libexec/ld.elf_so
Do I understand correctly that this means you're trying to call dlopen
from a signal handler?
Although fork is documented (in NetBSD and in POSIX) as
async-signal-safe, dlopen is definitely not and never has been and
probably never will be -- and I would expect if it works anywhere,
it's by accident and not remotely guaranteed to be reliable.
I didn't follow exactly what you're doing, but I suspect it would be
much more reliable to have the signal handler just set a flag or write
a byte to a pipe, let select(2) fail with EINTR (or report the pipe
readable), and process the flag from the main loop -- there you can
safely dlopen (and malloc and fopen and whatever else) with wild
abandon.
> (BTW, is the RTLD lock business new in v10? I'm surprised that
> we've not heard field reports of Postgres getting stuck at startup
> on NetBSD.)
Yes.
> What I'm wondering about now is whether there is a way to force resolution
> of that PLT entry, or even all of the program's PLT entries, before we
> enable signals. If there are multiple select(2) calls in the same source
> file, will they share a PLT entry? If so, I could arrange to run a dummy
> select() call somewhere early in startup.
Calls from the same source file, or even from the same .so (or
executable), probably share a PLT entry; calls from a different .so,
not likely.
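For what it's worth, the dummy call could look roughly like the sketch
below. The helper name prebind_select is hypothetical, and this assumes
lazy binding, where the first call through a PLT entry is what triggers
resolution. It would only cover the select(2) PLT entry of the object
(executable or .so) that contains the helper, and it has to run before
the signal handlers are enabled:

```c
#include <stddef.h>
#include <sys/select.h>

/*
 * Hypothetical helper: call select(2) once with no descriptors and a
 * zero timeout, so that it returns immediately.  Under lazy binding,
 * this first call resolves the select PLT entry of the object this
 * function is linked into; other shared objects have their own entries.
 */
static int prebind_select(void)
{
    struct timeval tv = { 0, 0 };

    return select(0, NULL, NULL, NULL, &tv);
}
```

Call it once early in startup, while the relevant signals are still
blocked or their handlers are not yet installed.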