NetBSD-Bugs archive


Re: kern/56979: fork(2) fails to be signal safe



The following reply was made to PR lib/56979; it has been noted by GNATS.

From: Taylor R Campbell <riastradh%NetBSD.org@localhost>
To: Tom Lane <tgl%sss.pgh.pa.us@localhost>
Cc: gnats-bugs%NetBSD.org@localhost
Subject: Re: kern/56979: fork(2) fails to be signal safe
Date: Sun, 16 Oct 2022 00:24:38 +0000

 > Date: Sun, 28 Aug 2022 13:03:46 -0400
 > From: Tom Lane <tgl%sss.pgh.pa.us@localhost>
 >
 > Didn't take long to find out that there's still a problem.  With
 > this patch, it gets past the fork() all right, but there's still
 > a risk of the child process getting stuck on the RTLD lock later:
 >
 > #0  0xfdeede4c in ___lwp_park60 () from /usr/libexec/ld.elf_so
 > #1  0xfdee3e08 in _rtld_exclusive_enter () from /usr/libexec/ld.elf_so
 > #2  0xfdee59e4 in dlopen () from /usr/libexec/ld.elf_so
 > [...]
 > #11 <signal handler called>
 > #12 0xfdee195c in _rtld_bind () from /usr/libexec/ld.elf_so
 > #13 0xfdee1dc0 in _rtld_bind_secureplt_start () from /usr/libexec/ld.elf_so
 
 Do I understand correctly that this means you're trying to call dlopen
 from a signal handler?
 
 Although fork is documented (in NetBSD and in POSIX) as
 async-signal-safe, dlopen is definitely not and never has been and
 probably never will be -- and I would expect if it works anywhere,
 it's by accident and not remotely guaranteed to be reliable.
 
 I didn't follow exactly what you're doing, but I suspect it would be
 much more reliable to have the signal handler only set a flag or write
 a byte to a pipe.  That causes select(2) in the main loop to fail with
 EINTR or to report the pipe readable, and the main loop can then
 process the flag -- at which point you can safely dlopen (and malloc
 and fopen and whatever else) with wild abandon.
 
 > (BTW, is the RTLD lock business new in v10?  I'm surprised that
 > we've not heard field reports of Postgres getting stuck at startup
 > on NetBSD.)
 
 Yes.
 
 > What I'm wondering about now is whether there is a way to force resolution
 > of that PLT entry, or even all of the program's PLT entries, before we
 > enable signals.  If there are multiple select(2) calls in the same source
 > file, will they share a PLT entry?  If so, I could arrange to run a dummy
 > select() call somewhere early in startup.
 
 Calls in the same source file, or even in the same .so (or
 executable), probably share a PLT entry; calls from a different .so
 likely do not.
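 
 A sketch of the dummy call proposed in the quoted text, under the
 assumption that lazy binding is in effect: the helper name
 prebind_select is hypothetical.  It would resolve only the select PLT
 entry of the object (executable or .so) it is compiled into, so each
 object that calls select(2) from a signal-sensitive path would need
 its own copy.  Setting LD_BIND_NOW, which ld.elf_so(1) documents as
 requesting immediate binding of PLT entries, may be an alternative
 worth checking.

```c
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>

/*
 * Hypothetical helper: call once early in startup, before signal handlers
 * are installed, so the first (binding) call to select happens outside
 * any handler.  With no descriptors and a zero timeout, select(2) polls
 * and returns 0 immediately.
 */
static int
prebind_select(void)
{
    struct timeval tv = { 0, 0 };

    return select(0, NULL, NULL, NULL, &tv);
}
```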
 

