NetBSD-Bugs archive


Re: kern/56979: fork(2) fails to be signal safe



The following reply was made to PR lib/56979; it has been noted by GNATS.

From: Tom Lane <tgl%sss.pgh.pa.us@localhost>
To: Taylor R Campbell <riastradh%NetBSD.org@localhost>
Cc: gnats-bugs%NetBSD.org@localhost
Subject: Re: kern/56979: fork(2) fails to be signal safe
Date: Sun, 28 Aug 2022 13:03:46 -0400

 I wrote:
 > I'll report back in a week or two.
 
 Didn't take long to find out that there's still a problem.  With
 this patch, it gets past the fork() all right, but there's still
 a risk of the child process getting stuck on the RTLD lock later:
 
 #0  0xfdeede4c in ___lwp_park60 () from /usr/libexec/ld.elf_so
 #1  0xfdee3e08 in _rtld_exclusive_enter () from /usr/libexec/ld.elf_so
 #2  0xfdee59e4 in dlopen () from /usr/libexec/ld.elf_so
 #3  0x01e6a4c0 in internal_load_library (
     libname=libname@entry=0xfde3cbf8 "/home/tgl/testversion/lib/postgresql/libpqwalreceiver.so") at dfmgr.c:239
 #4  0x01e6b2a0 in load_file (
     filename=filename@entry=0x1fc643c "libpqwalreceiver", 
     restricted=restricted@entry=false) at dfmgr.c:156
 #5  0x01c6d640 in WalReceiverMain () at walreceiver.c:278
 #6  0x01c19ba0 in AuxiliaryProcessMain (
     auxtype=auxtype@entry=WalReceiverProcess) at auxprocess.c:161
 #7  0x01c21778 in StartChildProcess (type=WalReceiverProcess)
     at postmaster.c:5412
 #8  0x01c23214 in MaybeStartWalReceiver () at postmaster.c:5577
 #9  MaybeStartWalReceiver () at postmaster.c:5570
 #10 sigusr1_handler (postgres_signal_arg=<optimized out>) at postmaster.c:5229
 #11 <signal handler called>
 #12 0xfdee195c in _rtld_bind () from /usr/libexec/ld.elf_so
 #13 0xfdee1dc0 in _rtld_bind_secureplt_start () from /usr/libexec/ld.elf_so
 Backtrace stopped: frame did not save the PC
 
 Again, manual investigation shows that the _rtld_bind is trying
 to resolve the select(2) call in the postmaster's main loop, and it's
 plausible that this happened during our first arrival at that call.
 
 I think Taylor's fix may still be a good idea for pro-forma spec compliance,
 but I despair of getting to a reliable Postgres build this way.  (BTW,
 is the RTLD lock business new in v10?  I'm surprised that we've not heard
 field reports of Postgres getting stuck at startup on NetBSD.)
 
 What I'm wondering about now is whether there is a way to force resolution
 of that PLT entry, or even all of the program's PLT entries, before we
 enable signals.  If there are multiple select(2) calls in the same source
 file, will they share a PLT entry?  If so, I could arrange to run a dummy
 select() call somewhere early in startup.
 
 			regards, tom lane
 

