NetBSD-Bugs archive


Re: bin/60275: sh(1): race condition in signal handling on background subshell fork



The following reply was made to PR bin/60275; it has been noted by GNATS.

From: Robert Elz <kre%munnari.OZ.AU@localhost>
To: gnats-bugs%netbsd.org@localhost, netbsd-bugs%netbsd.org@localhost
Cc: 
Subject: Re: bin/60275: sh(1): race condition in signal handling on background subshell fork
Date: Mon, 18 May 2026 19:00:37 +0700

 The previous "fix" is insufficient.  If I change (the relevant part
 of) the test program from:
 
 	sleep 1 && echo timeout >&2 && kill $$ & timer=$!
 	#sleep 0.01
 	kill $timer; wait $timer 2>/dev/null || :
 to:
 	sleep 1 && echo timeout >&2 && kill $$ & timer=$!
 	#sleep 0.01
 	i=0; while [ $((++i)) -lt 5 ]; do continue; done
 	kill $timer; wait $timer 2>/dev/null || :
 
 then it still fails.   With anything bigger than '5' in the new loop
 it "works" (at least in my DEBUG-equipped sh, with the debug options
 I had enabled; a bigger value might still exhibit the problem in a non-DEBUG
 sh, and it might even need to be bigger than 5 for the previous "fix" to fail).
 Executing an external command, even a very short sleep, takes much, much
 longer than a simple loop like this, unless the number of iterations is large.
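 
 A minimal sketch of how the loop count could be varied to look for the
 window, assuming only the fragment quoted above (the rest of the test
 program is not reproduced here, so this alone may not trigger the failure).
 The function name `probe` and the `SH` variable (the shell under test) are
 hypothetical, and the loop uses the portable `i=$((i + 1))` form rather than
 `$((++i))`, which changes the timing slightly:

```sh
# probe N: run the quoted fragment in a fresh shell with an N-iteration
# busy loop between the fork and the kill; print "timeout" if the timer
# subshell survived the kill, "ok" otherwise.
# (probe and SH are illustrative names, not part of the PR.)
probe() {
	"${SH:-sh}" -c '
		n=$1
		sleep 1 && echo timeout >&2 && kill $$ & timer=$!
		i=0; while [ "$i" -lt "$n" ]; do i=$((i + 1)); done
		kill $timer; wait $timer 2>/dev/null || :
	' probe "$1" 2>&1 | grep -q timeout && echo timeout || echo ok
}
```

 Something like `for n in 0 1 5 50; do echo "$n: $(probe $n)"; done` would then
 show which counts still land in the window; the counts that matter depend on
 the machine and on how sh was built.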
 
 That is, the previous change only helps when the child has had no time
 to execute anything (or almost anything) after the fork() before the signal
 is sent.   If it has plenty of time (the normal case) all is OK, and always
 was.   But there is still a small window when the child has passed beyond
 where the previous fix helped (ie: it now knows that it is a child process,
 and cannot execute any traps that belong to the parent) but still has not yet
 reset all the signals to the state they should be in for the child process.
 
 In my DEBUG environment, using '5' the child process was just cleaning
 up its signal states (after which receiving the signal would work properly)
 and had already (just) fixed signal 14 (SIGALRM) (it processes the signals
 in increasing numeric order) when the SIGTERM (signal 15) from the parent
 arrived.  By this time it was too late for the former fix to help (the child
 had done too much), but it "missed it by THAT much" for normal processing to help.
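 
 The ordering described above follows from the signal numbers.  As a quick
 check, `kill -l` with a number prints the matching signal name; this is a
 small illustration only, and the names are expected to be ALRM and TERM on
 NetBSD and other common systems:

```sh
# Print the names of signals 14 and 15; a child that resets signals in
# increasing numeric order reaches 14 just before 15.
for n in 14 15; do
	printf '%s=%s\n' "$n" "$(kill -l "$n")"
done
```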
 
 I have a new more comprehensive fix which I believe will handle this problem,
 no matter how slow, or fast, the child and parent happen to execute after
 the fork() has occurred.   It should be committed a bit later today (after
 more testing).
 
 kre
 
 


