NetBSD-Bugs archive


Re: kern/38670: ^Z does not work anymore for this program.



On Sun, May 18, 2008 at 01:46:57PM -0400, Christos Zoulas wrote:
 > |  That seems to have come in with 4.4BSD, not sure what it's all about.

This comment was lost during some of the reorganization:

                   /*
                    * If a child holding parent blocked, stopping could
                    * cause deadlock: discard the signal.
                    */

I'm not sure what this hypothetical deadlock would be, though.

 > The regression has been introduced recently though. This works fine with
 > NetBSD rebar.astron.com 4.99.1 NetBSD 4.99.1 (ASTRON) #2: Fri Sep  8 
 > 15:10:53 EDT 2006  
 > christos%rebar.astron.com@localhost:/usr/src/sys/arch/i386/compile/ASTRON i38

What happened is that the interruptible tsleep() in the parent process
that waits for the child to exec got changed to an uninterruptible
cv_wait(). Before that change, the parent processes in your example
would stop, which is sufficient for the shell to report a stopped job,
even though the child calling sleep() wouldn't.
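For reference, the vfork() contract behind this wait is that the parent is held inside vfork() until the child execs or exits, not until the exec'd program finishes; during that window the parent sits in the kernel wait discussed above. A minimal user-space sketch (not from the original report; the function names are hypothetical, and it assumes a `sleep` binary on PATH) measures both intervals:

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Seconds elapsed on the monotonic clock since *start. */
static double elapsed_since(const struct timespec *start)
{
    struct timespec now;
    clock_gettime(CLOCK_MONOTONIC, &now);
    return (double)(now.tv_sec - start->tv_sec)
         + (double)(now.tv_nsec - start->tv_nsec) / 1e9;
}

/*
 * vfork() a child that immediately execs "sleep 1".  Returns how long the
 * parent was held inside vfork(), and stores in *total the time until the
 * child actually terminated.
 */
static double vfork_hold_time(double *total)
{
    struct timespec start;
    clock_gettime(CLOCK_MONOTONIC, &start);

    pid_t pid = vfork();
    if (pid == 0) {
        /* Child shares the parent's memory: only exec*() or _exit() here. */
        execlp("sleep", "sleep", "1", (char *)NULL);
        _exit(127);                  /* exec failed */
    }
    if (pid < 0) { perror("vfork"); exit(EXIT_FAILURE); }

    /* Parent resumes here as soon as the child has exec'd (or exited). */
    double held = elapsed_since(&start);

    int status;
    if (waitpid(pid, &status, 0) != pid) { perror("waitpid"); exit(EXIT_FAILURE); }
    assert(WIFEXITED(status));
    *total = elapsed_since(&start);
    return held;
}
```

One would expect vfork_hold_time() to return well under a second while *total comes out at one second or more. POSIX only permits a vfork() child to call exec or _exit, so a child that does real work, such as calling sleep(), before exec'ing is relying on implementation behavior, and the parent stays in that wait the whole time.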

Changing that call to cv_wait_sig() ought to restore the previous
behavior; however, it's not clear that this is a particularly good
idea, because if a signal arrives and results in ERESTARTSYS there'll
be another child process created, and if it results in EINTR then the
parent and the child will both be running on the same stack in the
same address space, and demons will fly out of someone's nose.
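The two outcomes of an interrupted sleep have a familiar user-space counterpart: without SA_RESTART, a blocking read() that catches a signal fails with EINTR; with SA_RESTART, the kernel re-issues the call. The sketch below is only an analogy (it does not touch the vfork code path, and `interrupted_read` and `on_alarm` are hypothetical names), but it shows both behaviors. For vfork(), "re-issue the call" would mean forking a second child, and "fail with EINTR" would return to a parent whose stack the child is still using.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/time.h>
#include <unistd.h>

static int pipefd[2];

/* SIGALRM handler: put one byte in the pipe so a restarted read() can finish. */
static void on_alarm(int sig)
{
    (void)sig;
    char c = 'x';
    ssize_t r = write(pipefd[1], &c, 1);   /* write() is async-signal-safe */
    (void)r;
}

/*
 * Block in read() on an empty pipe and let a one-shot SIGALRM arrive after
 * 200 ms.  flags is 0 or SA_RESTART.  Returns what read() returned and
 * stores errno (or 0) in *err.
 */
static ssize_t interrupted_read(int flags, int *err)
{
    if (pipe(pipefd) != 0) { perror("pipe"); exit(EXIT_FAILURE); }

    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = flags;
    if (sigaction(SIGALRM, &sa, NULL) != 0) { perror("sigaction"); exit(EXIT_FAILURE); }

    struct itimerval it = { .it_value = { .tv_sec = 0, .tv_usec = 200000 } };
    if (setitimer(ITIMER_REAL, &it, NULL) != 0) { perror("setitimer"); exit(EXIT_FAILURE); }

    char buf;
    ssize_t n = read(pipefd[0], &buf, 1);   /* interruptible sleep */
    *err = (n < 0) ? errno : 0;
    assert(n == 1 || n == -1);

    close(pipefd[0]);
    close(pipefd[1]);
    return n;
}
```

With flags 0 the read should come back as -1 with EINTR, leaving the handler's byte unread in the pipe; with SA_RESTART the read is restarted after the handler runs and returns 1 because it finds that byte.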

In 4.99.1 it might have worked to just stop and continue the parent,
provided both SIGTSTP and SIGCONT were set to SIG_DFL, because stopped
processes got stopped in their tracks wherever they happened to be in
the kernel (while holding whatever locks they happened to be working
with, etc.). That apparently got fixed last March; now stopping
requires the sleep to end with either EINTR or ERESTARTSYS.
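User space cannot reproduce an uninterruptible kernel wait, but it can show the interruptible case this relies on: a process sitting in an interruptible sleep such as read() on a pipe is stopped promptly, and its parent learns of it through waitpid() with WUNTRACED, which is how a job-control shell notices a stopped job. A minimal sketch (the function name is hypothetical; it uses SIGSTOP rather than the SIGTSTP that ^Z sends, because POSIX has SIGTSTP with its default action discarded for a process in an orphaned process group, which a test harness may well create):

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child that blocks in read() on a pipe nobody writes to (an
 * interruptible sleep), stop it, and ask waitpid(WUNTRACED) whether it is
 * reported as stopped.  Returns 1 if the child was reported stopped, else 0.
 */
static int stops_while_blocked(void)
{
    int fds[2];
    if (pipe(fds) != 0) { perror("pipe"); exit(EXIT_FAILURE); }

    pid_t pid = fork();
    if (pid < 0) { perror("fork"); exit(EXIT_FAILURE); }
    if (pid == 0) {
        char c;
        /* The child still holds the write end, so this never sees EOF. */
        ssize_t r = read(fds[0], &c, 1);
        _exit(r < 0 ? 1 : 0);
    }

    sleep(1);            /* crude: give the child time to block in read() */
    kill(pid, SIGSTOP);

    int status = 0, stopped = 0;
    if (waitpid(pid, &status, WUNTRACED) == pid)
        stopped = WIFSTOPPED(status) && WSTOPSIG(status) == SIGSTOP;

    kill(pid, SIGKILL);  /* SIGKILL also terminates a stopped process */
    waitpid(pid, &status, 0);
    assert(WIFSIGNALED(status));
    close(fds[0]);
    close(fds[1]);
    return stopped;
}
```

Calling stops_while_blocked() should yield 1. The case the message is about, a parent held in an uninterruptible cv_wait(), is exactly the one where no such stop would be reported.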

However, since having arbitrarily long uninterruptible waits isn't
such a great idea, maybe we should try to come up with a way to make
this work. Or maybe an adequate substitute is to change the WCHAN to
"vfork" so one can at least tell what's happening and find or kill off
the child process if things are stuck. But this would probably require
breaking the CV abstraction.

Also, I wonder what happens if someone does ptrace(PT_ATTACH, ...) on
a vfork child. This should probably be forbidden; it currently isn't
and I suspect it will make a mess.

-- 
David A. Holland
dholland%netbsd.org@localhost

