Subject: kern/17171: Dead Child does not raise SIGCHLD until after parent reads all output on a pty.
To: None <gnats-bugs@gnats.netbsd.org>
From: None <noah@noah.org>
List: netbsd-bugs
Date: 06/04/2002 17:13:59
>Number:         17171
>Category:       kern
>Synopsis:       Dead Child does not raise SIGCHLD until after parent reads all output on a pty.
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Tue Jun 04 17:15:00 PDT 2002
>Closed-Date:
>Last-Modified:
>Originator:     Noah Spurrier
>Release:        1.5.3_ALPHA
>Organization:
None
>Environment:
NetBSD thor 1.5.3_ALPHA NetBSD 1.5.3_ALPHA (ART)
>Description:
When a child process dies, the parent is normally sent a SIGCHLD.
The child remains in the process list as a zombie until the
parent calls waitpid() or wait(). However, this does NOT seem to be the
case for a child created with forkpty(). A child may be dead (zombie) and
the parent will never receive a SIGCHLD and will block in wait()
unless the file descriptor of the pty is emptied first. I know that
the child is in the "Waiting to Exit" state (zombie) because I can
see this in the ps listing and because it has been sent a SIGKILL,
which it presumably cannot ignore. The critical thing seems
to be that the pty has unread data. If the child never prints output
to the pty, or if the parent consumes it all, then the SIGCHLD will be raised.
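
One way to check whether the child has actually become reapable,
independently of SIGCHLD delivery, is to poll it with waitpid() and
WNOHANG. The following is only a sketch; the names poll_child and
wait_child_bounded are hypothetical and are not part of test.c.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if pid has terminated and was reaped
   (status is filled in), 0 if it is not yet reapable, -1 on error. */
int poll_child(pid_t pid, int *status)
{
    pid_t r;

    do {
        r = waitpid(pid, status, WNOHANG);
    } while (r == -1 && errno == EINTR);

    if (r == pid)
        return 1;
    if (r == 0)
        return 0;
    return -1;
}

/* Hypothetical helper: poll up to `tries` times, sleeping 10 ms between
   attempts, so the caller never blocks indefinitely. */
int wait_child_bounded(pid_t pid, int *status, int tries)
{
    int r = 0;

    while (tries-- > 0 && (r = poll_child(pid, status)) == 0)
        usleep(10000);
    return r;
}
```

If poll_child() keeps returning 0 after the SIGKILL while unread data
sits in the pty, that would suggest the child has not finished exiting,
rather than that a pending signal was lost.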

As far as I can tell, the pty device and fork should have nothing to
do with each other. I realize that pty devices, and especially
forkpty, are non-standard (at least not POSIX), but forkpty is built
on top of fork. This synchronous behavior strikes me as a surprising
side effect. Shouldn't the SIGCHLD signal be asynchronous and
unrelated to the state of the pty device? If I had created my own
forkpty using openpty and fork, I cannot imagine why the
pty would prevent my child's SIGCHLD from being sent.
I may be wrong in my assumption of expected behavior. Maybe there is
a layer in between that is proxying signals for some reason.
I'm sorry that I'm not enough of a hacker to track this down for you.
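
For reference, a hand-rolled forkpty built from openpty and fork might
look roughly like the sketch below. The name my_forkpty is hypothetical,
and this is my assumption of the usual pattern, not the actual libutil
source. It relies on login_tty() to make the slave side the child's
controlling terminal and standard I/O.

```c
#include <sys/types.h>
#include <unistd.h>
#if defined(__linux__)
#include <pty.h>    /* openpty() */
#include <utmp.h>   /* login_tty() */
#else
#include <util.h>   /* openpty(), login_tty(); link with -lutil */
#endif

/* Hypothetical stand-in for forkpty(): returns the child's pid in the
   parent (with the master fd stored in *master_out), 0 in the child,
   and -1 on error. */
pid_t my_forkpty(int *master_out)
{
    int master, slave;
    pid_t pid;

    if (openpty(&master, &slave, NULL, NULL, NULL) == -1)
        return -1;

    pid = fork();
    if (pid == -1) {
        close(master);
        close(slave);
        return -1;
    }

    if (pid == 0) {
        /* Child: start a new session and attach stdin, stdout and
           stderr to the slave side of the pty. */
        close(master);
        if (login_tty(slave) == -1)
            _exit(1);
        return 0;
    }

    /* Parent: keep only the master side. */
    close(slave);
    *master_out = master;
    return pid;
}
```

Nothing in this sequence ties the child's exit status to the master
descriptor explicitly; the only link is that the child holds the slave
side open until it exits.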

I confirmed this behavior on NetBSD 1.5.3.


>How-To-Repeat:
I have attached a test program, test.c, that should demonstrate
the problem. This program will also compile on Linux and OS X, so you
have other platforms to compare it against. Email me at noah@noah.org
if you would prefer I send it in a separate email. I can also send
sample output from a NetBSD 1.5.3 machine, a Linux machine, and an
OS X machine.

I have tested this program on OS X and Linux. Neither of those systems
shows this problem. The SIGCHLD signal always arrives shortly
after the child gets a SIGKILL, and it is never synchronized with any
state in the pty.

This test.c will allow you to test three different scenarios. 

If you run it with 'test 0' then the Child will print some output
before it is killed. The Parent will NOT read output after child is
killed. You will see that the parent never receives a SIGCHLD even
though the child is clearly good and dead.

If you run it with 'test 1' then the Child will NOT print any output
nor will the parent attempt to read any. In this case the Parent
will receive the SIGCHLD signal, and you can see that it arrives at
the time the SIGKILL is sent. In other words, the signal does not
appear to be delayed and arrives asynchronously as expected.

If you run it with 'test 2' then the Child will print some output
before it is killed. The Parent will read output AFTER child is
killed. In this case the signal does not arrive until AFTER the
parent reads the output. The parent is reading data from a dead
child (which is not necessarily bad), but it never gets the SIGCHLD
signal until after the data from the dead child is consumed. 
This shows surprising synchronous behavior.
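
Based on the 'test 2' observation, a possible workaround is to discard
whatever the child left in the pty before waiting for it. The sketch
below assumes a non-blocking drain of the master descriptor is enough;
the name drain_fd is hypothetical and I have not verified this on
NetBSD 1.5.3.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: discard everything currently readable on fd
   without blocking. Returns the number of bytes discarded, or -1 if
   the descriptor flags could not be changed. */
long drain_fd(int fd)
{
    char buf[1024];
    long total = 0;
    ssize_t n;
    int flags = fcntl(fd, F_GETFL, 0);

    if (flags == -1 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1)
        return -1;

    for (;;) {
        n = read(fd, buf, sizeof buf);
        if (n > 0) {
            total += n;
            continue;
        }
        if (n == -1 && errno == EINTR)
            continue;
        /* n == 0 is EOF; EAGAIN means nothing is left to read; any
           other error (such as EIO once the slave side is closed)
           also ends the drain. */
        break;
    }

    fcntl(fd, F_SETFL, flags);  /* restore the original flags */
    return total;
}
```

In test.c this would be called on fd after the kill() and before the
parent expects the SIGCHLD.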

I hope that this is clear enough. I tried to be thorough and avoid
any obvious newbie mistakes before I submitted this as a bug. I also
took some small effort to compare the NetBSD behavior with other
UNIX platforms.

/* 
  I built this with "gcc -lutil test.c -otest"
  So far I have tested this on OpenBSD 3.0 and OpenBSD 2.9
  Linux 2.4.9 and OS X (close to NetBSD I believe).
  As a test, I ignore most exceptional errors such as failed fork or waitpid.
*/

#include <sys/types.h>  /* include this before any other sys headers */
#include <sys/wait.h>   /* header for waitpid() and various macros */
#include <signal.h>     /* header for signal functions */
#include <stdio.h>      /* header for fprintf() */
#include <unistd.h>     /* header for fork() */
#if defined(__linux__) || defined(LINUX)
#include <pty.h>         /* header for forkpty on Linux */
#else
#include <util.h>        /* header for forkpty, compile with -lutil */
#endif

void sig_chld(int);  /* prototype for our SIGCHLD handler */

int main(int argc, char * argv[]) 
{
    struct sigaction act;
    int pid;
    int fd;
    char slave_name [20];
    int CHILD_OUTPUT_FLAG;
    int PARENT_READ_FLAG;
    char buffer [1000];
    int count;

    /*
        Command line arguments:
                0 - or nothing for default. Child will print some output before it is killed.
                        Parent will end without ever trying to read this output.
                1 - To run test where child will not print any output.
                2 - To run test where child will print output and 
                        parent will try to read output after child is killed.
    */
    if (argc > 1 && *(argv[1]) == '1')
    {
        printf ("PARENT: Child will not print any output.\n");
        printf ("PARENT: Parent will NOT read output after child is killed.\n");
        CHILD_OUTPUT_FLAG = 0;
        PARENT_READ_FLAG = 0;
    }
    else if (argc > 1 && *(argv[1]) == '2')
    {
        printf ("PARENT: Child will print some output before it is killed.\n");
        printf ("PARENT: Parent will read output after child is killed.\n");
        CHILD_OUTPUT_FLAG = 1;
        PARENT_READ_FLAG = 1;
    }
    else
    {
        printf ("PARENT: Child will print some output before it is killed.\n");
        printf ("PARENT: Parent will NOT read output after child is killed.\n");
        CHILD_OUTPUT_FLAG = 1;
        PARENT_READ_FLAG = 0;
    }

    /* Assign sig_chld as our SIGCHLD handler.
       We don't want to block any other signals in this example.
       We're only interested in children that have terminated, not ones
       which have been stopped (e.g. user pressing control-Z at terminal).
       Finally, make these values effective. If we were writing a real
       application, we would save the old value instead of passing NULL.
     */
    act.sa_handler = sig_chld;
    sigemptyset(&act.sa_mask);
    act.sa_flags = SA_NOCLDSTOP;
    sigaction(SIGCHLD, &act, NULL);


    /* Do the Fork thing. 
    */
    pid = forkpty (&fd, slave_name, NULL, NULL);
    /* pid = fork(); */

    switch (pid)
    {
            case 0: /* Child process. */
                if (CHILD_OUTPUT_FLAG)
                    printf ("CHILD: This output may cause trouble.\n");
                sleep(1000);
            break;

            default: /* Parent process. */
                printf ("PARENT: After fork, sleeping...\n");
                sleep(5); /* Crappy way to avoid a race with child. */
                printf ("PARENT: Child pid: %d\n", pid);
                printf ("PARENT: sending SIGKILL to child...\n");
                kill (pid, SIGKILL);
                printf ("PARENT: After kill, sleeping...\n");
                sleep(5);
            break;
    }

    if (PARENT_READ_FLAG)
    {
        printf ("PARENT: Consuming any output from child pty fd.\n");
        count = read (fd, buffer, 999);
        printf ("PARENT: Read %d characters.\n", count);
    }
    else
    {
        printf ("PARENT: Not attempting to read from child.\n");
    }

    printf ("PARENT: leaving.\n\n");
    return 0;
}
 
void sig_chld(int signo)
{
    int status, wpid, child_val;

    printf ("SIGCHLD: In sig_chld signal handler.\n");

    /* Wait for any child without blocking */
    wpid = waitpid (-1, & status, WNOHANG);
    printf ("SIGCHLD:\tWaitpid found status for pid: %d\n", wpid);
    printf("SIGCHLD:\tWaitpid status: %d\n", status);

    if (WIFEXITED(status)) /* did child exit normally? */
    {
        child_val = WEXITSTATUS(status);
        printf("SIGCHLD:\tchild exited normally with status %d\n", child_val);
    }
    printf ("SIGCHLD: End of sig_chld.\n");
}

>Fix:
Unknown
>Release-Note:
>Audit-Trail:
>Unformatted: