Subject: waitpid(2) oddity
To: None <current-users@netbsd.org>
From: David Young <dyoung@pobox.com>
List: current-users
Date: 11/27/2007 13:07:29
A daemon I wrote periodically forks and runs a shell script.  In a
SIGCHLD handler, the daemon calls waitpid(2) to collect the dead process.
waitpid(2) has started reporting that the child exited on signal 82
(?!) and dumped core:

Nov  3 07:25:15 cuw hslsd: sigchild_handler: child 5106 exited on signal 82, dumped core

There is no core file, though, and the child was never collected; ps(1)
still shows it as a zombie:

# ps aux -p 5106
USER  PID %CPU %MEM VSZ RSS TTY STAT STARTED    TIME COMMAND
root 5106  0.0  0.0   0   0 ?   ZW         - 0:00.00 (sh)

I am running a kernel and userland from Nov 23.  Here is the code that
calls waitpid(2) on the dead child, for reference.  It has always worked
before, but it could well contain a bug.

static void
sigchild_handler(int fd, short ev, void *arg)
{
        int status;
        struct hsls_shell_watchdog *hsw;

        hsw = (struct hsls_shell_watchdog *)arg;

        if (hsw->hsw_pid == 0)
                return;
        if (waitpid(hsw->hsw_pid, &status, WNOHANG) == -1) {
                loglib_warn("%s: waitpid", __func__);
        } else if (WIFSTOPPED(status)) {
                loglib_warnx("%s: child %u stopped on signal %d", __func__,
                    hsw->hsw_pid, WSTOPSIG(status));
        } else if (WIFEXITED(status) && WEXITSTATUS(status) != 0) {
                loglib_warnx("%s: child %u exit status %d", __func__,
                    hsw->hsw_pid, WEXITSTATUS(status));
        } else if (WIFSIGNALED(status)) {
                loglib_warnx("%s: child %u exited on signal %d%s", __func__,
                    hsw->hsw_pid, WTERMSIG(status),
                    WCOREDUMP(status) ? ", dumped core" : "");
        }
        if (WIFEXITED(status) || WIFSIGNALED(status))
                hsw->hsw_pid = 0;
}
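
One thing worth checking in the handler above: with WNOHANG, waitpid(2)
returns 0 when the child has not changed state yet, and in that case it
does not fill in status.  The handler only tests for -1, so a 0 return
would fall through to the W* macros with an uninitialized status.  That
could yield a meaningless signal number and core flag, and the final test
would then clear hsw_pid so the child is never waited for again.  This is
a guess at the cause, not a confirmed diagnosis.  Below is a minimal
sketch of the three-way check; reap_child(), reap_demo() and the
reap_result names are hypothetical and not part of the daemon.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <signal.h>
#include <unistd.h>

/* Hypothetical result codes, invented for this sketch. */
enum reap_result {
        REAP_ERROR,             /* waitpid(2) failed; errno is set */
        REAP_NOT_READY,         /* no state change yet; *status is unset */
        REAP_STOPPED,
        REAP_EXITED,
        REAP_SIGNALED,
        REAP_OTHER
};

/* Non-blocking wait that keeps the "nothing to report" case separate. */
static enum reap_result
reap_child(pid_t pid, int *status)
{
        pid_t ret = waitpid(pid, status, WNOHANG);

        if (ret == -1)
                return REAP_ERROR;
        if (ret == 0)
                return REAP_NOT_READY;  /* do not inspect *status here */
        if (WIFSTOPPED(*status))
                return REAP_STOPPED;
        if (WIFEXITED(*status))
                return REAP_EXITED;
        if (WIFSIGNALED(*status))
                return REAP_SIGNALED;
        return REAP_OTHER;
}

/* Small self-check: returns 0 if reap_child() behaves as described. */
static int
reap_demo(void)
{
        int status = 0;
        enum reap_result r;
        pid_t pid = fork();

        if (pid == -1)
                return -1;
        if (pid == 0) {
                for (;;)
                        pause();        /* child: sleep until killed */
        }

        /* The child is still alive, so there is nothing to collect. */
        r = reap_child(pid, &status);
        kill(pid, SIGKILL);
        if (r != REAP_NOT_READY) {
                (void)waitpid(pid, &status, 0); /* clean up before bailing */
                return -1;
        }

        /* Poll (busy loop, fine for a demo) until the death is reported. */
        while ((r = reap_child(pid, &status)) == REAP_NOT_READY)
                continue;
        if (r != REAP_SIGNALED || WTERMSIG(status) != SIGKILL)
                return -1;
        return 0;
}
```

In the handler, the equivalent change would be to return early when
waitpid(2) returns 0, leaving hsw_pid set so that a later SIGCHLD can
collect the child.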


-- 
David Young             OJC Technologies
dyoung@ojctech.com      Urbana, IL * (217) 278-3933 ext 24