Subject: Re: fork, SIGCHLD, wait, waitpid, Debian Linux vs. NetBSD ...
To: Jeremy C. Reed <reed@reedmedia.net>
From: David Xu <bsddiy@21cn.com>
List: netbsd-users
Date: 04/11/2001 17:51:01
Hello Jeremy,

Wednesday, April 11, 2001, 5:17:34 PM, you wrote:

JCR> I have a daemon that forks ten other processes. Under Debian Linux when a
JCR> child is terminated, another process is forked (so there is always 11
JCR> processes total). Under NetBSD when each child is terminated another is
JCR> not started unless it is the last child left (so there are always at least
JCR> 2 processes). (I want ten children![1])

JCR> Here is an example of the code[2]:

JCR> void
JCR> signal_handler (int signal)
JCR> {
JCR>   if (signal == SIGCHLD) {
JCR>     int stat;
JCR>     while (waitpid (-1, &stat, WNOHANG) > 0);
JCR>   }
JCR>   return;
JCR> }

JCR> The main part:

JCR>   signal(SIGCHLD, signal_handler);
JCR>   children = 0; maxchildren = 10;

JCR>   while (1) {
JCR>     if (children < maxchildren) {
JCR>       if (!fork()) {
JCR>         mainloop;
JCR>         exit(OK);
JCR>       } else {
JCR>         children++;
JCR>       }
JCR>     } else {
JCR>       wait(NULL);
JCR>       children--;
JCR>     }
JCR>   }

JCR> I am guessing that under NetBSD, the signal_handler does the waitpid and
JCR> so the second wait() just hangs forever(?).

JCR> Any ideas?

JCR> Can anyone point me to any *simple* code that does a similar task? (Starts
JCR> multiple daemons and forks new ones when they exit.)

JCR> Thanks,

JCR>    Jeremy C. Reed
JCR>    http://www.reedmedia.net/

JCR> 1) My wife wants six. If I was talking about real kids, I think six is
JCR> enough; I already have three.

JCR> 2) I know some people complain about not having complete working code or
JCR> perfect K&R or ANSI code for examples. But I feel that code snippets are
JCR> fine. This code is from vm-pop3d and also gnu-pop3d. It is available via
JCR> http://www.reedmedia.net/software/virtualmail-pop3d/

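There is a race condition. A child may already have exited before the
parent calls wait(). In that case SIGCHLD is delivered to the parent,
signal_handler runs, and its waitpid() loop reaps the child. The
parent's later wait(NULL) then has nothing left to collect, so it
blocks until some other child exits. The child reaped in the handler is
never subtracted from children, so no replacement is forked for it.
Reaping in both the handler and the main loop is a badly designed
pattern.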
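
One way to avoid the race is to reap in exactly one place. The sketch
below is only an illustration, not the vm-pop3d code: it installs no
SIGCHLD handler at all and lets the main loop's wait() be the only
reaper, adjusting the counter only when wait() really returns a pid.
The names run_pool, child_work and total_spawns are hypothetical; the
total_spawns limit exists only so that the example terminates, where a
real daemon would loop forever.

```c
#include <errno.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for the daemon's real per-child work
 * ("mainloop" in the quoted snippet). Here it does nothing. */
static void child_work(void)
{
}

/*
 * Keep at most max_children running until total_spawns children have
 * been forked in all, then wait for the rest to finish.
 * Returns the number of children reaped, or -1 on error.
 */
int run_pool(int max_children, int total_spawns)
{
    int children = 0;   /* currently running */
    int spawned = 0;    /* forked so far */
    int reaped = 0;     /* collected by wait() so far */

    /* Default disposition: exited children stay as zombies until the
     * loop below collects them, so no exit can be "lost". */
    if (signal(SIGCHLD, SIG_DFL) == SIG_ERR)
        return -1;

    while (spawned < total_spawns || children > 0) {
        if (children < max_children && spawned < total_spawns) {
            pid_t pid = fork();
            if (pid < 0)
                return -1;          /* fork failed */
            if (pid == 0) {
                child_work();
                _exit(0);           /* child never returns to the loop */
            }
            children++;
            spawned++;
        } else {
            pid_t pid;
            /* The only place where children are reaped. Retry if an
             * unrelated signal interrupts the call. */
            do {
                pid = wait(NULL);
            } while (pid < 0 && errno == EINTR);
            if (pid < 0)
                return -1;          /* e.g. ECHILD: nothing to wait for */
            children--;             /* count only real reaps */
            reaped++;
        }
    }
    return reaped;
}
```

Calling run_pool(10, 25), for example, keeps up to ten children alive
and returns once all 25 have been collected. If the parent has other
work to do and cannot block in wait(), the alternative is to keep the
handler and do the bookkeeping there, but then the main loop must not
call wait() as well.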

-- 
Best regards,
David Xu