Subject: kern/22972: signal related problem
To: None <gnats-bugs@gnats.NetBSD.org>
From: None <manu@netbsd.org>
List: netbsd-bugs
Date: 09/27/2003 10:11:15
>Number:         22972
>Category:       kern
>Synopsis:       signal related problem
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sat Sep 27 10:12:00 UTC 2003
>Closed-Date:
>Last-Modified:
>Originator:     Emmanuel Dreyfus
>Release:        NetBSD-current/macppc
>Organization:
The NetBSD Project
>Environment:
Not at hand yet.
>Description:
I observed the problem with pkgsrc/mail/jchkmail. On NetBSD-1.6.1/macppc it works fine. On NetBSD-current/macppc, it forks two threads (which is the normal behaviour) and then one of the two threads dies. In the log, you can find a "SUPERVISOR DIED?" message.

I have not been able to reduce the problem to a simple program yet, so here is the story with the jchkmail sources:

At src/j-main.c:709 there is the following function call:
sleep (DT_ALARM);

If we add a syslog(LOG_DEBUG, "before sleep") and a syslog(LOG_DEBUG, "after sleep") around this sleep() call, we discover that after a short time, the program enters sleep() but never leaves it.
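
For reference, the instrumentation described above can be sketched roughly as follows. This is not the jchkmail code: the wrapper name logged_sleep() and its argument are hypothetical, and the real call site passes DT_ALARM, whose value is not given here.

```c
#include <assert.h>
#include <syslog.h>
#include <unistd.h>

/*
 * Hypothetical wrapper (not part of jchkmail): logs on both sides of
 * sleep() so that a call which never returns shows up in the log as a
 * "before sleep" line without a matching "after sleep" line.
 */
static unsigned int
logged_sleep(unsigned int seconds)
{
	unsigned int left;

	syslog(LOG_DEBUG, "before sleep");
	left = sleep(seconds);
	/* sleep() returns the unslept time if a signal cut it short. */
	syslog(LOG_DEBUG, "after sleep (%u s left)", left);

	/* Sanity check: the remainder can never exceed the request. */
	assert(left <= seconds);
	return left;
}
```

With such a wrapper, the failure reported here would appear as a final "before sleep" entry that is never followed by "after sleep".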

When this happens, the program has received a SIGALRM and entered j_father_sig_handler() in the same file, at line 328. After this signal handler returns, ps -axl shows the program sleeping in "select", whereas we would expect "nanosleep".

In my opinion, the return from the signal handler resumed execution somewhere other than the interrupted sleep(). I see no code in the signal handler that could corrupt the stack; it only calls signal() and syslog() when it receives a SIGALRM.
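
A reduced test along the following lines might be a starting point for isolating the problem. It is only a sketch under assumptions: it has not been run on NetBSD-current/macppc, the names alarm_handler(), alarms_seen and sleep_across_alarm() are hypothetical, and the handler merely imitates what is described above (a call to signal() and a call to syslog()), not the actual j_father_sig_handler() code.

```c
#include <assert.h>
#include <signal.h>
#include <syslog.h>
#include <unistd.h>

/* Counts SIGALRM deliveries; hypothetical name, for the test only. */
static volatile sig_atomic_t alarms_seen = 0;

/*
 * Hypothetical stand-in for j_father_sig_handler(): as in the report,
 * it only calls signal() and syslog(). Note that syslog() is not on
 * the POSIX list of async-signal-safe functions.
 */
static void
alarm_handler(int signo)
{
	signal(signo, alarm_handler);	/* re-install the handler */
	syslog(LOG_DEBUG, "got SIGALRM");
	alarms_seen++;
}

/*
 * Arms a SIGALRM for alarm_after seconds, then sleeps for sleep_for
 * seconds. Returns whatever sleep() returns (the unslept time). If
 * the bug is present, this function would presumably never return.
 */
static unsigned int
sleep_across_alarm(unsigned int alarm_after, unsigned int sleep_for)
{
	void (*prev)(int) = signal(SIGALRM, alarm_handler);

	assert(prev != SIG_ERR);
	alarm(alarm_after);
	return sleep(sleep_for);
}
```

Calling sleep_across_alarm(1, 3) from a main() function should come back once the handler has run. If the process instead stays asleep and ps -axl shows it in "select", the symptom would be reproduced outside jchkmail.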
>How-To-Repeat:
cd /usr/pkgsrc/mail/jchkmail
make install
/usr/pkg/etc/rc.d/jchkmail start
>Fix:
None known yet.
>Release-Note:
>Audit-Trail:
>Unformatted: