NetBSD-Bugs archive


bin/41411: tnftp sometimes exits due to an un-handled SIGALRM

>Number:         41411
>Category:       bin
>Synopsis:       tnftp sometimes exits due to an un-handled SIGALRM
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    bin-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Tue May 12 01:05:00 +0000 2009
>Originator:     Brian Haley
>Release:        None
>Organization:   Hewlett Packard
>Environment:    Linux (tnftp port to Debian)
We've been using tnftp as part of our QA process to help generate load on two 
Linux systems.  Sometimes under very heavy load it would fail to transfer a 
file, and would return a status value of 142 (captured in our error log).

While 142 might seem too high, according to the manual page for bash(1),

    The return value of a simple command is its exit status, or 128+n
    if the command is terminated by signal n.

So if tnftp is returning 142, that's 128 + 14.  Signal 14 is SIGALRM.  
According to signal(7), the default action for SIGALRM is process termination 
without core dump.

After some investigation and debugging of the code, we found two places where 
there is a very small window in which a SIGALRM could be delivered unexpectedly.  
Both were in the progress meter code (patch included): the signal action was 
set back to SIG_DFL before the alarm timer was disabled.  This differs from the 
code in getreply() in ftp.c, which first disables the timer and then resets the 
signal handler.

With this patch applied, the tests have run successfully for three days without 
an error, whereas before they might last only 24 hours before failing.  Please 
let me know if there are any questions.

Also, if this should be reported on let me know, 
but that didn't seem like the place to report bugs.

--> diff -u progressbar.c.orig progressbar.c
--- progressbar.c.orig  2005-06-10 00:05:01.000000000 -0400
+++ progressbar.c       2009-05-11 20:50:06.000000000 -0400
@@ -158,8 +158,8 @@
                            "transfer aborted because stalled for %lu sec.\r\n",
                            getprogname(), (unsigned long)wait.tv_sec);
                        (void)write(fileno(ttyout), buf, len);
-                       (void)xsignal(SIGALRM, SIG_DFL);
                        alarmtimer(0);
+                       (void)xsignal(SIGALRM, SIG_DFL);
                        siglongjmp(toplevel, 1);
@@ -176,8 +176,8 @@
                        (void)xsignal_restart(SIGALRM, updateprogressmeter, 1);
                        alarmtimer(1);          /* set alarm timer for 1 Hz */
                } else if (flag == 1) {
-                       (void)xsignal(SIGALRM, SIG_DFL);
                        alarmtimer(0);
+                       (void)xsignal(SIGALRM, SIG_DFL);
 #ifndef NO_PROGRESS
