Subject: kern/37437: signal problems in linux threads
To: None <kern-bug-people@netbsd.org, gnats-admin@netbsd.org,>
From: None <arto.huusko@pp2.inet.fi>
List: netbsd-bugs
Date: 11/26/2007 19:00:01
>Number:         37437
>Category:       kern
>Synopsis:       linux sigaction does not affect all threads
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Mon Nov 26 19:00:00 +0000 2007
>Originator:     arto.huusko@pp2.inet.fi
>Release:        NetBSD 4.99.34
>Organization:
>Environment:
Architecture: x86_64
Machine: amd64
>Description:
In an emulated Linux process, the p_sigacts structure is shared between
threads (cloned processes), but the p_sigctx structure is not. The sigsets
ps_sigcatch and ps_sigignore in p_sigctx are still used to determine how a
signal is delivered, and the sigaction system call changes those sigsets in
addition to p_sigacts. As a result, a handler installed by one thread is not
honoured for signals delivered to the other threads.
>How-To-Repeat:
Java 6 crashes very easily even with simple programs. Java 1.5 seems to be
more stable, but I'm sure it too will crash at some point.

The following simple program illustrates the problem. When run on Linux,
or on NetBSD as a native NetBSD binary, the program prints "SEGV caught"
and exits.

However, when the Linux binary is run on NetBSD, the program dumps core.
Ktrace output shows that the signal is handled with the default action
instead of the installed action.

-----BEGIN-----
#include <err.h>
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;

static void *
otherfunc(void *arg)
{
    int *bar = 0;
    int foo;

    pthread_mutex_lock(&mutex); /* Wait for parent to sigaction */
    pthread_mutex_unlock(&mutex);

    foo = *bar;
    printf("%d\n", foo);

    return NULL;
}

static void
sigfunc(int signo, siginfo_t *siginfo, void *context)
{
    printf("SEGV caught, addr %p\n", siginfo->si_addr);
    exit(EXIT_SUCCESS);
}

int
main(int argc, char **argv)
{
    pthread_t other;
    struct sigaction act;

    pthread_mutex_lock(&mutex); /* Block child until sigact done */

    errno = pthread_create(&other, NULL, otherfunc, NULL);
    if (errno)
        err(EXIT_FAILURE, "pthread_create failed");

    memset(&act, 0, sizeof(act));
    act.sa_sigaction = sigfunc;
    act.sa_flags = SA_SIGINFO;
    if (sigaction(SIGSEGV, &act, NULL))
        err(EXIT_FAILURE, "sigaction failed");

    pthread_mutex_unlock(&mutex);

    errno = pthread_join(other, NULL);
    if (errno)
        err(EXIT_FAILURE, "pthread_join failed");

    return EXIT_SUCCESS;
}
-----END-----
>Fix:
The following hack of a patch fixes the problem for linux32 only; a similar
patch for the linux emulation should be simple enough. I doubt this is
suitable to commit to NetBSD, but it does allow me to use Java 6.

Index: sys/compat/linux32/common/linux32_signal.c
===================================================================
RCS file: /cvsroot/src/sys/compat/linux32/common/linux32_signal.c,v
retrieving revision 1.4
diff -u -r1.4 linux32_signal.c
--- sys/compat/linux32/common/linux32_signal.c	18 Mar 2007 21:38:32 -0000	1.4
+++ sys/compat/linux32/common/linux32_signal.c	26 Nov 2007 18:42:18 -0000
@@ -39,6 +39,7 @@
 
 #include <compat/netbsd32/netbsd32.h>
 
+#include <compat/linux/common/linux_emuldata.h>
 #include <compat/linux32/common/linux32_types.h>
 #include <compat/linux32/common/linux32_signal.h>
 #include <compat/linux32/linux32_syscallargs.h>
@@ -239,12 +240,40 @@
 		sigemptyset(&os.sa_mask);
 		os.sa_flags = 0;
 	} else {
-		if ((error = sigaction1(l, 
-		    linux32_to_native_signo[sig],	
-		    SCARG_P32(uap, nsa) ? &ns : NULL,
-		    SCARG_P32(uap, osa) ? &os : NULL,
-		    tramp, vers)) != 0)
-			return error;
+		/*
+		 * A very crude hack to set sigctx for all threads.
+		 *
+		 * In addition to being a hack, be also lazy, and just call sigaction
+		 * for all the threads.
+		 *
+		 * XXX: how is linux_emuldata_shared supposed to be protected?
+	 	 * As far as I see, linux32_e_proc_init does not protect the thread
+		 * list at all, which breaks down if two or more threads fork off a
+		 * new thread at the same time. This breaks also if threads come or
+	 	 * go while running this.
+		 */
+		if (SCARG_P32(uap, nsa) != NULL) {
+			struct proc *p = l->l_proc;
+			struct linux_emuldata *e = p->p_emuldata;
+			struct linux_emuldata *eo;
+
+			LIST_FOREACH(eo, &e->s->threads, threads) {
+				struct lwp *l1 = LIST_FIRST(&eo->proc->p_lwps);
+				if ((error = sigaction1(l1,
+				    linux32_to_native_signo[sig],	
+				    SCARG_P32(uap, nsa) ? &ns : NULL,
+				    SCARG_P32(uap, osa) ? &os : NULL,
+				    tramp, vers)) != 0)
+					return error;
+			}
+		} else {
+			if ((error = sigaction1(l, 
+			    linux32_to_native_signo[sig],	
+			    SCARG_P32(uap, nsa) ? &ns : NULL,
+			    SCARG_P32(uap, osa) ? &os : NULL,
+			    tramp, vers)) != 0)
+				return error;
+		}
 	}
 
 	if (SCARG_P32(uap, osa) != NULL) {