NetBSD-Bugs archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]

Re: standards/59586 (sigwaitinfo() returns ECANCELED instead of EINTR - POSIX compliance violation)



Synopsis: sigwaitinfo() returns ECANCELED instead of EINTR - POSIX compliance violation

State-Changed-From-To: open->analyzed
State-Changed-By: riastradh%NetBSD.org@localhost
State-Changed-When: Thu, 18 Dec 2025 14:43:43 +0000
State-Changed-Why:
Sorry for the delay.  Looks like this was a mistake that traces back to
vestiges of an earlier architecture for multithreading that NetBSD
abandoned in the mid-2000s, scheduler activations (PTHREAD_SA); it was
never intended to be exposed to userland even then, when the latest
POSIX spec, POSIX.1-2001, prescribed EINTR:

https://pubs.opengroup.org/onlinepubs/009604499/functions/sigtimedwait.html

The code was added in kern_sig.c rev. 1.135 in 2003 with logic to
return ECANCELED if the thread was woken, not by a signal for _this_
thread but by a signal for _some other thread_:

/* kern_sig.c 1.135 */
+	/*
+	 * Wait for signal to arrive. We can either be woken up or
+	 * time out.
+	 */
+	error = tsleep(&p->p_sigctx.ps_sigwait, PPAUSE|PCATCH, "sigwait", timo);
+
+	/*
+	 * Check if a signal from our wait set has arrived, or if it
+	 * was mere wakeup.
+	 */
+	if (!error) {
+		if ((signum = p->p_sigctx.ps_sigwaited) <= 0) {
+			/* wakeup via _lwp_wakeup() */
+			error = ECANCELED;
+		}
+	}

/* pthread_sig.c 1.9 */
+		/*
+		 * We are either the only one, or wait set was setup already.
+		 * Just do the syscall now.
+		 */
+		error = __sigtimedwait(&wset, info, (timeout) ? &timo : NULL);
+
+		pthread_spinlock(self, &pt_sigwaiting_lock);
+		if ((error && errno != ECANCELED)
+		    || (!error && __sigismember14(set, info->si_signo)) ) {
...
+		}
+
+		if (!error) {
+			/*
+			 * Got a signal, but not from _our_ wait set.
+			 * Scan the queue of sigwaiters and wakeup
+			 * the first thread waiting for this signal.
+			 */
...


https://cvsweb.netbsd.org/bsdweb.cgi/src/sys/kern/kern_sig.c#rev1.135
https://cvsweb.netbsd.org/bsdweb.cgi/src/lib/libpthread/Attic/pthread_sig.c#rev1.9

EINTR doesn't appear in those diffs, but it can come flying out of
tsleep(9) and the sys___sigtimedwait function implementing the guts of
the sigtimedwait(2) syscall would pass it through in the error branch.
ECANCELED was used as a special error code meant only for use inside
libpthread, and was not supposed to leak outside.

However, this was buggy because the erstwhile tsleep(9) logic would
return EINTR in the `cancelled' case, so the code was tweaked a few
months later to check for EINTR and _none_ of the listed signals:

 	/*
-	 * Check if a signal from our wait set has arrived, or if it
-	 * was mere wakeup.
+	 * Need to find out if we woke as a result of lwp_wakeup()
+	 * or a signal outside our wait set.
 	 */
-	if (!error) {
-		if ((signum = p->p_sigctx.ps_sigwaited) <= 0) {
-			/* wakeup via _lwp_wakeup() */
-			error = ECANCELED;
-		}
+	if (error == EINTR && p->p_sigctx.ps_sigwaited
+	    && !firstsig(&p->p_sigctx.ps_siglist)) {
+		/* wakeup via _lwp_wakeup() */
+		error = ECANCELED;
+	} else if (!error && p->p_sigctx.ps_sigwaited) {
+		/* spurious wakeup - arrange for syscall restart */
+		error = ERESTART;
+		goto fail;
 	}

https://cvsweb.netbsd.org/bsdweb.cgi/src/sys/kern/kern_sig.c#rev1.171

That code might have been fine, since it only maps EINTR to ECANCELED
if none of the signals we're waiting for were delivered (not 100% sure
about that, perhaps ps_siglist had other entries too, but it's not
worthwhile to dig any deeper into 20-year-old abandoned logic!).

But later, in the renovation to remove SA on the newlock2 branch in
2007, this logic garbled in translation into `map EINTR to ECANCELED'
no matter what signals had been delivered in the interim:

/* kern_sig.c 1.228.2.2 */
-       /*
-        * Need to find out if we woke as a result of lwp_wakeup()
-        * or a signal outside our wait set.
-        */
-       if (error == EINTR && p->p_sigctx.ps_sigwaited
-           && !firstsig(&p->p_sigctx.ps_siglist)) {
-               /* wakeup via _lwp_wakeup() */
-               error = ECANCELED;
-       } else if (!error && p->p_sigctx.ps_sigwaited) {
-               /* spurious wakeup - arrange for syscall restart */
-               error = ERESTART;
-               goto fail;
-       }

/* sys_sig.c 1.1.2.1 */
+       /*
+        * Need to find out if we woke as a result of lwp_wakeup() or a
+        * signal outside our wait set.
+        */
+       if (l->l_sigwaited != NULL) {
+               if (error == EINTR) {
+                       /* wakeup via _lwp_wakeup() */
+                       error = ECANCELED;
+               } else if (!error) {
+                       /* spurious wakeup - arrange for syscall restart */
+                       error = ERESTART;
+               }
+       }

https://cvsweb.netbsd.org/bsdweb.cgi/src/sys/kern/sys_sig.c#rev1.1.2.1
https://cvsweb.netbsd.org/bsdweb.cgi/src/sys/kern/kern_sig.c#rev1.228.2.2

And that is the bug that you hit here.

Since the SA logic to request userland to yield is no longer relevant,
the internal ECANCELED magic is no longer relevant, and we should just
delete the `if (error == EINTR) { error = ECANCELED; }' branch.  And,
of course, we should add some automatic tests for this case, yesterday.





Home | Main Index | Thread Index | Old Index