Subject: kern/34612: SA returns from sleep do not set the signal flags
To: None <kern-bug-people@netbsd.org, gnats-admin@netbsd.org,>
From: None <bucky@picovec.com>
List: netbsd-bugs
Date: 09/25/2006 17:45:00
>Number:         34612
>Category:       kern
>Synopsis:       SA returns from sleep do not set the signal flags
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Mon Sep 25 17:45:00 +0000 2006
>Originator:     Bucky Katz
>Release:        4.0-current
>Organization:
Picovex
>Environment:
NetBSD qd4 4.99.1 NetBSD 4.99.1 (QD4) #0: Wed Sep 20 17:39:24 PDT 2006  bucky@balls.picovex.com:/home/netbsd/obj/evbarm/sys/arch/evbarm/compile/QD4 evbarm

>Description:
When a pthreads application runs, and schedules some threads to sleep, the returns from the sleeps are delayed because the kernel return from the sleep does not set signotify. Sleep return is delayed until such time as some other kernel activity sets signotify.

>How-To-Repeat:
Attached is source code and example runs, along with some text explaining what the example runs are showing.  To compile the source code:

gcc -o a.out sleep_test.c -lpthread

----- source code for sleep_test.c -----

#include <sys/time.h>
#include <pthread.h>
#include <stdio.h>
#include <sys/resource.h>

static int child_threads_call_gettimeofday;

void *th(void *arg)
{
        pthread_setname_np(pthread_self(), "dummy%d", (int)arg);

        for (;;) {
                if (child_threads_call_gettimeofday) {
                        /* without the SA fix to call signotify(), an infinite
                           loop by the threads running this function will
                           result in late wakeups to the nanosleep called by
                           the main function.  if the loop makes a system
                           call instead of doing nothing, then the wakeup
                           is not late, showing that the ast is delivered on
                           time because asts are always run after system calls */
                        struct timeval tv;
                        gettimeofday(&tv,NULL);
                }
        }
}

int main(int argc, char *argv[])
{
        struct timeval tv_before, tv_after;
        uint32_t delta;
        int nrt;
        uint32_t timeout;
        int i;
        struct timespec rqt;

        if (argc!=4)
        {
                fprintf(stderr,"Usage: %s <number of threads> <timeout (ms)><child threads call gettimeofday()>\n", argv[0]);
                fflush(stderr);
                return -1;
        }

        nrt=atoi(argv[1]);
        timeout=atoi(argv[2]);
        child_threads_call_gettimeofday=atoi(argv[3]);
        rqt.tv_sec = 0;
        rqt.tv_nsec = 1000*timeout;

        // make round robin time short enough that we should get a chance
        // to run again soon after our sleep is over
        setenv("PTHREAD_RRTIME", "10", 1);

        pthread_setname_np(pthread_self(), "main", NULL);

        for (i=0;i<nrt;i++)
        {
                pthread_t tmp;
                pthread_create(&tmp,NULL,th,i);
        }

        gettimeofday(&tv_before,NULL);
        nanosleep(&rqt, NULL);
        gettimeofday(&tv_after,NULL);

        delta = (tv_after.tv_sec - tv_before.tv_sec);
        if (delta > 0) {
                // convert sec to microseconds
                delta *= (1000 * 1000);
        }
        delta += (tv_after.tv_usec - tv_before.tv_usec);
        printf("requested nanosleep for %lu microseconds, really slept %lu microseconds\n",timeout,delta);
}


----- comments on how to run sleep_test.c to show the problem -----

These are test results that show that the SIGEV_SA path for itimerfire() needs
to have a signotify() in it.

The test code sets the RRTIME low enough so that with only two threads, where
the child is infinite looping and the main is calling nanosleep(), the wakeup
latency should never be more than 1*RRTIME rounded up to tick granualarity,
which is 10ms.

Original kernel, child thread loops doing nothing.  The latency is obviously
longer than expected given test setup.  The itimer upcall is not being
delivered to the app process in time.

qd4# ./a.out 1 50000 0
requested nanosleep for 50000 microseconds, really slept 130470 microseconds
qd4# ./a.out 1 50000 0
requested nanosleep for 50000 microseconds, really slept 133530 microseconds
qd4# ./a.out 1 50000 0
requested nanosleep for 50000 microseconds, really slept 109233 microseconds


Original kernel, child thread loops calling gettimeofday().  By having the
child make a system call instead of spinning, the end of the system call has
an AST check and so delivers the itimer upcall on time.

qd4# ./a.out 1 50000 1
requested nanosleep for 50000 microseconds, really slept 54935 microseconds
qd4# ./a.out 1 50000 1
requested nanosleep for 50000 microseconds, really slept 57777 microseconds
qd4# ./a.out 1 50000 1
requested nanosleep for 50000 microseconds, really slept 56876 microseconds


This is with a modified kernel with signotify() change in itimerfire()
SIGEV_SA path.  Notice that the latency values are now always within a RRTIME
interval.


qd4# ./a.out 0 50000 0
requested nanosleep for 50000 microseconds, really slept 51600 microseconds
qd4# ./a.out 0 50000 0
requested nanosleep for 50000 microseconds, really slept 50380 microseconds
qd4# ./a.out 1 50000 0
requested nanosleep for 50000 microseconds, really slept 52114 microseconds
qd4# ./a.out 1 50000 0
requested nanosleep for 50000 microseconds, really slept 56638 microseconds
qd4# ./a.out 1 50000 0
requested nanosleep for 50000 microseconds, really slept 58925 microseconds
qd4# ./a.out 1 50000 0
requested nanosleep for 50000 microseconds, really slept 55740 microseconds
qd4# ./a.out 1 50000 1
requested nanosleep for 50000 microseconds, really slept 53941 microseconds
qd4# ./a.out 1 50000 1
requested nanosleep for 50000 microseconds, really slept 58915 microseconds
qd4# ./a.out 1 50000 1
requested nanosleep for 50000 microseconds, really slept 58624 microseconds


>Fix:
diff src/sys/kern/kern_time.c.old src/sys/kern/kern_time.c
@@ -1470,6 +1470,7 @@
        } else if (pt->pt_ev.sigev_notify == SIGEV_SA && (p->p_flag & P_SA)) {
                /* Cause the process to generate an upcall when it returns. */

+               signotify(p);
                if (p->p_userret == NULL) {
                        /*
                         * XXX stop signals can be processed inside tsleep,