NetBSD-Bugs archive


kern/38588: _sched_setaffinity() does not always work properly



>Number:         38588
>Category:       kern
>Synopsis:       _sched_setaffinity() does not always work properly
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Mon May 05 17:15:01 +0000 2008
>Originator:     Andrew Doran
>Release:        4.99.62
>Organization:
The NetBSD Project
>Environment:
Dual core system.
>Description:
I have a program that does the following, where 'id' is 0:

        cpuset = calloc(1, sizeof(cpuset_t));
        if (cpuset == NULL)
                err(EXIT_FAILURE, "calloc");
        CPU_SET(id, cpuset);
        if (_sched_setaffinity(0, 0, sizeof(cpuset_t), cpuset) < 0)
                ...
        sleep(30);

While it's sleeping, I check the affinity setting with schedctl:

        # schedctl -p 8602
          LID:              1
          Priority:         43
          Class:            SCHED_OTHER
          Affinity (CPUs):  0

That looks OK. The next thing the program does is this:

        _lwp_ctl(LWPCTL_FEATURE_CURCPU, &lc);
        printf("lwpctl::lc_curcpu=%d\n", lc->lc_curcpu);

This prints the following, showing that the LWP is still running on
CPU 1 even though its affinity mask should prevent that:

        lwpctl::lc_curcpu=1

The program runs a bit further and does some I/O before lc_curcpu
finally becomes 0. The problem does not appear to be in lwpctl:
reading topology information directly from the CPU confirms lwpctl's
results.

In theory, replacing the sleep() with a sched_yield() should work
around the problem, because it makes the LWP enter mi_switch(). In
practice, that does not seem to help either.
>How-To-Repeat:
Run the code snippet above on a multiprocessor system.
>Fix:
lwp_migrate() should request immediate preemption of the LWP if it is
in the LSONPROC state, so that it migrates as soon as possible. See the
corresponding block in sched_enqueue().
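To make the first proposal concrete, here is a minimal user-space sketch. It assumes a drastically simplified model: the types cpu_model and lwp_model and the functions lwp_migrate_model() and sched_point_model() are hypothetical names invented for illustration, and a want_resched flag stands in for whatever immediate-preemption mechanism the kernel would actually use. It is not the NetBSD implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical, heavily simplified model of the proposed lwp_migrate()
 * change. The names echo NetBSD's, but these are not the kernel's
 * structures or functions.
 */
enum lwp_state { LSRUN, LSONPROC, LSSLEEP };

struct cpu_model {
	int	id;
	bool	want_resched;	/* stands in for an immediate preemption request */
};

struct lwp_model {
	enum lwp_state	  state;
	struct cpu_model *cpu;		/* CPU the LWP is on now */
	struct cpu_model *target_cpu;	/* pending migration target, or NULL */
};

/* Record a migration; if the LWP is running, ask its CPU to preempt it. */
void
lwp_migrate_model(struct lwp_model *l, struct cpu_model *tci)
{
	assert(l != NULL && l->cpu != NULL && tci != NULL);
	if (l->cpu == tci) {
		l->target_cpu = NULL;	/* already on the target CPU */
		return;
	}
	l->target_cpu = tci;
	if (l->state == LSONPROC)
		l->cpu->want_resched = true;	/* the proposed addition */
}

/*
 * Model of the next scheduling point on the LWP's current CPU: a
 * running LWP only reaches the scheduler (and migrates) here if a
 * preemption was requested.
 */
void
sched_point_model(struct lwp_model *l)
{
	struct cpu_model *ci = l->cpu;

	if (!ci->want_resched)
		return;			/* keeps running on the old CPU */
	ci->want_resched = false;
	if (l->target_cpu != NULL) {
		l->cpu = l->target_cpu;
		l->target_cpu = NULL;
	}
}
```

Without the want_resched assignment in lwp_migrate_model(), sched_point_model() returns early and the modelled LWP stays on CPU 1, which mirrors the lc_curcpu=1 symptom described above.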

sched_takecpu() should ignore weak affinity if l_target_cpu != NULL.
As things stand, it looks like an LWP can wake up before it has
migrated and have its binding ignored.
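The second proposal can be sketched the same way. Again this is a hypothetical user-space model, not the kernel's sched_takecpu(): cpu_model, lwp_model and takecpu_model() are invented names, and a cache_hot flag stands in for the weak-affinity heuristic. The point is only the ordering: the pending target is checked before weak affinity is considered.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model of CPU selection when an LWP wakes up. Not the
 * real sched_takecpu(); the cache_hot flag stands in for whatever
 * weak-affinity heuristic the kernel applies.
 */
struct cpu_model {
	int	id;
};

struct lwp_model {
	struct cpu_model *cpu;		/* last CPU the LWP ran on */
	struct cpu_model *target_cpu;	/* pending migration target, or NULL */
};

struct cpu_model *
takecpu_model(const struct lwp_model *l, bool cache_hot,
    struct cpu_model *idlest)
{
	assert(l != NULL && l->cpu != NULL && idlest != NULL);

	/* Proposed change: a pending migration overrides weak affinity. */
	if (l->target_cpu != NULL)
		return l->target_cpu;

	/* Weak affinity: stay on the last CPU while it is cache hot. */
	if (cache_hot)
		return l->cpu;

	/* Otherwise fall back to the least loaded CPU. */
	return idlest;
}
```

If the target_cpu check were moved after the cache_hot test, a cache-hot LWP with a pending migration would be placed back on its old CPU, which is the behaviour the report suspects.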

Something else, maybe? I don't know.


