NetBSD-Bugs archive


kern/50021: Linux affinity syscalls are not fully implemented



>Number:         50021
>Category:       kern
>Synopsis:       Linux affinity syscalls are not fully implemented
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Thu Jul 02 07:00:01 +0000 2015
>Originator:     Rin Okuyama
>Release:        7.99.19
>Organization:
Department of Physics, Tohoku University
>Environment:
NetBSD okuyama 7.99.19 NetBSD 7.99.19 (GENERIC) #0: Wed Jul  1 15:48:43 JST 2015  root@okuyama:/usr/src/sys/arch/amd64/compile/GENERIC amd64
>Description:
NetBSD's Linux emulation does not fully support the sched_setaffinity
and sched_getaffinity syscalls. As a result, Linux binaries cannot set
CPU affinity to maximize their performance in multiprocessor
environments. Moreover, the Intel Math Kernel Library (MKL) determines
the number of available CPUs by using these calls. Because this attempt
fails, MKL launches only one thread even on multiprocessor machines. To
resolve this, we have fully implemented the
linux_sched_(set|get)affinity syscalls on NetBSD-current.
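
As a rough illustration of why a failing sched_getaffinity leaves a
runtime with a single thread, the sketch below counts usable CPUs through
the raw Linux syscall and falls back to one thread when the query fails.
It is not MKL's actual logic: the helper names, the 64-word buffer, and
the fallback policy are all assumptions made for this example.

```c
#define _DEFAULT_SOURCE		/* for syscall(2) on glibc */
#include <sys/syscall.h>
#include <unistd.h>

#define MASK_WORDS	64	/* hypothetical buffer size, in longs */

/*
 * Ask the kernel for the caller's affinity mask and count the set bits.
 * Returns the number of usable CPUs, or -1 if the syscall fails.
 */
static int
count_usable_cpus(void)
{
	unsigned long mask[MASK_WORDS] = { 0 };
	long nbytes;
	int count = 0;

	/* The raw syscall returns the number of bytes written to mask. */
	nbytes = syscall(SYS_sched_getaffinity, 0, sizeof(mask), mask);
	if (nbytes <= 0)
		return -1;

	for (size_t w = 0; w < (size_t)nbytes / sizeof(mask[0]); w++)
		for (unsigned long bits = mask[w]; bits != 0; bits &= bits - 1)
			count++;	/* one iteration per set bit */
	return count;
}

/* Hypothetical policy: use one thread when the CPU count is unknown. */
static int
choose_thread_count(void)
{
	int n = count_usable_cpus();

	return n > 0 ? n : 1;
}
```

With an emulation layer that rejects or mishandles the call,
count_usable_cpus() returns -1 and choose_thread_count() yields 1, which
matches the single busy CPU seen in the transcripts of this report.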
>How-To-Repeat:
You need a machine supporting COMPAT_LINUX with at least two CPUs.
Some basic libraries for Linux binaries are also required (they are
provided by the suse131_base package). First of all, set

    security.models.extensions.user_set_cpu_affinity=1

with sysctl(8). Otherwise, root privileges are required to set CPU
affinity.

We provide a test program which repeats bogus floating-point
calculations on a specific CPU (CPU1):

    http://flex.phys.tohoku.ac.jp/~okuyama/test_linux_affinity.tgz
    MD5 (test_linux_affinity.tgz) = 366e4c17f5bd7f5821d729f1a79343a6

This tarball contains binaries for amd64 and i386. For other platforms
supporting COMPAT_LINUX, you can compile it from the source code,
provided you have Linux versions of gcc, binutils, and so on.
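
The source in the tarball is not reproduced here. For readers without
access to it, the following is a minimal sketch of what such a program
could look like on Linux: it pins the calling thread to one CPU with the
raw sched_setaffinity syscall and then spins on floating-point work. The
function names, the mask buffer size, and the arithmetic are assumptions;
only the printed messages are taken from the transcripts in this report.

```c
#define _DEFAULT_SOURCE		/* for syscall(2) on glibc */
#include <limits.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>

#define MASK_WORDS	64	/* hypothetical mask buffer size, in longs */
#define ULONG_BITS	(sizeof(unsigned long) * CHAR_BIT)

/* Restrict the calling thread to a single CPU; 0 on success, -1 on failure. */
static int
pin_to_cpu(unsigned int cpu)
{
	unsigned long mask[MASK_WORDS] = { 0 };

	if (cpu >= MASK_WORDS * ULONG_BITS)
		return -1;
	mask[cpu / ULONG_BITS] = 1UL << (cpu % ULONG_BITS);
	return syscall(SYS_sched_setaffinity, 0, sizeof(mask), mask) == 0 ?
	    0 : -1;
}

/* Meaningless floating-point work that keeps one CPU busy. */
static double
burn_cycles(unsigned long iterations)
{
	volatile double x = 1.0;

	for (unsigned long i = 0; i < iterations; i++)
		x = x * 1.000001 + 0.000001;
	return x;
}

/* Pin to the given CPU if possible, then run the workload either way. */
static int
run_test(unsigned int cpu, unsigned long iterations)
{
	int pinned;

	printf("setting affinity mask for CPU%u\n", cpu);
	pinned = pin_to_cpu(cpu) == 0;
	if (pinned)
		printf("succeeded to set affinity\nrunning on CPU%u\n", cpu);
	else
		printf("failed to set affinity\nrunning anyway\n");
	(void)burn_cycles(iterations);
	return pinned;
}
```

Calling run_test(1, n) from main() with a large n keeps the process busy
long enough to observe its placement with "top -1t".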

On NetBSD 7.99.19, the test program fails to set CPU affinity. This can
be confirmed with the "top -1t" command:

    % tar zxf test_linux_affinity.tgz
    % cd test_linux_affinity
    % ./test.amd64 (or test.i386)
    setting affinity mask for CPU1
    failed to set affinity
    running anyway

    % top -1t
    ...
    CPU0 states:  100% user,  0.0% nice,  0.0% system,  0.0% interrupt,  0.0% idle
    CPU1 states:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
    CPU2 states:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
    ...

Note that the scheduler may still happen to place it on CPU1 by chance.

If you have the Intel compiler suite, you can confirm that MKL launches
only one thread. The following are the results with Intel Fortran
Composer XE 2011:

    % KMP_AFFINITY=verbose ./test.mkl
    OMP: Warning #79: KMP_AFFINITY: cannot determine proper affinity mask size.
    OMP: Warning #71: KMP_AFFINITY: affinity not supported, using "disabled".
    OMP: Warning #121: Error initializing affinity - not using affinity.

    % top -1t
    ...
    CPU0 states:  100% user,  0.0% nice,  0.0% system,  0.0% interrupt,  0.0% idle
    CPU1 states:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
    CPU2 states:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
    ...

Sorry, we cannot provide a test program with MKL, because we are not
licensed to redistribute runtime libraries.
>Fix:
Apply the patch below. For linux_sched_setaffinity(2), we just provide a
wrapper for sys__sched_setaffinity(9). For linux_sched_getaffinity(2),
on the other hand, we cannot use sys__sched_getaffinity(9): for a thread
whose affinity mask has not been set, the Linux call is expected to
report all CPUs as available, whereas the native call reports all CPUs
as unavailable (an empty set). Thus, we have implemented the Linux
syscall using code derived from the native one.
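
The mask-size computation and the "unset affinity means all CPUs" rule
can be modelled in userland as follows. This is a sketch only, not kernel
code: it works on plain arrays of unsigned long instead of kcpuset_t, and
cpu_mask_size() and report_affinity() are hypothetical names introduced
for this example.

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

#define ULONG_BITS	(sizeof(unsigned long) * CHAR_BIT)

/* Userland analogue of LINUX_CPU_MASK_SIZE for a given CPU count. */
static size_t
cpu_mask_size(unsigned int ncpu)
{
	return sizeof(unsigned long) * ((ncpu + ULONG_BITS - 1) / ULONG_BITS);
}

/*
 * Fill `out` (at least cpu_mask_size(ncpu) bytes) the way the Linux call
 * is expected to behave: copy the stored mask if one has been set,
 * otherwise mark every CPU as available.
 */
static void
report_affinity(unsigned long *out, const unsigned long *stored,
    unsigned int ncpu)
{
	size_t size = cpu_mask_size(ncpu);

	if (stored != NULL) {
		memcpy(out, stored, size);
		return;
	}
	memset(out, 0, size);
	for (unsigned int i = 0; i < ncpu; i++)
		out[i / ULONG_BITS] |= 1UL << (i % ULONG_BITS);
}
```

Setting the bits one CPU at a time, as the patch does with kcpuset_set(),
avoids shifting by the total CPU count, which the removed code did with
"(1 << ncpu) - 1".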

On the patched version of NetBSD, the test program successfully sets CPU
affinity:

    % ./test.amd64
    setting affinity mask for CPU1
    succeeded to set affinity
    running on CPU1

    % top -1t
    ...
    CPU0 states:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
    CPU1 states:  100% user,  0.0% nice,  0.0% system,  0.0% interrupt,  0.0% idle
    CPU2 states:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
    ...

We have also confirmed that MKL launches more than one thread, and no
warnings are issued. We can specify the number of threads, affinity
policy, etc., in the same manner as in native Linux environments.


--- sys/compat/linux/common/linux_sched.c.orig	2015-07-02 15:39:47.000000000 +0900
+++ sys/compat/linux/common/linux_sched.c	2015-07-01 15:46:00.000000000 +0900
@@ -65,6 +65,9 @@
 static int linux_clone_nptl(struct lwp *, const struct linux_sys_clone_args *,
     register_t *);
 
+/* Unlike Linux, dynamically calculate CPU mask size */
+#define	LINUX_CPU_MASK_SIZE (sizeof(long) * ((ncpu + LONG_BIT - 1) / LONG_BIT))
+
 #if DEBUG_LINUX
 #define DPRINTF(x) uprintf x
 #else
@@ -635,39 +638,45 @@
 		syscallarg(unsigned int) len;
 		syscallarg(unsigned long *) mask;
 	} */
-	proc_t *p;
-	unsigned long *lp, *data;
-	int error, size, nb = ncpu;
+	struct lwp *t;
+	kcpuset_t *kcset;
+	size_t size;
+	cpuid_t i;
+	int error;
 
-	/* Unlike Linux, dynamically calculate cpu mask size */
-	size = sizeof(long) * ((ncpu + LONG_BIT - 1) / LONG_BIT);
+	size = LINUX_CPU_MASK_SIZE;
 	if (SCARG(uap, len) < size)
 		return EINVAL;
 
-	/* XXX: Pointless check.  TODO: Actually implement this. */
-	mutex_enter(proc_lock);
-	p = proc_find(SCARG(uap, pid));
-	mutex_exit(proc_lock);
-	if (p == NULL) {
+	/* Lock the LWP */
+	t = lwp_find2(SCARG(uap, pid), l->l_lid);
+	if (t == NULL)
 		return ESRCH;
-	}
-
-	/* 
-	 * return the actual number of CPU, tag all of them as available 
-	 * The result is a mask, the first CPU being in the least significant
-	 * bit.
-	 */
-	data = kmem_zalloc(size, KM_SLEEP);
-	lp = data;
-	while (nb > LONG_BIT) {
-		*lp++ = ~0UL;
-		nb -= LONG_BIT;
-	}
-	if (nb)
-		*lp = (1 << ncpu) - 1;
 
-	error = copyout(data, SCARG(uap, mask), size);
-	kmem_free(data, size);
+	/* Check the permission */
+	if (kauth_authorize_process(l->l_cred,
+	    KAUTH_PROCESS_SCHEDULER_GETAFFINITY, t->l_proc, NULL, NULL, NULL)) {
+		mutex_exit(t->l_proc->p_lock);
+		return EPERM;
+	}
+
+	kcpuset_create(&kcset, true);
+	lwp_lock(t);
+	if (t->l_affinity != NULL)
+		kcpuset_copy(kcset, t->l_affinity);
+	else {
+		/*
+		 * All available CPUs should be masked when affinity has not
+		 * been set.
+		 */
+		kcpuset_zero(kcset);
+		for (i = 0; i < ncpu; i++)
+			kcpuset_set(kcset, i);
+	}
+	lwp_unlock(t);
+	mutex_exit(t->l_proc->p_lock);
+	error = kcpuset_copyout(kcset, (cpuset_t *)SCARG(uap, mask), size);
+	kcpuset_unuse(kcset, NULL);
 	*retval = size;
 	return error;
 }
@@ -680,17 +689,17 @@
 		syscallarg(unsigned int) len;
 		syscallarg(unsigned long *) mask;
 	} */
-	proc_t *p;
+	struct sys__sched_setaffinity_args ssa;
+	size_t size;
 
-	/* XXX: Pointless check.  TODO: Actually implement this. */
-	mutex_enter(proc_lock);
-	p = proc_find(SCARG(uap, pid));
-	mutex_exit(proc_lock);
-	if (p == NULL) {
-		return ESRCH;
-	}
+	size = LINUX_CPU_MASK_SIZE;
+	if (SCARG(uap, len) < size)
+		return EINVAL;
 
-	/* Let's ignore it */
-	DPRINTF(("%s\n", __func__));
-	return 0;
+	SCARG(&ssa, pid) = SCARG(uap, pid);
+	SCARG(&ssa, lid) = l->l_lid;
+	SCARG(&ssa, size) = size;
+	SCARG(&ssa, cpuset) = (cpuset_t *)SCARG(uap, mask);
+
+	return sys__sched_setaffinity(l, &ssa, retval);
 }


