Source-Changes-HG archive


[src/trunk]: src/sys/arch/x86/x86 Add support for xsaveopt. It is basically a...



details:   https://anonhg.NetBSD.org/src/rev/0788637e0e19
branches:  trunk
changeset: 827590:0788637e0e19
user:      maxv <maxv%NetBSD.org@localhost>
date:      Sat Nov 04 08:58:30 2017 +0000

description:
Add support for xsaveopt. It is basically an instruction that optimizes
context switch performance by not saving to memory FPU registers that are
known to be in their initial state or known not to have changed since the
last time they were saved to memory.

Our code is now compatible with the internal state tracking engine:
 - We don't modify the in-memory FPU state after doing an XSAVE/XSAVEOPT.
   That is to say, we always call XRSTOR first.
 - During a fork, the whole in-memory FPU state area is copied with memcpy
   into the new PCB, and CR0_TS is set. The next time the forked thread uses
   the FPU it faults; we then migrate the area, call XRSTOR and clear CR0_TS.
   During this XRSTOR, XSTATE_BV still contains the initial values, which
   forces a reload of XINUSE.
 - Whenever software wants to change the in-memory FPU state, it manually
   sets XSTATE_BV[i]=1, which forces XINUSE[i]=1.
 - The address of the state passed to xrstor is always the same for a
   given LWP.
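
To illustrate why the rules above matter, here is a toy model of the
tracking that XSAVEOPT relies on. It is only a sketch, not the hardware
algorithm: all toy_* names are hypothetical, the tracked context is reduced
to the save-area address, and the component data itself is not modelled.
toy_xsaveopt returns the set of components that would actually be written.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical component bits, numbered like the x87 and SSE bits of XCR0. */
#define TOY_X87 (1ULL << 0)
#define TOY_SSE (1ULL << 1)

/* In-memory save area: only the XSTATE_BV header field is modelled. */
struct toy_area {
	uint64_t xstate_bv;
};

/* Simplified per-CPU tracking state. */
struct toy_cpu {
	uint64_t xinuse;	/* components not in their initial state */
	uint64_t xmodified;	/* components changed since the last XRSTOR */
	const struct toy_area *last_restored;	/* area used by the last XRSTOR */
};

/* XRSTOR: requested components take their in-use status from XSTATE_BV. */
void
toy_xrstor(struct toy_cpu *cpu, const struct toy_area *area, uint64_t rfbm)
{
	cpu->xinuse = (cpu->xinuse & ~rfbm) | (area->xstate_bv & rfbm);
	cpu->xmodified &= ~rfbm;
	cpu->last_restored = area;
}

/* Any instruction that changes a component marks it in use and modified. */
void
toy_touch(struct toy_cpu *cpu, uint64_t component)
{
	cpu->xinuse |= component;
	cpu->xmodified |= component;
}

/* XSAVEOPT: returns the components that really get written to memory. */
uint64_t
toy_xsaveopt(const struct toy_cpu *cpu, struct toy_area *area, uint64_t rfbm)
{
	/* Init optimization: skip components still in their initial state. */
	uint64_t to_save = rfbm & cpu->xinuse;

	/* Modified optimization: only valid for the area last restored from. */
	if (cpu->last_restored == area)
		to_save &= cpu->xmodified;

	/* XSTATE_BV is updated for the requested components in all cases. */
	area->xstate_bv = (area->xstate_bv & ~rfbm) | (cpu->xinuse & rfbm);
	return to_save;
}
```

In this model, a save right after a restore from the same area writes
nothing, and a save to any other area writes every in-use component. That
is why the area address has to stay the same for a given LWP, and why the
in-memory state must not be changed behind the CPU's back.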

fpu_save_area_clear is changed not to force a reload of CW if fx_cw is
the standard FPU value. This way we have XINUSE[i]=0 for x87, and xsaveopt
will optimize this state.

Small benchmark:
        switch lwp to cpu2
        do float operation
        switch lwp to cpu3
        do float operation
Doing this 10^6 times in a loop, the run time on my CPU drops on average
from 28.2 seconds to 20.8 seconds.
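
For reference, the loop can be sketched as follows. This is not the program
that was used for the numbers above, only an assumption of its shape. The
migration step is passed in as a callback because the affinity call is
platform specific; migrate_fn, counting_migrate and fpu_switch_benchmark
are hypothetical names, and counting_migrate is a stand-in that only counts
calls instead of moving the thread.

```c
#include <stddef.h>

/* Hypothetical hook: bind the calling LWP to `cpu`, return 0 on success. */
typedef int (*migrate_fn)(int cpu, void *ctx);

/* Stand-in hook that does not migrate anything, it only counts the calls. */
int
counting_migrate(int cpu, void *ctx)
{
	(void)cpu;
	*(int *)ctx += 1;
	return 0;
}

/*
 * Alternate between two CPUs, doing one float operation after each switch
 * so that the FPU state has to follow the thread. Returns 0 on success and
 * -1 as soon as a migration fails.
 */
int
fpu_switch_benchmark(size_t iterations, int cpu_a, int cpu_b,
    migrate_fn migrate, void *ctx, double *result)
{
	/* volatile keeps the compiler from optimizing the float work away */
	volatile double acc = 1.0;

	for (size_t i = 0; i < iterations; i++) {
		if (migrate(cpu_a, ctx) != 0)
			return -1;
		acc = acc * 1.000001 + 0.5;

		if (migrate(cpu_b, ctx) != 0)
			return -1;
		acc = acc * 0.999999 - 0.5;
	}

	*result = acc;
	return 0;
}
```

A real run would replace counting_migrate with a hook that sets the thread
affinity, call fpu_switch_benchmark(1000000, 2, 3, ...), and time the call.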

diffstat:

 sys/arch/x86/x86/fpu.c |  14 +++++++++-----
 1 files changed, 9 insertions(+), 5 deletions(-)

diffs (43 lines):

diff -r b38f7eee3dc8 -r 0788637e0e19 sys/arch/x86/x86/fpu.c
--- a/sys/arch/x86/x86/fpu.c    Sat Nov 04 08:55:50 2017 +0000
+++ b/sys/arch/x86/x86/fpu.c    Sat Nov 04 08:58:30 2017 +0000
@@ -1,4 +1,4 @@
-/*     $NetBSD: fpu.c,v 1.23 2017/11/04 07:38:42 maxv Exp $    */
+/*     $NetBSD: fpu.c,v 1.24 2017/11/04 08:58:30 maxv Exp $    */
 
 /*
  * Copyright (c) 2008 The NetBSD Foundation, Inc.  All
@@ -96,7 +96,7 @@
  */
 
 #include <sys/cdefs.h>
-__KERNEL_RCSID(0, "$NetBSD: fpu.c,v 1.23 2017/11/04 07:38:42 maxv Exp $");
+__KERNEL_RCSID(0, "$NetBSD: fpu.c,v 1.24 2017/11/04 08:58:30 maxv Exp $");
 
 #include "opt_multiprocessor.h"
 
@@ -471,8 +471,11 @@
                                break;
 
                        case FPU_SAVE_XSAVE:
+                               xsave(&pcb->pcb_savefpu, x86_xsave_features);
+                               break;
+
                        case FPU_SAVE_XSAVEOPT:
-                               xsave(&pcb->pcb_savefpu, x86_xsave_features);
+                               xsaveopt(&pcb->pcb_savefpu, x86_xsave_features);
                                break;
                }
        }
@@ -559,8 +562,9 @@
                fpu_save->sv_xmm.fx_cw = x87_cw;
 
                /* Force a reload of CW */
-               if (x86_fpu_save == FPU_SAVE_XSAVE ||
-                   x86_fpu_save == FPU_SAVE_XSAVEOPT) {
+               if ((x87_cw != __INITIAL_NPXCW__) &&
+                   (x86_fpu_save == FPU_SAVE_XSAVE ||
+                   x86_fpu_save == FPU_SAVE_XSAVEOPT)) {
                        fpu_save->sv_xsave_hdr.xsh_xstate_bv |=
                            XCR0_X87;
                }


