NetBSD-Bugs archive


port-i386/46395: Modified i386 FP context after signal delivery and context switch



>Number:         46395
>Category:       port-i386
>Synopsis:       Modified i386 FP context after signal delivery and context switch
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    port-i386-maintainer
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Tue May 01 17:25:00 +0000 2012
>Originator:     Bob Lee
>Release:        5.1 Stable
>Organization:
Dell - Force 10
>Environment:
NetBSD 5.1_STABLE Dell Force10 (S3240) #0: Mon May 23 06:55:18 PDT 2011 
build@ecluster-sjc-04:/work/build/buildSpaces/build15/Z9000-8-3-11/SW-NetBSD5/usr/src/sys/arch/i386/compile/S3240

>Description:
Problem:
    Floating point registers in an i386 context are corrupted after a
signal handler is invoked and a context switch to another process
that uses the fpu takes place.

Assumptions:
    Semantics of a standard x86 signal handler are such that a copy of
an active fpu context is saved for inspection and potential alteration
by the signal handler, and that context (possibly modified) is
restored on program resumption on signal handler completion.  Further,
the signal handler would be provided a clean fpu context for its use.
    Semantics of a sigcontext signal version less than 2 results in a
reset fpu context for the current running lwp.  I believe the intent
was that the signal handler continues to use any existing fpu context.

Proposed Solution:
    Preserve state of MDL_USEDFPU in
compat_16_machdep.c:sendsig_sigcontext before calling buildcontext,
restoring the flag after the function call.

Discussion:
    The primary symptom is that a sequence of floating point
operations is "sliced" by a signal.  On return from the signal handler
to base user context, the floating point context was corrupted.
Subsequent investigation showed that the fpu context was not corrupted
but reset.  This unexpected reset of the hardware fpu context resulted
in a number of user level symptoms.  Further, the problem is limited
to the 16 emulation code.
    Looking at the kernel signal delivery code, the 16 emulation is
only in the IA32 (i386) code.  In the normal signal delivery, there are
basically two methods, siginfo and sigcontext.  "siginfo" is the
currently preferred method.  Looking at its mechanism, the entire
processor context is saved on the signal stack (be it independent of the
current running lwp stack or not), and a bit is set in the signal
context if the fpu context is valid in the siginfo.  After this,
buildcontext is used to create the initial signal running context
(registers and such).  Within buildcontext, the MD lwp flag MDL_USEDFPU
is always cleared.  The return from signal restores the fpu context
from siginfo, if it was valid (I assume this allows the signal handler
to indirectly modify the base lwp fpu context, for emulation or some
other MD specific reason).
This allows for a distinct fpu context for the signal handler, that has
no direct effect on the base lwp context.  I can see where resetting
MDL_USEDFPU could also be an lwp context switch optimization, but this
is the resulting siginfo semantics.
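
The siginfo behaviour described in this paragraph can be sketched as a
small user-space model.  This is only an illustration under stated
assumptions: every name in it (si_lwp, si_frame, SI_USEDFPU,
SI_FPU_VALID, si_deliver, si_return, si_roundtrip) is hypothetical, a
single double stands in for the whole fpu context, and none of it is
taken from the kernel sources.

```c
/* Simplified user-space model of the siginfo path described above.
 * All names are hypothetical stand-ins, not kernel definitions. */
#define SI_USEDFPU   0x1  /* stands in for MDL_USEDFPU */
#define SI_FPU_VALID 0x1  /* stands in for the "fpu context valid" bit */

struct si_lwp   { int md_flags; double fpu; };  /* one double models the fpu state */
struct si_frame { int flags;    double fpu; };  /* copy kept on the signal stack */

/* Signal delivery: copy the fpu state into the frame if it is in use,
 * then clear the flag so the handler starts from a clean context. */
static void si_deliver(struct si_lwp *l, struct si_frame *f)
{
    f->flags = 0;
    f->fpu = 0.0;
    if (l->md_flags & SI_USEDFPU) {
        f->fpu = l->fpu;
        f->flags |= SI_FPU_VALID;
    }
    l->md_flags &= ~SI_USEDFPU;  /* what the report says buildcontext does */
    l->fpu = 0.0;                /* modelling assumption: handler sees a clean state */
}

/* Return from signal: restore the fpu state from the frame if it was valid. */
static void si_return(struct si_lwp *l, const struct si_frame *f)
{
    if (f->flags & SI_FPU_VALID) {
        l->fpu = f->fpu;
        l->md_flags |= SI_USEDFPU;
    }
}

/* Base code holds `before`, the handler overwrites its own fpu state with
 * `handler_value`; the result is what the base code sees on resumption. */
static double si_roundtrip(double before, double handler_value)
{
    struct si_lwp l = { SI_USEDFPU, before };
    struct si_frame f;

    si_deliver(&l, &f);
    l.fpu = handler_value;       /* handler uses the fpu */
    l.md_flags |= SI_USEDFPU;
    si_return(&l, &f);
    return l.fpu;
}
```

In this model si_roundtrip(3.5, 9.0) yields 3.5: whatever the handler
does to its own fpu state, the base context comes back from the saved
copy.
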
    However, the sigcontext semantics are drastically different from
those of siginfo.  The i386 code tests the number of arguments required
by the signal handler, and if the version of the sigaction descriptor
is less than 2, the 16 specific function sendsig_sigcontext is called.
In this function there is no copy of the fpu context made, and it also
calls buildcontext, which, by extension, resets MDL_USEDFPU.  Thus the
semantics are that any emulated sigcontext version less than 2 resets
the fpu context of the lwp.
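
A toy model may help show why the reset only becomes visible after a
context switch, and what preserving the flag changes.  It is a sketch
under assumptions that the report does not spell out: that fpu state is
switched lazily, and that an lwp whose flag is clear is given a fresh
state the next time it takes over the fpu.  All names (model_lwp,
MODEL_USEDFPU, model_fpu_use, model_buildcontext,
model_sendsig_sigcontext, model_run) are hypothetical, and a single
double stands in for the hardware fpu context.

```c
#include <stddef.h>

#define MODEL_USEDFPU 0x1  /* stands in for MDL_USEDFPU */

struct model_lwp {
    int    md_flags;   /* stands in for l->l_md.md_flags */
    double saved_fpu;  /* per-lwp fpu save area */
};

static double hw_fpu;                /* the one hardware fpu context */
static struct model_lwp *fpu_owner;  /* lwp whose state is in hw_fpu */

/* An lwp touches the fpu.  If it does not own the hardware state, the
 * previous owner is saved and this lwp's state is either restored
 * (flag set) or freshly initialised (flag clear). */
static void model_fpu_use(struct model_lwp *l)
{
    if (fpu_owner == l)
        return;
    if (fpu_owner != NULL)
        fpu_owner->saved_fpu = hw_fpu;
    if (l->md_flags & MODEL_USEDFPU) {
        hw_fpu = l->saved_fpu;
    } else {
        hw_fpu = 0.0;  /* fresh context */
        l->md_flags |= MODEL_USEDFPU;
    }
    fpu_owner = l;
}

/* Per the report, buildcontext always clears the flag. */
static void model_buildcontext(struct model_lwp *l)
{
    l->md_flags &= ~MODEL_USEDFPU;
}

/* sigcontext delivery; `preserve` selects the proposed save/restore. */
static void model_sendsig_sigcontext(struct model_lwp *l, int preserve)
{
    int svufpu = l->md_flags & MODEL_USEDFPU;

    model_buildcontext(l);
    if (preserve)
        l->md_flags |= svufpu;
}

/* lwp a is mid-computation, gets a signal, lwp b uses the fpu, a resumes.
 * Returns the fpu value a sees on resumption. */
static double model_run(int preserve)
{
    struct model_lwp a = { 0, 0.0 }, b = { 0, 0.0 };

    fpu_owner = NULL;
    hw_fpu = 0.0;
    model_fpu_use(&a);
    hw_fpu = 3.5;                         /* a's in-flight value */
    model_sendsig_sigcontext(&a, preserve);
    model_fpu_use(&b);                    /* context switch, b uses the fpu */
    hw_fpu = 9.0;
    model_fpu_use(&a);                    /* a resumes its FP sequence */
    return hw_fpu;
}
```

In this model model_run(0) returns 0.0 (the in-flight value is lost)
and model_run(1) returns 3.5.  Without the intervening use of the fpu
by b, a never reloads its state and the cleared flag goes unnoticed.
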
    Resetting the fpu context appears to be an unintended side effect
of this signal handling, unless the signal is expected to terminate
the lwp (process).  I believe that in the compat_16_machdep.c
>How-To-Repeat:
Active process with 16 emulation enabled in the kernel.  SIGPROF occurs and is
delivered in the midst of a set of FP instructions.  Prior to the return to the
signal handler, a second process runs.  When the original FP sequence is
resumed, the FP context has been modified.
>Fix:
Possible fix, IIUC, is:

--- //depot/main/Dev/Cyclone/ManagedPVT/NAVASOTA-DEV-9-1-0/SW-NetBSD5/usr/src/sys/arch/i386/i386/compat_16_machdep.c	2011-09-07 05:46:27.000000000 -0700
+++ /work/swos-01/glee/glee-nav4/SW-NetBSD5/usr/src/sys/arch/i386/i386/compat_16_machdep.c	2011-09-07 05:46:27.000000000 -0700
@@ -175,6 +175,7 @@
        u_long code = KSI_TRAPCODE(ksi);
        struct sigframe_sigcontext *fp = getframe(l, sig, &onstack), frame;
        sig_t catcher = SIGACTION(p, sig).sa_handler;
+       int svufpu;

        fp--;

@@ -259,8 +260,9 @@
                sigexit(l, SIGILL);
                /* NOTREACHED */
        }
-
+       svufpu = l->l_md.md_flags & MDL_USEDFPU;
        buildcontext(l, sel, catcher, fp);
+       l->l_md.md_flags |= svufpu;

        /* Remember that we're now on the signal stack. */
        if (onstack)


