Port-i386 archive


recursive panic in ddb if a softint handler panics



Hi all,

On i386, ddb will panic() when processing a panic() from a softint
handler.  This makes it exceptionally hard to debug problems in
softnet code, for example.  I do not know if amd64 or any other
architecture has a similar problem.

I located the root cause of the problem (detailed below) and drafted
some patches to fix it.  I have also drafted some enhancements to
related code.  I will send the patches shortly to get feedback.  The
patches will be spread out over two follow-up emails:

  * Email #1 will contain patches that fix the problem and are
    relatively low-risk and easy to review (not to say that you won't
    find any issues with the patches).

  * Email #2 will contain patches to improve 'struct switchframe',
    making it possible to get more useful backtraces in gdb and ddb.
    These are high-risk changes because they affect low-level kernel
    operations.  I would appreciate your help in determining whether
    these changes will cause problems.  I am mostly worried about
    performance regressions (the changes affect every context switch)
    and compatibility with debugging tools (due to memory layout
    changes).

Here is a detailed description of the recursive panic() bug (line
numbers assume NetBSD-current):

When db_stack_trace_print() runs during a panic() and db_nextframe()
encounters the Xsoftintr() frame, db_nextframe() does the following at
db_machdep.c:292:

  1. checks to see if there's an Xsoftintr() symbol (there is)
  2. checks to see if the frame corresponds to an interrupt (the
     symbol name begins with "Xsoft" so it does)
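
The name-based check in step 2 can be sketched roughly as follows.  This
is only an illustration, not the code in db_machdep.c; the helper name
name_looks_like_softint() is made up for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: treat any symbol whose name starts with "Xsoft"
 * as a soft interrupt entry point.  Note that this looks only at the
 * name; it says nothing about what is actually on the stack.
 */
static bool
name_looks_like_softint(const char *name)
{
	assert(name != NULL);
	return strncmp(name, "Xsoft", 5) == 0;
}
```

A check like this accepts "Xsoftintr" purely because of its prefix, which
is why the frame is treated as an interrupt frame regardless of how the
function was entered.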

If both of the above are true (they are), db_nextframe() at
db_machdep.c:303 tries to get a pointer to a struct intrframe.
According to the comment at line 300, the second argument passed to
Xsoftintr() is a pointer to a struct intrframe.  However, the comment
and the corresponding code are not correct -- Xsoftintr() doesn't take
any arguments[1].  Attempting to fetch the second argument only yields
stack garbage, not a struct intrframe.  This causes db_machdep.c:307
to dereference a bad pointer, triggering the recursive panic().

[1] Xsoftintr() is called by Xspllower() which is called by splx()
    a.k.a. spllower().  Neither Xspllower() nor Xsoftintr() sets up a
    standard frame when called (they don't do 'pushl %ebp; movl %esp,
    %ebp'), so Xsoftintr()'s %ebp is the same as splx()'s %ebp.  This
    makes splx()'s arguments look like Xsoftintr()'s arguments, and
    splx() does not take any arguments.
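
The effect described in [1] can be shown with a toy model of an i386
stack.  This is a simplified sketch under stated assumptions: the stack
is an array of 32-bit words indexed by word rather than by byte, and the
names fetch_arg(), build_toy_stack(), STACK_WORDS and CALLER_DATA are
invented for the example.  It is not kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_WORDS 16
#define CALLER_DATA 0xdeadbeefu	/* marker for unrelated caller data */

/*
 * Fetch argument n (0-based) the way a frame-pointer-based unwinder
 * does: [ebp] holds the saved %ebp, [ebp+4] the return address, and
 * arguments start at [ebp+8], i.e. two words above ebp.
 */
static uint32_t
fetch_arg(const uint32_t *stack, size_t ebp, size_t n)
{
	assert(ebp + 2 + n < STACK_WORDS);
	return stack[ebp + 2 + n];
}

/*
 * Build a toy stack in which a function with a standard frame calls
 * two routines that push only a return address and never update %ebp.
 * Returns the %ebp value (as a word index) seen inside the innermost
 * frameless routine.
 */
static size_t
build_toy_stack(uint32_t *stack)
{
	size_t i, ebp = 8;

	for (i = 0; i < STACK_WORDS; i++)
		stack[i] = 0;
	stack[11] = CALLER_DATA;	/* ebp+12: belongs to the caller */
	stack[10] = 0x11111111u;	/* ebp+8: belongs to the caller */
	stack[9] = 0xc0de0001u;		/* return address into the caller */
	stack[8] = 12;			/* saved %ebp of the caller */
	stack[7] = 0xc0de0002u;		/* return addr, 1st frameless call */
	stack[6] = 0xc0de0003u;		/* return addr, 2nd frameless call */
	/* No 'pushl %ebp; movl %esp, %ebp' happened, so ebp is unchanged. */
	return ebp;
}
```

In this model, asking for the innermost routine's "second argument" with
fetch_arg(stack, ebp, 1) returns CALLER_DATA, a word that belongs to a
frame further up the stack and has nothing to do with the routine being
examined.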

You can reproduce the recursive panic by adding a call to panic()
inside ipintr().  The backtrace will look like the following (the line
numbers you see might differ from these line numbers -- this backtrace
was generated from a slightly modified version of the NetBSD
6.1 kernel):

    #0  vpanic (fmt=0xc0ba995b "trap", ap=0xdaa51730) at /usr/src/sys/kern/subr_prf.c:211
    #1  0xc0790529 in panic (fmt=0xc0ba995b "trap") at /usr/src/sys/kern/subr_prf.c:205
    #2  0xc07decbc in trap (frame=0xdaa517c0) at /usr/src/sys/arch/i386/i386/trap.c:396
    #3  0xc010cf48 in ?? () at /usr/src/sys/arch/i386/i386/vector.S:983
    #4  0xc02857f0 in db_get_value (addr=56, size=4, is_signed=false) at /usr/src/sys/ddb/db_access.c:72
    #5  0xc028a09a in db_nextframe (nextframe=0xdaa51b40, retaddr=0xdaa51b3c, arg0=0xdaa51b38, ip=0xdaa51b34, argp=0xdaa51d88, is_trap=0, pr=0xc07901b5 <printf>) at /usr/src/sys/arch/i386/i386/db_machdep.c:308
    #6  0xc028be2b in db_stack_trace_print (addr=<optimized out>, have_addr=true, count=65533, modif=0xc0bb44bf "", pr=0xc07901b5 <printf>) at /usr/src/sys/arch/x86/x86/db_trace.c:275
    #7  0xc07903cb in vpanic (fmt=0xc0b6ba76 "testing", ap=0xdaa51d4c) at /usr/src/sys/kern/subr_prf.c:296
    #8  0xc0790529 in panic (fmt=0xc0b6ba76 "testing") at /usr/src/sys/kern/subr_prf.c:205
    #9  0xc04e3d4f in ipintr () at /usr/src/sys/netinet/ip_input.c:369
    #10 0xc054ac0d in softint_execute (s=<optimized out>, si=<optimized out>, l=<optimized out>) at /usr/src/sys/kern/kern_softint.c:543
    #11 softint_dispatch (pinned=0xc4085560, s=4) at /usr/src/sys/kern/kern_softint.c:825
    #12 0xc0100fdb in ?? () at /usr/src/sys/arch/i386/i386/spl.S:390
    #13 0xc07d2e11 in tcp_usrreq (so=0xc40b0534, req=4, m=0x0, nam=0xc317ba00, control=0x0, l=0xc4085560) at /usr/src/sys/netinet/tcp_usrreq.c:615
    #14 0xc04bb300 in tcp_usrreq_wrapper (a=0xc40b0534, b=4, c=0x0, d=0xc317ba00, e=0x0, f=0xc4085560) at /usr/src/sys/netinet/in_proto.c:164
    #15 0xc0839006 in soconnect (so=0xc40b0534, nam=0xc317ba00, l=0xc4085560) at /usr/src/sys/kern/uipc_socket.c:821
    #16 0xc083c4ce in do_sys_connect (l=0xc4085560, fd=4, nam=0xc317ba00) at /usr/src/sys/kern/uipc_syscalls.c:371
    #17 0xc083dbeb in sys_connect (l=0xc4085560, uap=0xdbc27d00, retval=0xdbc27d28) at /usr/src/sys/kern/uipc_syscalls.c:350
    #18 0xc07b1b4a in sy_call (rval=0xdbc27d28, uap=0xdbc27d00, l=0xc4085560, sy=0xc0c2f018) at /usr/src/sys/sys/syscallvar.h:61
    #19 syscall (frame=0xdbc27d48) at /usr/src/sys/arch/x86/x86/syscall.c:179
    #20 0xc010056d in ?? () at /usr/src/sys/arch/i386/i386/locore.S:1160
    Backtrace stopped: previous frame inner to this frame (corrupt stack?)

Thanks,
Richard

