NetBSD-Bugs archive


port-i386/52560: gdb kernel backtrace fails to show function where trap occurred



>Number:         52560
>Category:       port-i386
>Synopsis:       gdb kernel backtrace fails to show function where trap occurred
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    port-i386-maintainer
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Tue Sep 19 17:50:00 +0000 2017
>Originator:     Andreas Gustafsson
>Release:        NetBSD-current, source date 2017.09.19.02.44.14
>Organization:

>Environment:
System: NetBSD
Architecture: i386
Machine: i386
>Description:

When an i386 kernel has crashed due to an errant pointer and dumped
core, and you examine the core file using gdb, the backtrace correctly
displays every function in the call stack *except* the one where the
error actually occurred.

For example, if I modify sys_reboot() so that it deliberately
dereferences an invalid pointer, and then invoke the reboot syscall,
the console output at the time of the crash correctly shows a
backtrace that includes sys_reboot():

  panic: trap
  cpu0: Begin traceback...
  vpanic(c109de38,c8f7cd78,c8f7cd78,c8f7ce04,c011f7d3,c109de38,c8f7ce10,c8f7ce10,2,e) at netbsd:vpanic+0x1bb
  vpanic(c109de38,c8f7ce10,c8f7ce10,2,e,c8f7ce10,c8f7cdb4,c1f5a1e4,c8f7a000,c0bc739b) at netbsd:vpanic
  trap() at netbsd:trap+0x27a
  --- trap (number 6) ---
  sys_reboot(c2086540,c8f7cf74,c8f7cf6c,ffff0ff0,c8f7cf3c,c01675de,c16b0e20,c2086540,c8f7cf74,c8f7cf6c) at netbsd:sys_reboot+0xa9
  sy_call(c16b0e20,c2086540,c8f7cf74,c8f7cf6c,c016773d,86540,c2086540,c8f7cf9c,c0167885,c16b0e20) at c016750e
  sy_invoke(c16b0e20,c2086540,c8f7cf74,c8f7cf6c,d0,0,c2086540,c1f5a1e4,d0,c16b0e20) at netbsd:sy_invoke+0xbb
  syscall() at netbsd:syscall+0xd7
  --- syscall (number 208) ---
  bab5a397:
  cpu0: End traceback...

but when the crash dump is later examined with gdb, the backtrace
shows an incorrect address and "??" in place of sys_reboot():

  #4  0xc011f7d3 in trap (frame=0xc8f7ce10)
      at /usr/src/sys/arch/i386/i386/trap.c:324
  #5  0xc011400f in alltraps ()
  #6  0xc8f7ce10 in ?? ()
  #7  0xc016750e in sy_call (sy=0xc16b0e20 <sysent+4160>, l=0xc2086540, 
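
The following C sketch is a simplified, hypothetical model of how a plain frame-pointer unwinder can omit the faulting function. It is not gdb's or DDB's code: the symbol table, addresses, simulated stack, and two-field trap frame are invented for illustration. For the sake of the model it assumes that the trap entry stub (alltraps) builds no call frame of its own. Under that assumption, trap()'s saved ebp still points at sys_reboot()'s frame, and the PC inside sys_reboot() exists only in the trap frame's saved eip, never as a return address. The model reproduces only the missing sys_reboot() entry, not gdb's exact output. Note that the bogus address in frame #6 above equals trap()'s frame argument (0xc8f7ce10), which suggests, but does not prove, that gdb is misreading the trap frame.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical symbol table entry: code addresses in [start, end). */
struct sym {
    uint32_t start, end;
    const char *name;
};

/* Made-up code layout; offsets loosely echo the DDB trace in this PR. */
static const struct sym syms[] = {
    { 0x100, 0x200, "syscall" },
    { 0x200, 0x300, "sy_invoke" },
    { 0x300, 0x400, "sy_call" },
    { 0x400, 0x500, "sys_reboot" },
    { 0x500, 0x600, "alltraps" },
    { 0x600, 0x700, "trap" },
};

/* Word offsets of the only two fields this toy trap frame models. */
enum { TF_EBP = 0, TF_EIP = 1 };

static const char *lookup(uint32_t pc)
{
    for (size_t i = 0; i < sizeof(syms) / sizeof(syms[0]); i++)
        if (pc >= syms[i].start && pc < syms[i].end)
            return syms[i].name;
    return "??";
}

/* Simulated stack: each word index plays the role of an address. A call
 * frame at ebp holds the saved ebp at mem[ebp] and the return address at
 * mem[ebp + 1]; trap()'s first argument (the trap frame pointer) is at
 * mem[ebp + 2]. */
static uint32_t mem[64];

static void build_stack(uint32_t *pc, uint32_t *ebp)
{
    memset(mem, 0, sizeof(mem));
    mem[40] = 0;  mem[41] = 0;      /* syscall: outermost, return 0 = stop */
    mem[34] = 40; mem[35] = 0x1d7;  /* sy_invoke, returns into syscall */
    mem[28] = 34; mem[29] = 0x2bb;  /* sy_call, returns into sy_invoke */
    mem[22] = 28; mem[23] = 0x30e;  /* sys_reboot, returns into sy_call */
    mem[14 + TF_EBP] = 22;          /* trap frame: interrupted ebp ... */
    mem[14 + TF_EIP] = 0x4a9;       /* ... and faulting pc in sys_reboot */
    mem[8] = 22;                    /* trap's saved ebp is sys_reboot's,
                                       since alltraps made no frame */
    mem[9] = 0x50f;                 /* trap returns into alltraps */
    mem[10] = 14;                   /* trap's argument: the trap frame */
    *pc = 0x67a;                    /* currently executing in trap() */
    *ebp = 8;
}

/* Frame-pointer unwinder. With trap_aware == 0 it only follows saved
 * ebp/return-address pairs, so the faulting function is skipped. With
 * trap_aware != 0, a return into alltraps makes it read the interrupted
 * pc and ebp from the trap frame, as DDB's "--- trap ---" output implies. */
static size_t unwind(int trap_aware, const char **out, size_t max)
{
    uint32_t pc, ebp;
    size_t n = 0;

    build_stack(&pc, &ebp);
    if (max == 0)
        return 0;
    out[n++] = lookup(pc);
    while (ebp != 0 && n < max) {
        uint32_t ret = mem[ebp + 1];
        if (ret == 0)
            break;                          /* outermost frame reached */
        out[n++] = lookup(ret);
        if (trap_aware && strcmp(lookup(ret), "alltraps") == 0 && n < max) {
            uint32_t tf = mem[ebp + 2];     /* trap()'s frame argument */
            assert(tf + TF_EIP < sizeof(mem) / sizeof(mem[0]));
            out[n++] = lookup(mem[tf + TF_EIP]);
            ebp = mem[tf + TF_EBP];
        } else {
            ebp = mem[ebp];
        }
    }
    return n;
}
```

In this model, unwind(0, ...) yields trap, alltraps, sy_call, sy_invoke, syscall, while unwind(1, ...) yields trap, alltraps, sys_reboot, sy_call, sy_invoke, syscall.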

I am marking this bug as critical because it is making a large class
of kernel bugs much harder to fix.

I first noticed this problem while filing PR 52553.  Since the
procedure for reproducing that issue requires specific hardware
(athn), an alternative procedure that does not is given below.

>How-To-Repeat:

Apply the following patch:

Index: src/sys/kern/kern_xxx.c
===================================================================
RCS file: /bracket/repo/src/sys/kern/kern_xxx.c,v
retrieving revision 1.73
diff -u -r1.73 kern_xxx.c
--- src/sys/kern/kern_xxx.c	29 Oct 2015 00:27:08 -0000	1.73
+++ src/sys/kern/kern_xxx.c	19 Sep 2017 13:57:19 -0000
@@ -67,6 +67,10 @@
 	    0, NULL, NULL, NULL)) != 0)
 		return (error);
 
+	/* Abuse AB_DEBUG for testing trap handling */
+	if ((SCARG(uap, opt) & AB_DEBUG))
+		*((char *)1) = 0;
+
 	/*
 	 * Only use the boot string if RB_STRING is set.
 	 */

Build an i386 release with build.sh -V MKDEBUG=yes -V COPTS=-g.
(Or build just a kernel; I built a full release only because that
is what I have fully automated.)

Install it, boot it, log in as root, and issue the command "reboot -x".
This will cause a trap in sys_reboot(), a core dump, and a reboot.

Log in as root again and issue the commands

  cd /var/crash
  gunzip *.gz
  gdb /netbsd
  (gdb) target kvm netbsd.0.core
  (gdb) bt

Notice how sys_reboot() does not appear in the backtrace.

>Fix:


