current-users: Re: How to read a crash file?

Subject: Re: How to read a crash file?
To: Paul Goyette <paul@whooppee.com>
From: Hubert Feyrer <hubert@feyrer.de>
List: current-users
Date: 11/17/2006 21:09:34
I've put some information on this into my NetBSD blog a few days ago, see 
<http://www.feyrer.de/NetBSD/blog.html/nb_20061115_0123.html>:

``Post mortem debugging, or: what happened before it crashed?

So your machine panicked, and as you were running X you have no clue what 
went on? Here's a nice way to find out, assuming you have a kernel crash 
dump. To ensure the latter, set kern.dump_on_panic=1 in /etc/sysctl.conf. 
Now, what to do with those crashdumps?

% ls -l /var/crash/
total 3183838
-rw-r--r--  1 root  wheel          3 Nov  2 02:09 bounds
-rw-r--r--  1 root  wheel          5 Jun 30  2004 minfree
...
-rw-------  1 root  wheel  181265401 Nov  2 02:11 netbsd.26.core.gz
-rw-------  1 root  wheel    2162696 Nov  2 02:11 netbsd.26.gz

In /var/crash, "bounds" contains an increasing counter for the crashdump 
number (it would be "27" in the above example), and "minfree" contains the 
minimum amount of free space in kilobytes that should be kept free - both 
files are read by savecore(8) when /etc/rc.conf has "savecore=yes", which 
is the default.
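
Since "bounds" holds the number the next dump will get, the most recent 
dump on disk is one less than that. A minimal sketch in POSIX shell, 
assuming the file contains a single decimal number; the function name 
latest_dump and its directory argument are made up for illustration:

```shell
# latest_dump [DIR]: print the number of the most recent crash dump in DIR.
# Assumes DIR/bounds holds the *next* dump number as one decimal integer.
latest_dump() {
    dir=${1:-/var/crash}
    [ -r "$dir/bounds" ] || { echo "no bounds file in $dir" >&2; return 1; }
    # Keep only the digits, dropping the trailing newline.
    next=$(tr -cd '0-9' < "$dir/bounds")
    [ -n "$next" ] && [ "$next" -gt 0 ] || { echo "no dumps saved yet" >&2; return 1; }
    echo $((next - 1))
}
```

With the listing above, "latest_dump /var/crash" would print 26, matching 
the netbsd.26* files.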

The actual crashdump consists of two gzipped files - the memory 
dump "netbsd.XX.core.gz" and a copy of the running kernel "netbsd.XX.gz". 
After uncompressing, the files can be used for looking at the system at 
the point of its panic:

# gunzip netbsd.26*.gz
#

Note that the crashdump may contain sensitive data and is as such only 
readable by root!
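
If you would rather keep the compressed originals and work on a private 
copy, something along these lines may help. It is only a sketch: the 
function name unpack_dump and its arguments are hypothetical, and it 
assumes gzip(1) is available to do the uncompressing.

```shell
# unpack_dump N [SRC] [DEST]: uncompress netbsd.N.gz and netbsd.N.core.gz
# from SRC into DEST, leaving the compressed originals untouched.
unpack_dump() {
    n=$1 src=${2:-/var/crash} dest=${3:-.}
    (
        umask 077                      # new files readable by the owner only
        mkdir -p "$dest" || exit 1
        for f in "netbsd.$n" "netbsd.$n.core"; do
            gzip -dc "$src/$f.gz" > "$dest/$f" || exit 1
        done
    )
}
```

The subshell keeps the umask change from leaking into the calling shell, 
and the restrictive umask keeps the uncompressed copies as private as the 
originals.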

The crashdump can be read by programs that use libkvm to read through the 
crashdump's kernel memory, e.g. gdb(1), dmesg(8), ps(1), fstat(1), 
ipcs(1), netstat(1), nfsstat(1), pmap(1), w(1), pstat(8), vmstat(1) etc., 
using the -M and -N switches.

Some examples:

     * To show the system's message buffer at the time of the crash:

       % dmesg -M netbsd.26.core -N netbsd.26
       ...
       unmounting /home (/dev/wd1e)...
       unmounting /tmp (mfs:371)...warning: mfs read during shutdown
       dev = 0xff00, block = 10496, fs = /tmp
       panic: blkfree: freeing free block
       Begin traceback...
       uvm_fault(0xcbfd07f0, 0x2000, 1) -> 0xe
       fatal page fault in supervisor mode
       trap type 6 code 0 eip c0305083 cs 8 eflags 10246 cr2 2900 ilevel 0
       panic: trap
       Faulted in mid-traceback; aborting...
       dumping to dev 0,1 offset 2024327
       dump 511 510 509 508 507 506 505 504 503 502 501 500 499 498 497 496
       495 494 493 ...

       Apparently the system tried to free a block that was already freed
       when unmounting /tmp.

     * Display virtual memory parameters:

       % vmstat -M netbsd.26.core -N netbsd.26 -s
            4096 bytes per page
               8 page colors
          127888 pages managed
                 ...

     * Attach the GNU debugger gdb(1) to the system crash dump, to poke
       around deeply:

       % gdb netbsd.26
       ...
       (gdb) target kcore netbsd.26.core
       panic: blkfree: freeing free block
       #0  0x0ac04000 in ?? ()
       (gdb) bt
       #0  0x0ac04000 in ?? ()
       #1  0xc03084b5 in cpu_reboot ()
       #2  0xc02a57aa in panic ()
       #3  0xc0313127 in trap ()
       #4  0xc0102dfd in calltrap ()
       #5  0xc0182544 in db_get_value ()
       #6  0xc03058f1 in db_stack_trace_print ()
       #7  0xc02a577c in panic ()
       #8  0xc0205db7 in ffs_blkfree ()
       #9  0xc020b8d5 in ffs_indirtrunc ()
       ...

     * Unfortunately there are a number of programs that I didn't get to
       work with my crashdump, e.g. ps(1). That may be because this dump
       was taken during/after system shutdown.

Still, that should give a start for poking around...''


  - Hubert