Subject: Re: How to read a crash file?
To: Paul Goyette <paul@whooppee.com>
From: Hubert Feyrer <hubert@feyrer.de>
List: current-users
Date: 11/17/2006 21:09:34
I put some information on this in my NetBSD blog a few days ago, see
<http://www.feyrer.de/NetBSD/blog.html/nb_20061115_0123.html>:
``Post mortem debugging, or: what happened before it crashed?
So your machine panicked, and as you were running X you have no clue what
went on? Here's a nice way to find out, assuming you have a kernel crash
dump. To ensure the latter, set kern.dump_on_panic=1 in /etc/sysctl.conf.
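As a minimal sketch of that step (the function name enable_dump_on_panic is
made up for illustration, and it assumes it is run as root), the setting can
be appended to /etc/sysctl.conf only if no such line is there yet:

```shell
# Hypothetical helper: add kern.dump_on_panic=1 to a sysctl.conf-style file
# unless the file already has a kern.dump_on_panic line.
# An existing line (even one set to 0) is left untouched.
enable_dump_on_panic() {
    conf=${1:-/etc/sysctl.conf}
    grep -q '^kern\.dump_on_panic=' "$conf" 2>/dev/null ||
        echo 'kern.dump_on_panic=1' >> "$conf"
}
```

The file is only read at boot; "sysctl -w kern.dump_on_panic=1" should set
the same value on the running system.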
Now, what to do with those crashdumps?
% ls -l /var/crash/
total 3183838
-rw-r--r-- 1 root wheel 3 Nov 2 02:09 bounds
-rw-r--r-- 1 root wheel 5 Jun 30 2004 minfree
...
-rw------- 1 root wheel 181265401 Nov 2 02:11 netbsd.26.core.gz
-rw------- 1 root wheel 2162696 Nov 2 02:11 netbsd.26.gz
In /var/crash, "bounds" contains an increasing counter for the crashdump
number (it would be "27" in the above example), and "minfree" contains the
minimum amount of free space in kilobytes that should be kept free - both
files are read by savecore(8) when /etc/rc.conf has "savecore=yes", which
is the default.
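Since "bounds" holds the next number to be used, the most recent saved dump
is one less than that. A small sketch of that arithmetic (the function name
latest_dump_number is an illustration, not part of savecore(8)):

```shell
# Hypothetical helper: print the number of the most recently saved dump,
# i.e. the counter stored in "bounds" minus one.
latest_dump_number() {
    dir=${1:-/var/crash}
    next=$(cat "$dir/bounds") || return 1
    echo $((next - 1))
}
```

With the directory listing above, "latest_dump_number /var/crash" would
print 26.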
The actual crashdump consists of two gzipped files - the actual memory
dump "netbsd.XX.core.gz" and a copy of the running kernel "netbsd.XX.gz".
After uncompressing, the files can be used for looking at the system at the
point of its panic:
# gunzip netbsd.26*.gz
#
Note that the crashdump may contain sensitive data and is therefore
readable only by root!
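If you would rather keep the compressed originals, one possible variant is
to uncompress copies into a working directory with a restrictive umask. This
is only a sketch; the function name unpack_dump and its arguments are made
up for illustration:

```shell
# Hypothetical helper: uncompress kernel and core of dump number $1 from
# directory $2 (default /var/crash) into directory $3 (default .),
# leaving the .gz files in place. umask 077 keeps the copies private.
unpack_dump() {
    n=$1
    src=${2:-/var/crash}
    dst=${3:-.}
    (
        umask 077
        gunzip -c "$src/netbsd.$n.gz" > "$dst/netbsd.$n" &&
        gunzip -c "$src/netbsd.$n.core.gz" > "$dst/netbsd.$n.core"
    )
}
```

For example, "unpack_dump 26 /var/crash /root/crash26" (the target directory
has to exist already). Mind the disk space: the uncompressed core is much
larger than the .gz file.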
The crashdump can be read by programs that use libkvm to read through the
crashdump's kernel memory, e.g. gdb(1), dmesg(8), ps(1), fstat(8),
ipcs(1), netstat(8), nfsstat(8), pmap(1), w(1), pstat(8), vmstat(8) etc.,
using the -M and -N switches.
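To avoid retyping the two file names for every tool, the -M/-N pair can be
built once. A rough sketch (kvm_args is a made-up name, and it assumes the
uncompressed files sit in one directory without spaces in its path):

```shell
# Hypothetical helper: print the "-M core -N kernel" arguments for dump
# number $1 located in directory $2 (default: current directory).
kvm_args() {
    n=$1
    dir=${2:-.}
    echo "-M $dir/netbsd.$n.core -N $dir/netbsd.$n"
}
```

Usage would be along the lines of "dmesg $(kvm_args 26)" or
"vmstat $(kvm_args 26) -s". This does not apply to gdb, which takes the
kernel as its argument and the core via "target kcore".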
Some examples:
* To show the system's message buffer at the time of the crash:
% dmesg -M netbsd.26.core -N netbsd.26
...
unmounting /home (/dev/wd1e)...
unmounting /tmp (mfs:371)...warning: mfs read during shutdown
dev = 0xff00, block = 10496, fs = /tmp
panic: blkfree: freeing free block
Begin traceback...
uvm_fault(0xcbfd07f0, 0x2000, 1) -> 0xe
fatal page fault in supervisor mode
trap type 6 code 0 eip c0305083 cs 8 eflags 10246 cr2 2900 ilevel 0
panic: trap
Faulted in mid-traceback; aborting...
dumping to dev 0,1 offset 2024327
dump 511 510 509 508 507 506 505 504 503 502 501 500 499 498 497 496
495 494 493 ...
Apparently the system tried to free a block that was already freed
here when unmounting /tmp.
* Display virtual memory parameters:
% vmstat -M netbsd.26.core -N netbsd.26 -s
4096 bytes per page
8 page colors
127888 pages managed
...
* Attach the GNU debugger gdb(1) to the system crash dump, to poke
around deeply:
% gdb netbsd.26
...
(gdb) target kcore netbsd.26.core
panic: blkfree: freeing free block
#0 0x0ac04000 in ?? ()
(gdb) bt
#0 0x0ac04000 in ?? ()
#1 0xc03084b5 in cpu_reboot ()
#2 0xc02a57aa in panic ()
#3 0xc0313127 in trap ()
#4 0xc0102dfd in calltrap ()
#5 0xc0182544 in db_get_value ()
#6 0xc03058f1 in db_stack_trace_print ()
#7 0xc02a577c in panic ()
#8 0xc0205db7 in ffs_blkfree ()
#9 0xc020b8d5 in ffs_indirtrunc ()
...
* Unfortunately there are a number of programs that I didn't get to
work with my crashdump, but that may be because the crash happened
during or after system shutdown; ps(1), for example, didn't work.
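The message buffer can get long, so as a first pass it may help to pull out
just the panic messages. A small sketch (panic_lines is a made-up name; it
simply filters whatever it gets on standard input):

```shell
# Hypothetical filter: keep only lines that start with "panic:",
# allowing for leading whitespace.
panic_lines() {
    grep '^[[:space:]]*panic:'
}
```

For example: "dmesg -M netbsd.26.core -N netbsd.26 | panic_lines". On the
message buffer shown above this would leave the "blkfree" panic and the
follow-up "panic: trap".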
Still, that should give a starting point for poking around...''
- Hubert