Subject: bin/6832: savecore deficiencies
To: None <email@example.com>
From: None <firstname.lastname@example.org>
Date: 01/17/1999 22:57:30
>Synopsis: savecore could be better
>Responsible: bin-bug-people (Utility Bug People)
>Arrival-Date: Sun Jan 17 21:05:01 1999
>Originator: Brian Grayson
Parallel and Distributed Systems
Electrical and Computer Engineering
The University of Texas at Austin
>Release: Jan 15, 1998
As a NetBSD-helper-wannabe, I've been quite frustrated
with how savecore does and doesn't work. There are
several things that could be better!
1. There is no reference to a crash(8) man page. Each
architecture should have a crash man page talking about
what occurs when crashes happen, how (in general terms)
savecore saves the information for further perusal, and
how to debug a dead kernel using the saved info (e.g.,
print a backtrace, do a ps).
The hp300 and vax archs have crash(8) man pages, which
would be a good start for the other archs. panic(9)
and other things should also x-ref the crash(8) man page.
2. Some of the exit conditions give very little
information, even with -v. For example, the error message
"/dev/wd0e: Device busy" could be something like "Failure
while attempting to open dump device /dev/wd0e: Device
busy". This particular failure is also one symptom of
/netbsd and the crashed kernel not being the same: wd0e
is an FFS file system, not a swap partition, so savecore
was presumably looking for the dump in the wrong place.
A note like "Check that /netbsd is your running
kernel, and the kernel xxx passed via -N (if any) is the
kernel that caused the panic" would be nice. Several
other common exit points should print similar hints
about kernel names.
3. There is no -N-style option to deal with the
currently-running kernel being different from /netbsd.
For example, I am currently running /netbsd-test3, with a
crashdump from /netbsd-test3. We need two -N-style
options, one for the current kernel, and the
already-existing -N option for the dead kernel.
4. When there is a crash dump and savecore fails to save
it, the failure is too quiet. Perhaps each exit(1) call
should be replaced with something that prints out a more
noticeable message (*** savecore failure: xxx)?
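The replacement for exit(1) suggested in item 4 could
look roughly like the sketch below. The function names
are hypothetical and nothing here is from the savecore
source; it only assumes the banner format given above.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Write "*** savecore failure: <reason>" plus a newline to fp. */
void
vreport_failure(FILE *fp, const char *fmt, va_list ap)
{
	fprintf(fp, "*** savecore failure: ");
	vfprintf(fp, fmt, ap);
	fputc('\n', fp);
}

/* Same as above, with printf-style arguments. */
void
report_failure(FILE *fp, const char *fmt, ...)
{
	va_list ap;

	va_start(ap, fmt);
	vreport_failure(fp, fmt, ap);
	va_end(ap);
}

/* Hypothetical stand-in for a bare exit(1): complain loudly, then exit. */
void
fail(const char *fmt, ...)
{
	va_list ap;

	va_start(ap, fmt);
	vreport_failure(stderr, fmt, ap);
	va_end(ap);
	exit(1);
}
```

Each existing exit(1) would then become a call such as
fail("cannot open %s", path), so the reason always shows
up next to the banner.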
Some of the above are over my head. I'll try to send in
patches as I write them over the next few weeks, if no
one else beats me to it.