Subject: bin/6832: savecore deficiencies
To: None <gnats-bugs@gnats.netbsd.org>
From: None <bgrayson@ece.utexas.edu>
List: netbsd-bugs
Date: 01/17/1999 22:57:30
>Number:         6832
>Category:       bin
>Synopsis:       savecore could be better
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    bin-bug-people (Utility Bug People)
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sun Jan 17 21:05:01 1999
>Last-Modified:
>Originator:     Brian Grayson
>Organization:
	Parallel and Distributed Systems
	Electrical and Computer Engineering
	The University of Texas at Austin
>Release:        Jan 15, 1998
>Environment:

>Description:
	As a NetBSD-helper-wannabe, I've been quite frustrated
	with how savecore does and doesn't work.  There are
	several things that could be better!

	1.  There is no reference to a crash(8) man page.  Each
	architecture should have a crash man page talking about
	what occurs when crashes happen, how (in general terms)
	savecore saves the information for further perusal, and
	how to debug a dead kernel using the saved info (e.g.,
	print a backtrace, do a ps).

	The hp300 and vax archs have crash(8) man pages, which
	would be a good start for the other archs.  panic(9)
	and other things should also x-ref the crash(8) man page.

	2.  Some of the exit conditions give little information,
	even with -v.  For example, the error message
	"/dev/wd0e: Device busy" could be something like "Failure
	while attempting to open dump device /dev/wd0e:  Device
	busy".  (This particular failure is also one symptom of
	/netbsd and the crashed kernel not being the same:  here
	wd0e is an FFS file system, not a swap partition.
	A note like "Check that /netbsd is your running
	kernel, and the kernel xxx passed via -N (if any) is the
	kernel that caused the panic" would be nice.  There are
	several other common exit points that should have similar
	messages about kernel names.)
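
	A minimal sketch of the kind of message meant in item 2,
	not a patch against savecore.  The helper name
	format_open_failure() and its parameters are hypothetical;
	only the wording of the message comes from the text above.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not existing savecore code): build a verbose
 * message for a failed open of the dump device.  `err` is the errno
 * value saved right after the failed open; `kernel` is the -N
 * argument, or NULL if -N was not given.
 * Returns what snprintf() returns (the untruncated message length).
 */
int
format_open_failure(char *buf, size_t len, const char *dumpdev, int err,
    const char *kernel)
{
	if (kernel != NULL)
		return snprintf(buf, len,
		    "Failure while attempting to open dump device %s: %s\n"
		    "Check that /netbsd is your running kernel, and that "
		    "the kernel %s passed via -N is the kernel that caused "
		    "the panic", dumpdev, strerror(err), kernel);

	/* No -N given: only the /netbsd hint applies. */
	return snprintf(buf, len,
	    "Failure while attempting to open dump device %s: %s\n"
	    "Check that /netbsd is your running kernel and the kernel "
	    "that caused the panic", dumpdev, strerror(err));
}
```

	A caller would save errno immediately after the failed
	open and pass it in, for example
	format_open_failure(buf, sizeof(buf), "/dev/wd0e", errno, NULL).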

	3.  There is no -N-style option to deal with the
	currently-running kernel being something other than
	/netbsd.  For example, I am currently running
	/netbsd-test3, with a crashdump from /netbsd-test3.  We
	need two -N-style options:  a new one for the current
	kernel, and the already-existing -N option for the dead
	kernel.

	4.  When savecore fails to save a crash dump, it fails too
	quietly.  Perhaps each exit(1) call should be replaced
	with something that prints out a more noticeable message
	(*** savecore failure:  xxx)?
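
	A minimal sketch of such a replacement for a bare exit(1).
	The function names are hypothetical and this is not taken
	from the savecore source; it only shows the banner idea
	from item 4.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

#define FAIL_PREFIX	"*** savecore failure: "

/* Write the banner followed by the formatted reason into buf. */
void
vformat_failure(char *buf, size_t len, const char *fmt, va_list ap)
{
	int n = snprintf(buf, len, "%s", FAIL_PREFIX);

	/* Append the reason only if the banner fit with room to spare. */
	if (n > 0 && (size_t)n < len)
		vsnprintf(buf + n, len - (size_t)n, fmt, ap);
}

/* printf-style wrapper around vformat_failure(). */
void
format_failure(char *buf, size_t len, const char *fmt, ...)
{
	va_list ap;

	va_start(ap, fmt);
	vformat_failure(buf, len, fmt, ap);
	va_end(ap);
}

/* Hypothetical stand-in for exit(1): print the banner, then exit. */
void
fail(const char *fmt, ...)
{
	char msg[512];
	va_list ap;

	va_start(ap, fmt);
	vformat_failure(msg, sizeof(msg), fmt, ap);
	va_end(ap);
	fprintf(stderr, "%s\n", msg);
	exit(1);
}
```

	An exit point would then read, for example,
	fail("%s: %s", "/dev/wd0e", strerror(errno)) instead of
	exit(1).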
>How-To-Repeat:
>Fix:
	Some of the above are over my head.  I'll try to send in
	patches as I write them over the next few weeks, if no
	one else beats me to it.
>Audit-Trail:
>Unformatted: