Subject: Re: savecore_flags="-z"
To: None <woods@weird.com>
From: Eduardo E. Horvath <eeh@one-o.com>
List: tech-kern
Date: 12/02/1999 19:14:49
> When I do need swap space then I want it to be striped over several
> spindles (either in "hardware", or with multiple swap partitions), and
> it really doesn't matter how much real RAM I've got.
> 
> However if I do have 4GB of RAM and I do want to handle full crash dumps
> I'd be reluctant to put all that 4GB of swap/dump space on one spindle.

That is a problem with most large systems.  If you have a dozen disks
it makes more sense to swap across the whole bunch rather than to just
one.

> This suggests to me that a crash dump should be able to sequentially
> write to all of the active swap/dump partitions.  Can it do this now?

The problem with this approach is that you need to be sure that you
have the same set of swap partitions configured in the same order for
both the dump and savecore, or you won't get a usable core file.

It tends to be a bit delicate, especially now that you can add and
remove swap dynamically.
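
One way around the ordering dependence would be for the kernel to
stamp each piece of the dump with a small header recording where it
belongs, so savecore could reassemble the segments no matter how the
swap devices come up after reboot.  Just a sketch of the idea (nothing
like this exists today, the names and layout are made up):

#include <stdint.h>

#define MULTIDUMP_MAGIC	0x4e424444	/* made-up magic number */

struct multidump_hdr {
	uint32_t	mh_magic;	/* identifies a dump segment */
	uint32_t	mh_seq;		/* segment number within the dump */
	uint32_t	mh_nsegs;	/* total number of segments */
	uint64_t	mh_offset;	/* byte offset of this segment in
					   the logical dump image */
	uint64_t	mh_length;	/* bytes of dump data here */
};

savecore would then sort the segments by mh_seq instead of trusting
whatever swap configuration happens to be active after reboot.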

> As for using "savecore -z", well I do think that it makes the most sense
> to compress crash dumps when saving them on the filesystem.  I would
> also suggest that anyone who needs to frequently analyze their own
> crash dumps should specify a dump partition separate from their swap
> space and always read the dump directly off the raw disk (I'm assuming
> here of course that gdb won't have any trouble doing this -- I've never
> tried it myself in real life :-).

If you want to move the dump to another machine you still need to
extract it and put it on a filesystem.

> The other interesting question is whether or not the dump should be
> compressed while it is being written to the dump partition during the
> crash if indeed there's not enough space to store it all....

Solaris 8 solves the dump problem in two ways:

It only saves mapped kernel pages, so the dump is limited to the
active kernel VA.  This means that you can't debug process data from a
kernel core dump, but then that's seldom necessary.  If the problem is
really bad and might involve unmapped pages, then you need to debug it
on the system.  But then the process of creating a core dump can
corrupt what you're trying to debug, especially any driver structures
that might be involved in dumping, and any machine state that is not
in the dump such as device registers.  So this is not a major loss.
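
To make that concrete, a "mapped kernel pages only" dump loop is
basically the following (kernel-side sketch, not buildable as-is;
dump_write_page() is a made-up stand-in for the real dump device
write routine, and I'm assuming a pmap_extract()-style lookup that
says whether a kernel VA is backed by a physical page):

#include <sys/param.h>
#include <uvm/uvm_extern.h>

void
dump_mapped_kernel(vaddr_t start, vaddr_t end)
{
	vaddr_t va;
	paddr_t pa;

	for (va = start; va < end; va += PAGE_SIZE) {
		if (!pmap_extract(pmap_kernel(), va, &pa))
			continue;		/* unmapped: leave it out */
		dump_write_page(va, pa);	/* hypothetical */
	}
}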

It compresses pages as they are saved to disk.  Apparently the data
tends to have large amounts of zeros and repetitive data, so the
compression ratio is pretty good.  This is necessary on systems with
8+GB RAM that hang due to a kernel memory leak.  Your dump really does
end up being 8+GB in this case.
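
To give an idea of the per-page approach, here's a userland-style
sketch using zlib's compress() (compile with -lz); an in-kernel
version would need a deflate routine it can call at crash time with
preallocated buffers, but the principle is the same:

#include <stdio.h>
#include <string.h>
#include <zlib.h>

#define PAGESZ	4096			/* assumed page size */

/* Compress one page into out[]; returns compressed length, 0 on error. */
static size_t
compress_page(const unsigned char *page, unsigned char *out, size_t outlen)
{
	uLongf dlen = outlen;

	if (compress(out, &dlen, page, PAGESZ) != Z_OK)
		return 0;
	return (size_t)dlen;
}

int
main(void)
{
	unsigned char page[PAGESZ];
	/* zlib worst case: input + 0.1% + a few bytes of overhead. */
	unsigned char out[PAGESZ + PAGESZ / 1000 + 64];
	size_t clen;

	/* Kernel pages tend to be mostly zeros, which compress very well. */
	memset(page, 0, sizeof(page));
	clen = compress_page(page, out, sizeof(out));
	printf("%d bytes -> %zu bytes\n", PAGESZ, clen);
	return 0;
}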

We could probably do something similar.  Dump pages linearly by VA.
Change libkvm not to translate addresses through the MMU but to use
file offsets directly.  Unmapped kernel pages may be an issue, but
they shouldn't cause all that much bloat.  Just write out a page of
zeros, and savecore can insert a hole.  It might even speed things up.
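
On the savecore side, inserting the holes is just a matter of
recognizing all-zero pages and seeking past them instead of writing
them out, which leaves unallocated blocks on any filesystem with
sparse file support (FFS included).  Roughly (untested sketch):

#include <sys/types.h>
#include <err.h>
#include <fcntl.h>
#include <unistd.h>

#define PAGESZ	4096			/* assumed page size */

static int
page_is_zero(const char *buf)
{
	size_t i;

	for (i = 0; i < PAGESZ; i++)
		if (buf[i] != 0)
			return 0;
	return 1;
}

static void
copy_sparse(int dumpfd, int corefd)
{
	char buf[PAGESZ];
	ssize_t n;
	off_t end;

	while ((n = read(dumpfd, buf, sizeof(buf))) == PAGESZ) {
		if (page_is_zero(buf)) {
			/* Skip forward; the gap becomes a hole. */
			if (lseek(corefd, PAGESZ, SEEK_CUR) == -1)
				err(1, "lseek");
		} else if (write(corefd, buf, PAGESZ) != PAGESZ)
			err(1, "write");
	}
	if (n == -1)
		err(1, "read");
	/* Extend the file over a trailing hole, if any. */
	if ((end = lseek(corefd, 0, SEEK_CUR)) == -1 ||
	    ftruncate(corefd, end) == -1)
		err(1, "ftruncate");
}

int
main(int argc, char **argv)
{
	int dumpfd, corefd;

	if (argc != 3)
		errx(1, "usage: %s rawdump corefile", argv[0]);
	if ((dumpfd = open(argv[1], O_RDONLY)) == -1)
		err(1, "%s", argv[1]);
	if ((corefd = open(argv[2], O_WRONLY | O_CREAT | O_TRUNC, 0600)) == -1)
		err(1, "%s", argv[2]);
	copy_sparse(dumpfd, corefd);
	return 0;
}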

=========================================================================
Eduardo Horvath				eeh@netbsd.org
	"I need to find a pithy new quote." -- me