tech-kern archive


Re: sparse dumps (was: WAPL panic)



On Nov 9, 2012, at 08:01, Chuck Silvers <chuq%chuq.com@localhost> wrote:

> On Wed, Nov 07, 2012 at 02:22:49PM +0100, Edgar Fuß wrote:
>>> Try to get a sparse dump via machdep.sparse_dump=1
>> How long is that supposed to take?
>> It said "dump", paused for a few seconds, then counted from 44 down to 38,
>> and then nothing happened for minutes, until I hit the virtual reset button.
> 
> I tried triggering a sparse dump (with "reboot -qd") on amd64
> and after a number of tries I did see the hang during the dump.
> but even when it doesn't hang, the resulting sparse dump is not valid:
> 
> savecore: kvm_read: invalid translation (invalid level 4 PDE)
> 
> sparse dumps appear to be a bit too sparse.
> 
> after I fixed that (and the problem that causes the kernel to spew
> "pmap_kenter_pa: mapping already present"), the next problem was that savecore
> generates a useless kernel image file, so you need to ignore the one
> from savecore and use the kernel image you actually booted.  this isn't
> specific to sparse dumps, it happens with both normal and sparse dumps.
> 
> but once I get past all that, sparse dumps work for me on amd64.
> 
> ... I later tried triggering a dump from ddb with "reboot 0x104"
> to make sure that my fix for the "mapping already present" thing
> would work in this context as well (since the last attempt to fix that
> resulted in a different hang), and I found that rebooting from ddb
> currently always hangs.  I traced it as far as cpu_shutdown(),
> and it's not surprising that the xcalls from that also cause problems.
> I'm inclined to have pmf_system_shutdown() return without doing anything
> if panicstr is set, since the context in which this is called could cause
> a hang for any driver shutdown hook.  does anyone have any other ideas
> on what to do about this?
> 
> the attached patch fixes the amd64 kernel problems with sparse dumps for me;
> could you give that a try?
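
Chuck's proposed guard (have pmf_system_shutdown() return immediately once panicstr is set) can be sketched as a userland mock. Everything below is an assumption for illustration: panicstr is a local stand-in for the kernel global of that name, and the flat hook array, register_hook() and counting_hook() are hypothetical; the real function walks the device tree instead.

```c
#include <assert.h>
#include <stddef.h>

/* Mock of the kernel global: non-NULL once panic() has been called. */
static const char *panicstr = NULL;

/* Hypothetical flat list of driver shutdown hooks (the real kernel
 * walks the device tree rather than an array like this). */
typedef int (*shutdown_hook_t)(int how);
static shutdown_hook_t hooks[8];
static size_t nhooks;

static void register_hook(shutdown_hook_t fn) {
    assert(nhooks < sizeof hooks / sizeof hooks[0]);
    hooks[nhooks++] = fn;
}

static void pmf_system_shutdown(int how) {
    /* Proposed guard: after a panic, skip all driver shutdown hooks,
     * since in that context any one of them could hang the reboot. */
    if (panicstr != NULL)
        return;
    for (size_t i = 0; i < nhooks; i++)
        (void)hooks[i](how);
}

/* Demo hook that only counts how often it is invoked. */
static int hooks_run;
static int counting_hook(int how) {
    (void)how;
    hooks_run++;
    return 0;
}
```

Usage: register counting_hook, call pmf_system_shutdown(0) once with panicstr NULL (the hook runs) and once after setting panicstr to any string (it does not). The early return is the bluntest option: it gives up an orderly device shutdown in exchange for a reboot and dump that are more likely to complete.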

I have tested your patches for NetBSD-current on VMware Fusion (under Mac OS X).
Breaking into ddb and entering "reboot 0x104" results in a good core dump. As 
you note, the kernel copy is invalid.
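
In ddb, the argument to "reboot" is a howto bitmask using the same values as reboot(2), so 0x104 is RB_DUMP | RB_NOSYNC: write a crash dump and do not sync file systems first. The sketch below decodes such a value. The constants are copied from NetBSD's <sys/reboot.h> and redefined locally (an assumption made so it builds on other hosts); rb_decode() is a hypothetical helper, not a NetBSD API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Values as in NetBSD's <sys/reboot.h> (subset), redefined locally. */
#define RB_ASKNAME 0x001
#define RB_SINGLE  0x002
#define RB_NOSYNC  0x004
#define RB_HALT    0x008
#define RB_KDB     0x040
#define RB_DUMP    0x100

static const struct { int bit; const char *name; } rb_names[] = {
    { RB_ASKNAME, "RB_ASKNAME" }, { RB_SINGLE, "RB_SINGLE" },
    { RB_NOSYNC,  "RB_NOSYNC"  }, { RB_HALT,   "RB_HALT"   },
    { RB_KDB,     "RB_KDB"     }, { RB_DUMP,   "RB_DUMP"   },
};

/* Hypothetical helper: write the "|"-separated names of the known flags
 * set in howto into buf (truncating if len is too small). */
static void rb_decode(int howto, char *buf, size_t len) {
    assert(len > 0);
    buf[0] = '\0';
    for (size_t i = 0; i < sizeof rb_names / sizeof rb_names[0]; i++) {
        if (!(howto & rb_names[i].bit))
            continue;
        if (buf[0] != '\0')
            strncat(buf, "|", len - strlen(buf) - 1);
        strncat(buf, rb_names[i].name, len - strlen(buf) - 1);
    }
}
```

For example, decoding 0x104 into a 64-byte buffer yields "RB_NOSYNC|RB_DUMP".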

Thanks for the patches! I cannot remember the last time I was able to get a 
workable core dump on amd64.

Regards,
Sverre

PS "vmstat -M netbsd.0.core -N /netbsd" results in
        vmstat: can't dereference kptr 0x7f7fffffd780
        vmstat: invalid translation (invalid level 4 PDE)
Adding specific options, e.g., -e, works fine.
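
The "invalid level 4 PDE" message (from savecore earlier and from vmstat here) comes from libkvm translating a kernel virtual address by walking the amd64 page tables saved in the dump. If a page-table page was left out of the dump, the walk plausibly fails at that level, which fits "sparse dumps appear to be a bit too sparse". The toy model below is not libkvm's code: the dump layout, read_entry() and kvtop() are hypothetical, a page missing from the dump is assumed to read back as zero, and large pages (PG_PS) are ignored. Only the 4-level index arithmetic is the standard x86-64 scheme.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PG_V     0x1ULL                  /* entry is valid */
#define PG_FRAME 0x000ffffffffff000ULL   /* physical frame address bits */
#define NPTE     512                     /* entries per 4 KiB table page */

/* One physical page captured in the toy dump, holding a page table. */
struct dumped_page {
    uint64_t pa;          /* physical address of the page */
    uint64_t pte[NPTE];   /* its contents viewed as page-table entries */
};

/* Toy sparse dump: only the pages listed here were written out. */
static struct dumped_page dump[4];
static size_t ndumped;

/* Read entry idx of the table at table_pa; pages absent from the dump
 * read back as zero (an assumption of this toy model). */
static uint64_t read_entry(uint64_t table_pa, unsigned idx) {
    for (size_t i = 0; i < ndumped; i++)
        if (dump[i].pa == table_pa)
            return dump[i].pte[idx];
    return 0;
}

/* Table index at a level: 4 = PML4, 3 = PDP, 2 = PD, 1 = PT. */
static unsigned pt_index(uint64_t va, int level) {
    assert(level >= 1 && level <= 4);
    return (unsigned)((va >> (12 + 9 * (level - 1))) & (NPTE - 1));
}

/* Walk from the PML4 at root_pa.  Returns 0 and sets *pa on success,
 * or the level (4..1) whose entry was not valid. */
static int kvtop(uint64_t root_pa, uint64_t va, uint64_t *pa) {
    uint64_t table = root_pa;
    for (int level = 4; level >= 1; level--) {
        uint64_t pte = read_entry(table, pt_index(va, level));
        if (!(pte & PG_V))
            return level;
        table = pte & PG_FRAME;
    }
    *pa = table | (va & 0xfff);
    return 0;
}
```

With all four table pages present, a lookup succeeds; with the PML4 page omitted, kvtop() returns 4, the toy analogue of "invalid level 4 PDE".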

PPS Is it now safe to enable core dumps on systems where the dump partition is
a sub-partition of a raidframe RAID 1 partition? This used to be warned against
in the old raidframe documentation, but the warnings are gone in recent versions.

