NetBSD-Bugs archive
kern/59886: crash dumps are terrible
>Number: 59886
>Category: kern
>Synopsis: crash dumps are terrible
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: kern-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Sat Jan 03 23:00:00 +0000 2026
>Originator: Taylor R Campbell
>Release: current, 11, 10, 9, ...
>Organization:
The Core Don't, Inc.
>Environment:
>Description:
NetBSD crash dumps are terrible. Here is a litany of issues
that should all be fixed:
1. savecore(8) depends on the _running_ kernel's configuration
to find out what the dumpdev is.
It should be able to save a core from any specified dumpdev
without asking the running kernel so you can:
(a) boot a broken kernel,
(b) crash,
(c) reboot into a working kernel,
(d) savecore from the broken kernel,
even if the working and broken kernels have different
configurations, default dumpdevs, and whatnot.
2. The kernel core dump format is undocumented and apparently
unreliable, because it often fails in mysterious ways like:
[running /etc/rc.d/savecore]
Checking for core dump...
savecore: msgbuf magic incorrect (706050403020100 != 63061)
savecore: reboot after panic: kernel diagnostic assertion "uvmexp.swpgonly > 0" failed: file "/zfs/source/src/sys/uvm/uvm_anon.c", line 175
savecore: system went down at Sat Jan 3 23:15:44 2026
savecore: writing compressed core to /var/crash/netbsd.2.core.gz
8086 M
...
540 K
savecore: writing compressed kernel to /var/crash/netbsd.2.gz
savecore: kvm_read ksyms: _kvm_kvatop(ffffc68022f77000)
savecore: (null): Bad address
/etc/rc.d/savecore exited with code 1
3. The kernel doesn't compress memory as it dumps, so dumping is
very slow and requires an unreasonably large dumpdev to work.
4. If dumping core doesn't work, the only fallback is to hope
that the panic and stack trace are preserved in dmesg on
reboot, which often isn't the case -- especially if the
system hangs and it is forcibly powered _off_ before the
operator powers it back on. It should be able to take
advantage of things like UEFI storage or ACPI APEI ERST
storage to store diagnostic information about the crash
dump.
5. Preserving dmesg on reboot also often doesn't work if the
previous and current kernels are different and have different
parameters; presumably the message buffer is not adequately
marked in memory.
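A note on the "msgbuf magic incorrect" message in item 2, as a minimal sketch rather than a diagnosis. It assumes savecore prints both values in hexadecimal and that the dump came from a little-endian machine; neither is stated above. Under those assumptions, the value read in place of the magic is the byte sequence 00 01 02 ... 07, which looks like a counting pattern rather than random corruption. The helper names below are made up for illustration, and the log alone does not say what produced the pattern.

```python
import struct

# Values copied from the savecore message in item 2, read as hexadecimal
# (assumption: savecore prints them in hex).
OBSERVED_MAGIC = 0x0706050403020100
EXPECTED_MAGIC = 0x63061


def magic_bytes(value):
    """Return the 8 in-memory bytes of `value` on a little-endian machine."""
    return struct.pack("<Q", value)


def is_counting_pattern(data):
    """True if every byte equals its own offset (00 01 02 ...)."""
    return all(byte == offset for offset, byte in enumerate(data))


if __name__ == "__main__":
    raw = magic_bytes(OBSERVED_MAGIC)
    print("observed bytes:", raw.hex(" "))
    print("counting pattern:", is_counting_pattern(raw))
    print("expected magic is a counting pattern:",
          is_counting_pattern(magic_bytes(EXPECTED_MAGIC)))
```

If the assumptions hold, this suggests savecore read well-formed but unrelated data at the location where it expected the message buffer header, which is consistent with the complaint that the format is undocumented and fragile.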
>How-To-Repeat:
watch users struggle to get diagnostics out of crashes for PRs
>Fix:
Yes, please!