Taylor R Campbell wrote:
>> Date: Sun, 16 Feb 2025 16:24:20 +0100
>> From: BERTRAND Joël <joel.bertrand%systella.fr@localhost>
>>
>> System runs a kernel with sys/arch/amd64/amd64/machdep.c
>> rev. 1.370. Panic occurs this morning and.. no crash dump in /var/crash...
>
> What does /var/run/rc.log say about savecore?

Only these lines:

[running /etc/rc.d/savecore]
Checking for core dump...
savecore: no core dump

> Please test crash dumps in your configuration before you spend weeks
> waiting for the symptom to randomly manifest again! Test a similar
> configuration, say in a VM, if you absolutely cannot take this machine
> down for a test at a predictable time -- otherwise you'll continue
> having to take it down at unpredictable times anyway.
>
>> Feb 16 14:57:49 legendre /netbsd: [ 2118436.0016171] dumping to dev 18,1
>> (offset=253015, size=4162677):
>
> Do you log the serial console output? I would be curious to see if
> anything was printed after that.

I don't have a serial console on this server.

> The last time around, in PR 59024, you shared these two lines of
> output:
>
>> [ 417509.006761] dumping to dev 18,1 (offset=253015, size=4162677):
>> [ 417509.006761] dump device bad

Feb 16 14:57:49 legendre /netbsd: [ 2118436.0016171] dumping to dev 18,1 (offset=253015, size=4162677):
Feb 16 14:57:49 legendre /netbsd: [ 2118436.0016171] dump 1.0000040] acpi0: SCI interrupting at int 9
Feb 16 14:57:49 legendre /netbsd: [ 1.0000040][ 1.0000000] Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003,

The last line begins with "dump"; maybe the end of the line contained
"device bad".

> The second line of output is missing in what you just quoted this
> time, which is curious. I also added a lot of diagnostics in rev.
> 1.371 that should appear between those two lines and perhaps shed some
> light on why the dumps are failing (tracked in PR kern/59024 `dump
> fails on raid0b' <https://gnats.NetBSD.org/59024>). If you're running
> with 1.370 then you're already on current, so I strongly recommend you
> update to current with 1.371 _and test crash dumps_.

I have updated my kernel tree, and the running kernel was built with
machdep.c rev. 1.371. The kernel is 10.99.12 but userland is only 10.0.
Should I update userland to 10.99.12?

I will try to force a panic, but it is not easy. As this server acts as
the main server (tftp, nfs, nis, swap...) for a lot of diskless
workstations, I have to stop all the workstations to test.

Regards,

JB