tech-kern archive


Re: [10.99.12] Panic (softints stuck)



Taylor R Campbell wrote:
>> Date: Sun, 16 Feb 2025 16:24:20 +0100
>> From: BERTRAND Joël <joel.bertrand%systella.fr@localhost>
>>
>> 	The system runs a kernel with sys/arch/amd64/amd64/machdep.c
>> rev. 1.370. A panic occurred this morning and... no crash dump in /var/crash...
> 
> What does /var/run/rc.log say about savecore?

	Only these lines:

[running /etc/rc.d/savecore]
Checking for core dump...
savecore: no core dump

> Please test crash dumps in your configuration before you spend weeks
> waiting for the symptom to randomly manifest again!  Test a similar
> configuration, say in a VM, if you absolutely cannot take this machine
> down for a test at a predictable time -- otherwise you'll continue
> having to take it down at unpredictable times anyway.
> 
>> Feb 16 14:57:49 legendre /netbsd: [ 2118436.0016171] dumping to dev 18,1
>> (offset=253015, size=4162677):
> 
> Do you log the serial console output?  I would be curious to see if
> anything was printed after that.

	I don't have a serial console on this server.

> The last time around, in PR 59024, you shared these two lines of
> output:
> 
>> [ 417509.006761] dumping to dev 18,1 (offset=253015, size=4162677):
>> [ 417509.006761] dump device bad

Feb 16 14:57:49 legendre /netbsd: [ 2118436.0016171] dumping to dev 18,1
(offset=253015, size=4162677):
Feb 16 14:57:49 legendre /netbsd: [ 2118436.0016171] dump 1.0000040]
acpi0: SCI interrupting at int 9
Feb 16 14:57:49 legendre /netbsd: [   1.0000040][   1.0000000] Copyright
(c) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003,

	The last line before the reboot messages begins with "dump"; maybe the rest of that line was "device bad".
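	For reference, here is a minimal sketch of how "dev 18,1" splits into a disk unit and partition. It assumes the traditional NetBSD disk minor encoding (unit = minor / MAXPARTITIONS, partition = minor % MAXPARTITIONS) with MAXPARTITIONS = 16, as on amd64; the helper name decode_disk_minor is hypothetical. Minor 1 decodes to unit 0, partition b, which would be consistent with the raid0b device from PR 59024 if block major 18 is raid(4) on this kernel (not verified here).

```python
# Hypothetical helper: split a NetBSD disk minor number into (unit, partition).
# ASSUMPTION: traditional encoding minor = unit * MAXPARTITIONS + partition,
# with MAXPARTITIONS == 16 (the amd64 value).
MAXPARTITIONS = 16  # assumption: amd64


def decode_disk_minor(minor: int) -> tuple[int, str]:
    """Return (unit number, partition letter) for a traditional disk minor."""
    unit, part = divmod(minor, MAXPARTITIONS)
    return unit, chr(ord("a") + part)


# "dumping to dev 18,1": major 18, minor 1
unit, letter = decode_disk_minor(1)
print(f"unit {unit}, partition {letter}")  # unit 0, partition b
```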

> The second line of output is missing in what you just quoted this
> time, which is curious.  I also added a lot of diagnostics in rev.
> 1.371 that should appear between those two lines and perhaps shed some
> light on why the dumps are failing (tracked in PR kern/59024 `dump
> fails on raid0b' <https://gnats.NetBSD.org/59024>).  If you're running
> with 1.370 then you're already on current, so I strongly recommend you
> update to current with 1.371 _and test crash dumps_.

	I have updated my kernel tree, and the running kernel was built with
machdep.c rev. 1.371.

	The kernel is 10.99.12, but the userland is only 10.0. Should I update
the userland to 10.99.12?

	I will try to force a panic, but it is not easy: this server acts as
the main server (tftp, nfs, nis, swap...) for a lot of diskless
workstations, so I have to stop all of them to test.

	Regards,

	JB



