re: kern/59339: heartbeat watchdog fires since 10.99.14

To: gnats-bugs%netbsd.org@localhost, prlw1%cam.ac.uk@localhost
Subject: re: kern/59339: heartbeat watchdog fires since 10.99.14
From: matthew green <mrg%eterna23.net@localhost>
Date: Tue, 22 Apr 2025 07:54:55 +1000

> System panicked: cpu0: softints stuck for 16 seconds

this means cpu0 is locked up, and some other cpu detected it and
panicked the system.  the stack below is not from the interesting
cpu, but you found the relevant LWPs to inspect:

> crash> bt
> end() at 0
> kern_reboot() at kern_reboot+0x93
> vpanic() at vpanic+0x16b
> panic() at vprintf
> heartbeat() at heartbeat+0x1f2
> hardclock() at hardclock+0x9c
> Xresume_lapic_ltimer() at Xresume_lapic_ltimer+0x1e
> --- interrupt ---
> mutex_spin_exit() at mutex_spin_exit+0x5a
> callout_softclock() at callout_softclock+0xad
> softint_dispatch() at softint_dispatch+0x8f
> crash> ps
> PID     LID S CPU     FLAGS       STRUCT LWP *               NAME WAIT
> 2917 > 2917 7   0   8060000   ffff8052e4a14000                tar
> 0    >    5 7   0       200   ffff8055abee1c00          softclk/0

can you do "bt/a ffff8052e4a14000" and "bt/a ffff8055abee1c00"?

or, for the other crash, do the same for any LWP on the reported
cpu (always cpu0, i think?) that has the ">" marker like above
(ie, running).
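if there are a lot of LWPs, here is a rough sketch for picking out the running (">") ones on a given cpu and turning them into "bt/a" commands.  it assumes the ps column layout quoted above (PID, optional ">", LID, S, CPU, FLAGS, LWP address, NAME, optional WAIT); that layout is taken from the sample in this thread, not from any documented format, and the helper names are made up for illustration.

```python
# Hypothetical helper: build "bt/a <lwp>" commands for every running LWP
# (rows with the ">" marker) on one CPU, assuming the crash(8) ps layout
# quoted in this thread:  PID [>] LID S CPU FLAGS LWP-ADDR NAME [WAIT]

def running_lwps(ps_text, cpu):
    """Return (lwp_addr, name) pairs for ">" rows whose CPU column matches."""
    rows = []
    for line in ps_text.splitlines():
        if line.startswith("> "):  # tolerate email-quoted ps output
            line = line[2:]
        fields = line.split()
        # a running row has ">" between PID and LID and at least 8 columns
        if len(fields) < 8 or fields[1] != ">":
            continue
        if not (fields[0].isdigit() and fields[4].isdigit()):
            continue  # skip headers or anything not shaped like a row
        if int(fields[4]) == cpu:
            rows.append((fields[6], fields[7]))
    return rows


def bt_commands(ps_text, cpu=0):
    """One "bt/a" command per running LWP on the given CPU."""
    return ["bt/a " + addr for addr, _name in running_lwps(ps_text, cpu)]


SAMPLE = """\
> PID     LID S CPU     FLAGS       STRUCT LWP *               NAME WAIT
> 2917 > 2917 7   0   8060000   ffff8052e4a14000                tar
> 0    >    5 7   0       200   ffff8055abee1c00          softclk/0
"""

if __name__ == "__main__":
    for cmd in bt_commands(SAMPLE, cpu=0):
        print(cmd)
```

run against the quoted output, this yields the same two commands asked for above, one for tar and one for softclk/0.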

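i expect the above will show that softclk/0 was fast-switched in
on top of the tar process (ie, the softclk/0 bt may end up being
the same as tar's, with some additional frames).  normally there
should only be one active LWP per cpu, but fast softints are the
exception: the interrupted LWP stays pinned on the cpu while the
softint LWP runs, so both show up as running, as in the ps output
above.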
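a deliberately simplified toy model of why two LWPs can both look running on one cpu during a fast softint.  this is not NetBSD kernel code; the field names only loosely echo the kernel's l_stat/l_switchto, and treating S=7 (the value both running rows show above) as "on cpu" is an assumption for the model.

```python
# Toy model (NOT NetBSD code) of a fast softint "borrowing" a CPU.
# The interrupted LWP is pinned: it keeps its on-CPU state while the
# softint LWP becomes the CPU's current LWP.
from dataclasses import dataclass, field
from typing import List, Optional

ONPROC = 7  # assumed: matches the S column of both ">" rows above
IDLE = 1    # hypothetical "not running" state for the model


@dataclass
class Lwp:
    name: str
    stat: int = IDLE
    switchto: Optional["Lwp"] = None  # LWP pinned underneath this softint


@dataclass
class Cpu:
    index: int
    curlwp: Lwp
    lwps: List[Lwp] = field(default_factory=list)


def fast_softint_enter(ci: Cpu, softlwp: Lwp) -> Lwp:
    """Start softlwp on ci without switching the interrupted LWP out."""
    pinned = ci.curlwp
    softlwp.switchto = pinned   # remember who we interrupted
    softlwp.stat = ONPROC       # softint LWP is now running...
    ci.curlwp = softlwp         # ...and is the CPU's current LWP
    return pinned               # pinned.stat is left as ONPROC


def fast_softint_exit(ci: Cpu, softlwp: Lwp) -> None:
    """Handler finished: hand the CPU straight back to the pinned LWP."""
    ci.curlwp = softlwp.switchto
    softlwp.switchto = None
    softlwp.stat = IDLE


def running_on(ci: Cpu) -> List[str]:
    """Names of LWPs on this CPU that a ps-like listing would mark running."""
    return [l.name for l in ci.lwps if l.stat == ONPROC]


if __name__ == "__main__":
    tar = Lwp("tar", stat=ONPROC)
    softclk = Lwp("softclk/0")
    cpu0 = Cpu(0, tar, [tar, softclk])
    fast_softint_enter(cpu0, softclk)
    print(running_on(cpu0))  # both tar and softclk/0
    fast_softint_exit(cpu0, softclk)
    print(running_on(cpu0))  # only tar again
```

in the model, if the softint never exits (say it spins in callout_softclock), the cpu stays in the "two running LWPs" state indefinitely, which is the picture the ps output suggests.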

thanks.


.mrg.

References:
- kern/59339: heartbeat watchdog fires since 10.99.14
  - From: prlw1
