NetBSD-Bugs archive
Re: kern/59339: heartbeat watchdog fires since 10.99.14
The following reply was made to PR kern/59339; it has been noted by GNATS.
From: Patrick Welche <prlw1%welche.eu@localhost>
To: gnats-bugs%netbsd.org@localhost, matthew green <mrg%eterna23.net@localhost>
Cc:
Subject: Re: kern/59339: heartbeat watchdog fires since 10.99.14
Date: Tue, 22 Apr 2025 10:29:26 +0100
On Mon, Apr 21, 2025 at 10:00:02PM +0000, matthew green via gnats wrote:
> The following reply was made to PR kern/59339; it has been noted by GNATS.
>
> From: matthew green <mrg%eterna23.net@localhost>
> To: gnats-bugs%netbsd.org@localhost, prlw1%cam.ac.uk@localhost
> Cc: kern-bug-people%netbsd.org@localhost, gnats-admin%netbsd.org@localhost,
> netbsd-bugs%netbsd.org@localhost
> Subject: re: kern/59339: heartbeat watchdog fires since 10.99.14
> Date: Tue, 22 Apr 2025 07:54:55 +1000
>
> > System panicked: cpu0: softints stuck for 16 seconds
>
> this means cpu0 is locked up, and some other cpu detected it and
> crashed. the stack below is not the interesting cpu, but you
> found the relevant LWPs to inspect:
>
> > crash> bt
> > end() at 0
> > kern_reboot() at kern_reboot+0x93
> > vpanic() at vpanic+0x16b
> > panic() at vprintf
> > heartbeat() at heartbeat+0x1f2
> > hardclock() at hardclock+0x9c
> > Xresume_lapic_ltimer() at Xresume_lapic_ltimer+0x1e
> > --- interrupt ---
> > mutex_spin_exit() at mutex_spin_exit+0x5a
> > callout_softclock() at callout_softclock+0xad
> > softint_dispatch() at softint_dispatch+0x8f
> > crash> ps
> > PID LID S CPU FLAGS STRUCT LWP * NAME WAIT
> > 2917 > 2917 7 0 8060000 ffff8052e4a14000 tar
> > 0 > 5 7 0 200 ffff8055abee1c00 softclk/0
>
> can you do "bt/a ffff8052e4a14000" and "bt/a ffff8055abee1c00"?
tar:
crash> bt/a ffff8052e4a14000
crash: _kvm_kvatop(7f7fffffe068)
crash: kvm_read(0x7f7fffffe068, 8): invalid translation (invalid level 4 PDE)
trace: pid 2917 lid 2917
softclk:
crash> bt/a ffff8055abee1c00
trace: pid 0 lid 5 at 0xffffc404a4608f90
callout_softclock() at callout_softclock+0xad
softint_dispatch() at softint_dispatch+0x8f
> or with the other crash, any process on the cpu reported (always
> cpu0, i think?) with the ">" state like above (ie, running.)
cron:
crash> bt/a ffff81f02e1a7800
trace: pid 1266 lid 1266 at 0xffff9704b6f54e90
sys___nanosleep50() at sys___nanosleep50+0x46
syscall() at syscall+0x95
--- syscall (number 430) ---
syscall+0x95:
X:
crash> bt/a ffff81f026788c00
trace: pid 992 lid 992 at 0xffff9704b65847a0
uvm_pglistalloc() at uvm_pglistalloc+0x9de
_bus_dmamem_alloc_range.constprop.0() at _bus_dmamem_alloc_range.constprop.0+0x64
bus_dmamem_alloc() at bus_dmamem_alloc+0x78
i915_gem_object_get_pages_internal() at i915_gem_object_get_pages_internal+0xad
____i915_gem_object_get_pages() at ____i915_gem_object_get_pages+0x1f
__i915_gem_object_get_pages() at __i915_gem_object_get_pages+0x45
pool_active() at pool_active+0x68
i915_active_acquire() at i915_active_acquire+0x7d
intel_engine_get_pool() at intel_engine_get_pool+0x223
i915_gem_do_execbuffer() at i915_gem_do_execbuffer+0xc71
i915_gem_execbuffer2_ioctl() at i915_gem_execbuffer2_ioctl+0x1be
drm_ioctl() at drm_ioctl+0x241
drm_ioctl_shim() at drm_ioctl_shim+0x30
sys_ioctl() at sys_ioctl+0x4ae
syscall() at syscall+0x95
--- syscall (number 54) ---
syscall+0x95:
xdm:
crash> bt/a ffff81f027510400
trace: pid 857 lid 857 at 0x8
_KERNEL_OPT_DDB_HISTORY_SIZE() at _KERNEL_OPT_DDB_HISTORY_SIZE+0x46
crash: _kvm_kvatop(10)
crash: kvm_read(0x10, 8): invalid translation (invalid level 4 PDE)
softclk:
crash> bt/a ffff81f020f56400
trace: pid 0 lid 27 at 0x680692b5
__LARGE_PAGE_SIZE() at 6aa46bf
crash: _kvm_kvatop(680692bd)
crash: kvm_read(0x680692bd, 8): invalid translation (invalid level 4 PDE)
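The selection step asked for above (a "bt/a" for every LWP that ps shows with ">", i.e. running) can be sketched as a small script. This is only an illustration, not part of crash(8): the function names are hypothetical, and it assumes the unquoted ps row layout seen in this thread, where a running row reads "PID > LID S CPU FLAGS STRUCT-LWP NAME".

```python
def running_lwps(ps_text):
    """Return (pid, lid, cpu, lwp_addr, name) for each ps row marked '>'.

    Assumes the row layout shown in this thread; rows without the '>'
    marker (and the header line) are skipped.
    """
    rows = []
    for line in ps_text.splitlines():
        tokens = line.split()
        # A running row has '>' as its second whitespace-separated token.
        if len(tokens) >= 8 and tokens[0].isdigit() and tokens[1] == ">":
            pid, _marker, lid, _state, cpu, _flags, addr, name = tokens[:8]
            rows.append((int(pid), int(lid), int(cpu), addr, name))
    return rows


def bt_commands(ps_text, cpu=None):
    """Build one 'bt/a <struct lwp address>' line per running LWP.

    If cpu is given, keep only LWPs reported on that CPU number.
    """
    return [
        "bt/a " + addr
        for _pid, _lid, lwp_cpu, addr, _name in running_lwps(ps_text)
        if cpu is None or lwp_cpu == cpu
    ]
```

Feeding it the ps text from the first crash would yield the two bt/a commands requested for tar and softclk/0; the resulting lines can then be pasted at the crash> prompt.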