NetBSD-Bugs archive


port-xen/59546: kernel wedged until NMI and then happily proceeds



>Number:         59546
>Category:       port-xen
>Synopsis:       kernel wedged until NMI and then happily proceeds
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    port-xen-maintainer
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Wed Jul 23 12:50:00 +0000 2025
>Originator:     Taylor R Campbell
>Release:        netbsd-10
>Organization:
The NapBSD Foundation
>Environment:
NetBSD mollari.NetBSD.org 10.1_STABLE NetBSD 10.1_STABLE (amd64-DOMU_SERVER) #3: Sat Jun 21 13:32:10 UTC 2025  spz%franklin.NetBSD.org@localhost:/home/netbsd/10/amd64/obj/sys/arch/amd64/compile/amd64-DOMU_SERVER amd64

Xen kernel: 4.18 (20231116)
>Description:
Twice in the past 24h, a Xen domU has become unresponsive over the network and console -- not even to the hw.cnmagic key sequence over the console to drop into ddb.

The system does respond to `xl trigger <domid> nmi' and drops into ddb, where stack traces on all CPUs look reasonable (one in a userland process, three in the idle loop).  After continuing from ddb, it's fine -- except that it thinks no time has passed during the hours it was wedged, which tells me that the hardclock timer interrupt was not firing.

Similarly, because the console was unresponsive, I infer that the xencons interrupt was not firing either, since the xencons interrupt handler synchronously evaluates hw.cnmagic.

So I suspect something is awry with splfoo/splx, or x86_read_psl/x86_disable_intr/x86_write_psl, or something about interrupt delivery in the Xen kernel.
>How-To-Repeat:
no idea
>Fix:
Yes, please!


