NetBSD-Bugs archive
Re: port-xen/58561 (panic: kernel diagnostic assertion "x86_read_psl() == 0" failed: file "/home/netbsd/10/src/sys/arch/x86/x86/pmap.c", line 3581)
The following reply was made to PR port-xen/58561; it has been noted by GNATS.
From: Manuel Bouyer <bouyer%antioche.eu.org@localhost>
To: gnats-bugs%netbsd.org@localhost
Cc: port-xen-maintainer%netbsd.org@localhost, netbsd-bugs%netbsd.org@localhost,
gnats-admin%netbsd.org@localhost, riastradh%NetBSD.org@localhost,
campbell+netbsd%mumble.net@localhost, cherry%NetBSD.org@localhost
Subject: Re: port-xen/58561 (panic: kernel diagnostic assertion
"x86_read_psl() == 0" failed: file
"/home/netbsd/10/src/sys/arch/x86/x86/pmap.c", line 3581)
Date: Thu, 12 Jun 2025 17:12:18 +0200
Hello,
Sorry for taking so long to reply.
On Tue, May 13, 2025 at 03:30:15PM +0000, riastradh%NetBSD.org@localhost wrote:
> Synopsis: panic: kernel diagnostic assertion "x86_read_psl() == 0" failed: file "/home/netbsd/10/src/sys/arch/x86/x86/pmap.c", line 3581
>
> Responsible-Changed-From-To: port-xen-maintainer->bouyer
> Responsible-Changed-By: riastradh%NetBSD.org@localhost
> Responsible-Changed-When: Tue, 13 May 2025 15:30:14 +0000
> Responsible-Changed-Why:
> bouyer, can you take a look?
>
> +cc cherry, who added the assertion back in 2011 with the cherry-xenmp
> merge.
>
> It's possible this is just some code path that does x86_disable_intr
> without a necessary x86_read/write_psl around it to save and restore
> the interrupt-disabled flag. But I think we've only seen it on Xen
> so far (see also dup https://gnats.netbsd.org/57543), which might help
> to narrow it down.
>
> This #ifndef XENPV x86_disable/enable_intr looks suspicious but I have
> only superficially skimmed it and I have no idea what's going on:
>
> https://nxr.netbsd.org/xref/src/sys/arch/amd64/amd64/trap.c?r=1.129#554
>
> (This bug has been biting mollari a lot lately, happened again today.)
I've seen this too, but only once in a (long) while. The last one
I found in my logs was last September.
I've no idea why the x86_disable/enable_intr in trap() is #ifndef XENPV.
This was added by ad@ in trap.c 1.46 (in Apr 2008) as part of the
kernel preemption work. AFAIK trap() is always called with events
enabled on Xen, so I can't see why Xen wouldn't need x86_disable_intr()
when bare metal needs it.
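To make the save/restore pattern from the quoted text concrete, here is a
user-space sketch in C. It is not kernel code: the sim_* functions are
hypothetical stand-ins for x86_read_psl(), x86_write_psl(),
x86_disable_intr() and x86_enable_intr(), and they model only a single
"masked" flag. It shows how a path that disables interrupts and returns
early can leave the flag set, so that a later check equivalent to
KASSERT(x86_read_psl() == 0) would fail. Whether this is what actually
triggers the panic in this PR is not established.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the interrupt/event mask: 0 means enabled. */
static unsigned long sim_psl = 0;

/* Stand-ins for the kernel primitives (assumed names, simplified). */
static unsigned long sim_read_psl(void) { return sim_psl; }
static void sim_write_psl(unsigned long psl) { sim_psl = psl; }
static void sim_disable_intr(void) { sim_psl = 1; }
static void sim_enable_intr(void) { sim_psl = 0; }

/* Models a check such as KASSERT(x86_read_psl() == 0). */
static bool mask_is_clear(void) { return sim_read_psl() == 0; }

/*
 * Unbalanced section: the early return skips sim_enable_intr(),
 * so the caller continues with the mask still set.
 */
static void unbalanced_section(bool early_return)
{
	sim_disable_intr();
	if (early_return)
		return;
	sim_enable_intr();
}

/*
 * Balanced section: the previous state is saved first and restored
 * on every exit path, so the caller's state is preserved.
 */
static void balanced_section(bool early_return)
{
	unsigned long saved = sim_read_psl();

	sim_disable_intr();
	if (early_return) {
		sim_write_psl(saved);
		return;
	}
	sim_write_psl(saved);
}
```

In this model, calling balanced_section(true) from an enabled state leaves
mask_is_clear() true, while unbalanced_section(true) leaves it false until
something calls sim_enable_intr() again.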
I'm now testing the attached patch; both amd64 and i386 domUs have
passed an anita run. I've installed it on the dom0 running the daily Xen
tests (https://www-soc.lip6.fr/~bouyer/NetBSD-tests/xen/);
it's in the middle of 2 anita runs. But I don't think I've ever seen
this KASSERT fire on this host.
If mollari is hitting this more often than I am, maybe it's
worth testing the patch there in a few days?
Index: sys/arch/amd64/amd64/trap.c
===================================================================
RCS file: /cvsroot/src/sys/arch/amd64/amd64/trap.c,v
retrieving revision 1.128
diff -u -p -u -r1.128 trap.c
--- sys/arch/amd64/amd64/trap.c 5 Sep 2020 07:26:37 -0000 1.128
+++ sys/arch/amd64/amd64/trap.c 12 Jun 2025 14:58:17 -0000
@@ -514,6 +514,10 @@ pagefltcommon:
goto we_re_toast;
}
#endif
+#ifdef XENPV
+ /* Check to see if interrupts are enabled (ie; no events are masked) */
+ KASSERT(x86_read_psl() == 0);
+#endif
/* Fault the original page in. */
onfault = pcb->pcb_onfault;
pcb->pcb_onfault = NULL;
@@ -552,17 +556,13 @@ pagefltcommon:
* the copy functions, and so visible
* to cpu_kpreempt_exit().
*/
-#ifndef XENPV
x86_disable_intr();
-#endif
l->l_nopreempt--;
if (l->l_nopreempt > 0 || !l->l_dopreempt ||
pfail) {
return;
}
-#ifndef XENPV
x86_enable_intr();
-#endif
/*
* If preemption fails for some reason,
* don't retry it. The conditions won't
Index: sys/arch/i386/i386/trap.c
===================================================================
RCS file: /cvsroot/src/sys/arch/i386/i386/trap.c,v
retrieving revision 1.308
diff -u -p -u -r1.308 trap.c
--- sys/arch/i386/i386/trap.c 20 Aug 2022 23:48:50 -0000 1.308
+++ sys/arch/i386/i386/trap.c 12 Jun 2025 14:58:17 -0000
@@ -632,6 +632,10 @@ faultcommon:
goto we_re_toast;
}
#endif
+#ifdef XENPV
+ /* Check to see if interrupts are enabled (ie; no events are masked) */
+ KASSERT(x86_read_psl() == 0);
+#endif
/* Fault the original page in. */
onfault = pcb->pcb_onfault;
pcb->pcb_onfault = NULL;
@@ -670,17 +674,13 @@ faultcommon:
* the copy functions, and so visible
* to cpu_kpreempt_exit().
*/
-#ifndef XENPV
x86_disable_intr();
-#endif
l->l_nopreempt--;
if (l->l_nopreempt > 0 || !l->l_dopreempt ||
pfail) {
return;
}
-#ifndef XENPV
x86_enable_intr();
-#endif
/*
* If preemption fails for some reason,
* don't retry it. The conditions won't
--
Manuel Bouyer <bouyer%antioche.eu.org@localhost>
NetBSD: 26 ans d'experience feront toujours la difference