NetBSD-Bugs archive


Re: port-xen/58561 (panic: kernel diagnostic assertion "x86_read_psl() == 0" failed: file "/home/netbsd/10/src/sys/arch/x86/x86/pmap.c", line 3581)



The following reply was made to PR port-xen/58561; it has been noted by GNATS.

From: Manuel Bouyer <bouyer%antioche.eu.org@localhost>
To: gnats-bugs%netbsd.org@localhost
Cc: port-xen-maintainer%netbsd.org@localhost, netbsd-bugs%netbsd.org@localhost,
        gnats-admin%netbsd.org@localhost, riastradh%NetBSD.org@localhost,
        campbell+netbsd%mumble.net@localhost, cherry%NetBSD.org@localhost
Subject: Re: port-xen/58561 (panic: kernel diagnostic assertion
 "x86_read_psl() == 0" failed: file
 "/home/netbsd/10/src/sys/arch/x86/x86/pmap.c", line 3581)
Date: Thu, 12 Jun 2025 17:12:18 +0200

 Hello,
 Sorry for taking so long to reply.
 
 
 On Tue, May 13, 2025 at 03:30:15PM +0000, riastradh%NetBSD.org@localhost wrote:
 > Synopsis: panic: kernel diagnostic assertion "x86_read_psl() == 0" failed: file "/home/netbsd/10/src/sys/arch/x86/x86/pmap.c", line 3581
 > 
 > Responsible-Changed-From-To: port-xen-maintainer->bouyer
 > Responsible-Changed-By: riastradh%NetBSD.org@localhost
 > Responsible-Changed-When: Tue, 13 May 2025 15:30:14 +0000
 > Responsible-Changed-Why:
 > bouyer, can you take a look?
 > 
 > +cc cherry, who added the assertion back in 2011 with the cherry-xenmp
 > merge.
 > 
 > It's possible this is just some code path that does x86_disable_intr
 > without a necessary x86_read/write_psl around it to save and restore
 > the interrupt-disabled flag.  But I think we've only seen it on Xen
 > so far (see also dup https://gnats.netbsd.org/57543), which might help
 > to narrow it down.
 > 
 > This #ifndef XENPV x86_disable/enable_intr looks suspicious but I have
 > only superficially skimmed it and I have no idea what's going on:
 > 
 > https://nxr.netbsd.org/xref/src/sys/arch/amd64/amd64/trap.c?r=1.129#554
 > 
 > (This bug has been biting mollari a lot lately, happened again today.)
 
 I've seen this too, but only once in a (long) while. The last one
 I found in my logs was last September.
 
 I've no idea why the x86_disable/enable_intr in trap() is #ifndef XENPV.
 This was added by ad@ in trap.c 1.46 (in Apr 2008) as part of the
 kernel preemption work. AFAIK trap() is always called with events
 enabled on Xen, so I can't see why Xen wouldn't need x86_disable_intr()
 when bare metal needs it.
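 
 For illustration only (this is not part of the patch): a minimal
 user-space sketch of the save/restore idiom mentioned in the quoted
 text. All mock_* names and the demo() helper are hypothetical stand-ins,
 not kernel functions. The sketch assumes the convention implied by the
 KASSERT in the patch, namely that a value of 0 means interrupts/events
 are enabled.

```c
#include <assert.h>

/*
 * Hypothetical user-space stand-in for the interrupt/event mask state.
 * Assumed convention: 0 means enabled, nonzero means masked.
 */
static unsigned long mock_psl = 0;

static unsigned long mock_read_psl(void) { return mock_psl; }
static void mock_write_psl(unsigned long psl) { mock_psl = psl; }
static void mock_disable_intr(void) { mock_psl = 1; }
static void mock_enable_intr(void) { mock_psl = 0; }

/* Balanced: save the previous state, disable, then restore what was saved. */
static void
balanced_section(void)
{
	unsigned long saved = mock_read_psl();

	mock_disable_intr();
	/* ... work that must not be interrupted ... */
	mock_write_psl(saved);
}

/* Unbalanced: disables but returns without restoring the state. */
static void
unbalanced_section(void)
{
	mock_disable_intr();
	/* ... a return path that forgets to re-enable ... */
}

/* The kind of condition the assertion checks, written as a predicate. */
static int
intr_enabled(void)
{
	return mock_read_psl() == 0;
}

/* Returns 1 if only the unbalanced path leaves the mask set. */
int
demo(void)
{
	int ok_after_balanced, ok_after_unbalanced;

	mock_enable_intr();		/* start from the "enabled" state */
	balanced_section();
	ok_after_balanced = intr_enabled();
	unbalanced_section();
	ok_after_unbalanced = intr_enabled();
	return ok_after_balanced && !ok_after_unbalanced;
}
```

 In this model a later check of the form "read_psl() == 0" passes after
 the balanced path and fails after the unbalanced one. Restoring the
 saved value, rather than unconditionally enabling, also keeps the state
 correct when the section is entered with interrupts already disabled.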
 
 I'm now testing the attached patch; both amd64 and i386 domUs have
 passed an anita run. I've installed it on the dom0 running the daily Xen
 tests (https://www-soc.lip6.fr/~bouyer/NetBSD-tests/xen/);
 it's in the middle of 2 anita runs. But I don't think I've ever seen
 this KASSERT fire on this host.
 
 If mollari is hitting this more often than I am, maybe it's
 worth testing it there in a few days?
 
 Index: sys/arch/amd64/amd64/trap.c
 ===================================================================
 RCS file: /cvsroot/src/sys/arch/amd64/amd64/trap.c,v
 retrieving revision 1.128
 diff -u -p -u -r1.128 trap.c
 --- sys/arch/amd64/amd64/trap.c	5 Sep 2020 07:26:37 -0000	1.128
 +++ sys/arch/amd64/amd64/trap.c	12 Jun 2025 14:58:17 -0000
 @@ -514,6 +514,10 @@ pagefltcommon:
  			goto we_re_toast;
  		}
  #endif
 +#ifdef XENPV
 +		/* Check to see if interrupts are enabled (ie; no events are masked) */
 +		KASSERT(x86_read_psl() == 0);
 +#endif
  		/* Fault the original page in. */
  		onfault = pcb->pcb_onfault;
  		pcb->pcb_onfault = NULL;
 @@ -552,17 +556,13 @@ pagefltcommon:
  				 * the copy functions, and so visible
  				 * to cpu_kpreempt_exit().
  				 */
 -#ifndef XENPV
  				x86_disable_intr();
 -#endif
  				l->l_nopreempt--;
  				if (l->l_nopreempt > 0 || !l->l_dopreempt ||
  				    pfail) {
  					return;
  				}
 -#ifndef XENPV
  				x86_enable_intr();
 -#endif
  				/*
  				 * If preemption fails for some reason,
  				 * don't retry it.  The conditions won't
 Index: sys/arch/i386/i386/trap.c
 ===================================================================
 RCS file: /cvsroot/src/sys/arch/i386/i386/trap.c,v
 retrieving revision 1.308
 diff -u -p -u -r1.308 trap.c
 --- sys/arch/i386/i386/trap.c	20 Aug 2022 23:48:50 -0000	1.308
 +++ sys/arch/i386/i386/trap.c	12 Jun 2025 14:58:17 -0000
 @@ -632,6 +632,10 @@ faultcommon:
  			goto we_re_toast;
  		}
  #endif
 +#ifdef XENPV
 +		/* Check to see if interrupts are enabled (ie; no events are masked) */
 +		KASSERT(x86_read_psl() == 0);
 +#endif
  		/* Fault the original page in. */
  		onfault = pcb->pcb_onfault;
  		pcb->pcb_onfault = NULL;
 @@ -670,17 +674,13 @@ faultcommon:
  				 * the copy functions, and so visible
  				 * to cpu_kpreempt_exit().
  				 */
 -#ifndef XENPV
  				x86_disable_intr();
 -#endif
  				l->l_nopreempt--;
  				if (l->l_nopreempt > 0 || !l->l_dopreempt ||
  				    pfail) {
  					return;
  				}
 -#ifndef XENPV
  				x86_enable_intr();
 -#endif
  				/*
  				 * If preemption fails for some reason,
  				 * don't retry it.  The conditions won't
 -- 
 Manuel Bouyer <bouyer%antioche.eu.org@localhost>
      NetBSD: 26 years of experience will always make the difference
 --
 

