Subject: Re: kern/25285: i386 MP panic: TLB IPI rendezvous failed (mask 1)
To: None <M.Drochner@fz-juelich.de>
From: YAMAMOTO Takashi <yamt@mwd.biglobe.ne.jp>
List: current-users
Date: 06/04/2004 22:49:01
hi,

> dokas@cs.umn.edu said:
> > Anyone know why this is happening?
> 
> IPIs can apparently get lost.
> I don't fully understand how this can happen, but changing
> the code to be more conservative helped on my dual-Opteron.

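because spllower() doesn't check ci_ipending and update ci_ilevel
atomically with respect to interrupts, an interrupt can be taken in the
window between the two.  a deferred higher-priority interrupt can then
end up waiting behind a lower-priority handler.  this kind of interrupt
priority inversion is a serious problem for ipis, since a delayed TLB
shootdown ipi can make the rendezvous fail.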

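the following is a simple fix: it disables interrupts before checking
ci_ipending, so no interrupt can slip in before ci_ilevel is lowered or
Xspllower() takes over.  i'm not sure it's the best one, though.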

YAMAMOTO Takashi

Index: arch/x86/include/intr.h
===================================================================
--- arch/x86/include/intr.h	(revision 599)
+++ arch/x86/include/intr.h	(working copy)
@@ -158,16 +158,21 @@ static __inline void
 spllower(int nlevel)
 {
 	struct cpu_info *ci = curcpu();
+	u_int32_t imask;
+	u_long psl;
 
 	__splbarrier();
-	/*
-	 * Since this should only lower the interrupt level,
-	 * the XOR below should only show interrupts that
-	 * are being unmasked.
-	 */
-	ci->ci_ilevel = nlevel;
-	if (ci->ci_ipending & IUNMASK(ci,nlevel))
+
+	imask = IUNMASK(ci, nlevel);
+	psl = read_psl();
+	disable_intr();
+	if (ci->ci_ipending & imask) {
 		Xspllower(nlevel);
+		/* Xspllower does enable_intr() */
+	} else {
+		ci->ci_ilevel = nlevel;
+		write_psl(psl);
+	}
 }
 
 /*