Subject: Re: kern/25285: i386 MP panic: TLB IPI rendezvous failed (mask 1)
To: YAMAMOTO Takashi <yamt@mwd.biglobe.ne.jp>
From: Paul Dokas <dokas@cs.umn.edu>
List: current-users
Date: 06/04/2004 21:07:57
On Fri, 04 Jun 2004 22:49:01 +0900, YAMAMOTO Takashi <yamt@mwd.biglobe.ne.jp> wrote:
> > dokas@cs.umn.edu said:
> > > Anyone know why this is happening?
> > 
> > IPIs can get lost apparently.
> > I don't fully understand how this can happen, but changing
> > the code to be more conservative helped on my dual-Opteron.
> 
> because spllower() doesn't check ipending and update ilevel atomically,
> interrupt priority inversion, which is a serious problem for ipis,
> can happen.
> 
> the following is a simple fix, although i'm not sure it's the best one.
> 
> YAMAMOTO Takashi
> 
> Index: arch/x86/include/intr.h
> ===================================================================
> --- arch/x86/include/intr.h	(revision 599)
> +++ arch/x86/include/intr.h	(working copy)
> @@ -158,16 +158,21 @@ static __inline void
>  spllower(int nlevel)
>  {
>  	struct cpu_info *ci = curcpu();
> +	u_int32_t imask;
> +	u_long psl;
>  
>  	__splbarrier();
> -	/*
> -	 * Since this should only lower the interrupt level,
> -	 * the XOR below should only show interrupts that
> -	 * are being unmasked.
> -	 */
> -	ci->ci_ilevel = nlevel;
> -	if (ci->ci_ipending & IUNMASK(ci,nlevel))
> +
> +	imask = IUNMASK(ci, nlevel);
> +	psl = read_psl();
> +	disable_intr();
> +	if (ci->ci_ipending & imask) {
>  		Xspllower(nlevel);
> +		/* Xspllower does enable_intr() */
> +	} else {
> +		ci->ci_ilevel = nlevel;
> +		write_psl(psl);
> +	}
>  }
>  
>  /*
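
For anyone following along, here is a rough user-space sketch of the
control flow in the patched spllower() above.  None of it is kernel code:
struct model_cpu, model_iunmask(), model_xspllower() and model_spllower()
are made-up names, the mask is an assumed "every level above nlevel" rule
rather than the real IUNMASK(), and a bool stands in for the PSL interrupt
flag.  It only illustrates that ipending is tested and ilevel is updated
while interrupts are off.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical user-space model of one CPU; not NetBSD's struct cpu_info. */
struct model_cpu {
	int      ilevel;        /* current interrupt priority level (0..31) */
	uint32_t ipending;      /* bit n set: a level-n interrupt is pending */
	bool     intr_enabled;  /* stands in for the PSL interrupt flag */
	int      handled;       /* number of pending interrupts the model ran */
};

/* Assumed mask: the levels above nlevel, i.e. what lowering to nlevel unmasks. */
static uint32_t
model_iunmask(int nlevel)
{
	return ~((UINT32_C(2) << nlevel) - 1);
}

/*
 * Stand-in for Xspllower(): run whatever got unmasked, highest level
 * first, then leave interrupts enabled, as the comment in the patch says.
 */
static void
model_xspllower(struct model_cpu *ci, int nlevel)
{
	for (int lvl = 31; lvl > nlevel; lvl--) {
		uint32_t bit = UINT32_C(1) << lvl;

		if (ci->ipending & bit) {
			ci->ipending &= ~bit;
			ci->ilevel = lvl;	/* handler runs at its own level */
			ci->handled++;
		}
	}
	ci->ilevel = nlevel;
	ci->intr_enabled = true;
}

/* Same shape as the patched spllower(): check and update with interrupts off. */
static void
model_spllower(struct model_cpu *ci, int nlevel)
{
	uint32_t imask = model_iunmask(nlevel);
	bool saved = ci->intr_enabled;		/* read_psl() */

	ci->intr_enabled = false;		/* disable_intr() */
	if (ci->ipending & imask) {
		model_xspllower(ci, nlevel);	/* re-enables interrupts itself */
	} else {
		ci->ilevel = nlevel;
		ci->intr_enabled = saved;	/* write_psl(psl) */
	}
}
```

In this model, calling model_spllower(&cpu, 2) with a level-4 interrupt
pending goes through model_xspllower() and clears that bit, while a
pending level-1 interrupt stays pending because it is still masked.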


I've been running with this patch for over 6 hours now under very high
load and my machine hasn't crashed.  I'll load up the machine and let
it run for the weekend.  Thank you for looking into this problem.

BTW, my machine did finally crash this morning, before I put this patch in.
That shows that the BIOS update and the change to "Sequential Memory Access"
were not enough to solve this problem.  Hopefully, this patch will be
what's needed.

Paul
-- 
Paul Dokas                                            dokas@cs.umn.edu
======================================================================
Don Juan Matus:  "an enigma wrapped in mystery wrapped in a tortilla."