tech-kern: Re: Getting "TLB IPI rendezvous failed..."

Subject: Re: Getting "TLB IPI rendezvous failed..."
To: Frederick Bruckman <fredb@immanent.net>
From: Manuel Bouyer <bouyer@antioche.lip6.fr>
List: tech-kern
Date: 12/23/2004 19:22:13

On Thu, Dec 23, 2004 at 12:56:26AM -0600, Frederick Bruckman wrote:
> On Wed, 22 Dec 2004, Manuel Bouyer wrote:
> >
> >I see similar panics, see kern/28541. You could try to see where the other
> >CPUs are with 'mach cpu #' followed by 't', to make sure mi_switch is
> >also involved in the panic in your case.
> 
> Hmm, got a different one...
> 
> l->l_cpu != curcpu() failed, file .../uvm_glue.c line 605
> db{6} t
> __assert
> uvm_swapout
> uvmpd_scan
> uvm_pageout
> db{6} machine cpu 0
> db{6} t
> acquire
> spinlock_acquire_count
> mi_switch
> ltsleep
> sbwait
> soreceive
> [more nfs stuff]

mi_switch, again

> 
> I should add that the kernel's built with "-momit-leaf-frame-pointer", 
> which is probably why part of the call chain appears to be missing.
> 
> So...
> 
> 1) It's only on i386?

I've seen it only on i386, but only on one of the 3 or 4 i386 SMP machines
I have running. In my case the trigger seems to be the amanda client, which
makes heavy use of pipes.

>
> 2) The general pattern seems to be that one cpu is at spipl(), waiting 
> for a lock, while the other cpu insists on doing something to the first 
> cpu, and has no way to back off? I wonder why it's only i386.

It looks like that, but we didn't find out how the CPU going through mi_switch
would be at splipl() ...
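
A toy sketch of the pattern described above, to make the circular wait
explicit. This is not kernel code: the function name and the "cpu0"/"cpu1"
labels are hypothetical, and it assumes (the thread does not confirm this)
that the CPU sending the IPI is also the one holding the lock the other CPU
is spinning on.

```python
# Toy model of the suspected deadlock. All names are hypothetical,
# not NetBSD kernel identifiers.

def find_wait_cycle(waits_for, start):
    """Follow 'X waits for Y' edges from start; return the cycle, or None."""
    path = []
    node = start
    while node is not None and node not in path:
        path.append(node)
        node = waits_for.get(node)
    if node is None:
        return None  # the chain ended, so someone can still make progress
    return path[path.index(node):] + [node]

# cpu0: IPIs blocked, spinning on a lock (assumed to be held by cpu1).
# cpu1: sent an IPI to cpu0 and spins until cpu0 acknowledges it.
waits_for = {
    "cpu0": "cpu1",  # cpu0 needs the lock
    "cpu1": "cpu0",  # cpu1 needs cpu0 to take the IPI
}

print(find_wait_cycle(waits_for, "cpu0"))
```

If either edge is removed (cpu0 does not block IPIs while spinning, or cpu1
can back off), the function returns None, which is the "no deadlock" case.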

> 
> Another thing I should mention, when I had a kernel without options 
> DEBUG, DIAGNOSTIC, or DDB_ONPANIC=1, it would just seem to freeze, but 
> once, on an unattended freeze, it seemed to resolve all by itself after 
> a few hours (in a reboot).

I've never seen a freeze, only these IPI panics involving mi_switch().

-- 
Manuel Bouyer <bouyer@antioche.eu.org>
     NetBSD: 26 ans d'experience feront toujours la difference
--