port-i386: TLB IPI rendez-vous failed

Subject: TLB IPI rendez-vous failed
To: None <port-amd64@netbsd.org, port-i386@netbsd.org>
From: Matthias Drochner <M.Drochner@fz-juelich.de>
List: port-i386
Date: 03/11/2004 20:20:41

This is a multipart MIME message.

--==_Exmh_12407951937000
Content-Type: text/plain; charset=us-ascii


Hi -
under some load patterns involving simultaneous floating-point,
I/O and process creation activity, my dual-Opteron box
panics with this message within minutes or hours.
The stack traceback always looks similar; CPU 1 (the secondary,
non-interrupt-handling CPU) looks like:

panic()
pmap_tlb_shootnow()
pmap_enter()
uvm_fault()
trap()

and the boot CPU like:

acquire()
lockmgr()
Xintr_ioapic_level11()

So the secondary CPU has grabbed the kernel lock, and the boot
CPU spins waiting for it. IPIs should get through anyway.
The bit corresponding to TLB flush is set in cpu0->ci_ipis,
but the processor evidently either didn't get the interrupt or
didn't see that bit in ci_ipis.
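
For readers unfamiliar with the mechanism, here is a minimal user-space
sketch of a pending-IPI bitmask of the kind ci_ipis represents. It is not
the NetBSD code: struct demo_cpu, the IPI_DEMO_* bits and both functions
are hypothetical names invented for illustration, and a real kernel would
additionally trigger the interrupt through the local APIC after setting
the bit.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical request bits, for illustration only. */
#define IPI_DEMO_TLB    (1u << 0)
#define IPI_DEMO_FPSYNC (1u << 1)

/* Hypothetical per-CPU state; "ipis" stands in for ci_ipis. */
struct demo_cpu {
    _Atomic uint32_t ipis;     /* pending request bits */
    unsigned tlb_flushes;      /* number of TLB flush requests handled */
    unsigned fpu_syncs;        /* number of FPU sync requests handled */
};

/* Sender side: atomically mark the request as pending on the target.
 * (A real kernel would now also raise the IPI via the local APIC.) */
static void demo_send_ipi(struct demo_cpu *ci, uint32_t bits)
{
    assert(bits != 0);
    atomic_fetch_or(&ci->ipis, bits);
}

/* Receiver side: atomically take every pending bit, then dispatch.
 * Returns the set of bits that were handled in this invocation. */
static uint32_t demo_ipi_handler(struct demo_cpu *ci)
{
    uint32_t pending = atomic_exchange(&ci->ipis, 0);

    if (pending & IPI_DEMO_TLB)
        ci->tlb_flushes++;
    if (pending & IPI_DEMO_FPSYNC)
        ci->fpu_syncs++;
    return pending;
}
```

In this model a bit that stays set, as observed in the panic, means that no
handler invocation ran after the sender set it: requests posted before a
single handler run are all picked up together, but a request posted after
the exchange needs another interrupt to be noticed.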

My impression is that this results from IPIs getting lost when
they arrive in rapid succession. FPU sync operations are relatively
expensive, so it might happen that an FPU sync IPI is still in
progress when a TLB flush IPI is issued.
The appended patch makes my dual-amd64 run stable.
I don't quite understand what the assumed race condition looks
like exactly; perhaps someone has some more imagination :-)
The i386 code is apparently identical, so this might be an
issue there too.
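
As an aid to reasoning about what moving the EOI changes, here is a toy
model of a single interrupt vector with one "requested" bit (IRR) and one
"in service" bit (ISR). It is only a sketch under strong assumptions: all
names are hypothetical, it ignores the CPUVAR(ILEVEL) check and the
IPENDING deferral that the real entry code performs, and it does not claim
to reproduce the actual race. It merely shows that with the EOI issued
before the handler a second IPI can be delivered while the first handler
is still running, whereas with the EOI issued afterwards it is held until
the handler has finished.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of one vector in an interrupt controller (hypothetical). */
struct toy_apic {
    bool irr;        /* a request for the vector is pending */
    bool isr;        /* the vector is in service (no EOI yet) */
    int  depth;      /* current handler nesting depth */
    int  max_depth;  /* deepest nesting observed so far */
};

static void toy_raise(struct toy_apic *a) { a->irr = true; }
static void toy_eoi(struct toy_apic *a)   { a->isr = false; }

/* Deliver the pending request unless the vector is still in service.
 * Returns true if a handler was entered. */
static bool toy_try_deliver(struct toy_apic *a)
{
    if (!a->irr || a->isr)
        return false;
    a->irr = false;
    a->isr = true;
    if (++a->depth > a->max_depth)
        a->max_depth = a->depth;
    return true;
}

static void toy_handler_return(struct toy_apic *a)
{
    assert(a->depth > 0);
    a->depth--;
}

/* First IPI enters its handler; a second one arrives while it runs
 * (interrupts are enabled in the handler). Returns the maximum handler
 * nesting depth seen. */
static int toy_scenario(bool eoi_before_handler)
{
    struct toy_apic a = { false, false, 0, 0 };

    toy_raise(&a);
    toy_try_deliver(&a);              /* first IPI: handler entered */
    if (eoi_before_handler)
        toy_eoi(&a);                  /* EOI issued before the handler body */

    toy_raise(&a);                    /* second IPI arrives mid-handler */
    if (toy_try_deliver(&a)) {        /* only possible after an early EOI */
        toy_eoi(&a);
        toy_handler_return(&a);       /* nested handler finishes */
    }

    if (!eoi_before_handler)
        toy_eoi(&a);                  /* EOI issued after the handler body */
    toy_handler_return(&a);           /* first handler finishes */

    if (toy_try_deliver(&a)) {        /* a held request is delivered now */
        toy_eoi(&a);
        toy_handler_return(&a);
    }
    return a.max_depth;
}
```

Calling toy_scenario(true) models the EOI-first order and
toy_scenario(false) the EOI-last order; both handle the two requests, but
only the first lets them overlap.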

best regards
Matthias



--==_Exmh_12407951937000
Content-Type: text/plain ; name="ipihdl.txt"; charset=us-ascii
Content-Description: ipihdl.txt
Content-Disposition: attachment; filename="ipihdl.txt"

--- vector.S.~1.3.~	Fri Feb 27 13:13:44 2004
+++ vector.S	Thu Mar 11 18:45:53 2004
@@ -305,7 +305,7 @@ IDTVEC(intr_lapic_ipi)
 	pushq	$0		
 	pushq	$T_ASTFLT
 	INTRENTRY		
-	movl	$0,_C_LABEL(local_apic)+LAPIC_EOI
+#	movl	$0,_C_LABEL(local_apic)+LAPIC_EOI
 	movl	CPUVAR(ILEVEL),%ebx
 	cmpl	$IPL_IPI,%ebx
 	jae	2f
@@ -315,6 +315,7 @@ IDTVEC(intr_lapic_ipi)
         sti
 	pushq	%rbx
 	call	_C_LABEL(x86_ipi_handler)
+	movl	$0,_C_LABEL(local_apic)+LAPIC_EOI
 	jmp	_C_LABEL(Xdoreti)
 2:
 	orl	$(1 << LIR_IPI),CPUVAR(IPENDING)

--==_Exmh_12407951937000--