Subject: Re: Getting "TLB IPI rendezvous failed..."
To: Stephan Uphoff <ups@tree.com>
From: Manuel Bouyer <bouyer@antioche.lip6.fr>
List: tech-kern
Date: 01/25/2005 16:52:38
On Sat, Jan 22, 2005 at 09:21:59AM -0500, Stephan Uphoff wrote:
> Could you try the attached patch?
> Please make sure that all your com devices show up.

OK, I finally got around to trying it, and the results are interesting.
I first tried it yesterday, but the RAID parity was not clean, so raidframe
didn't use both disks, and I couldn't get the box to panic.
I tried again today with clean parity and got the panic as expected.

I added additional debug printfs; see the attached patch.

Here is the output:
CPU 0 interrupt level 0xd pending 0x0 depth 1
panic: TLB IPI ...
CPU 0 interrupt level 0xd pending 0x2000c400 depth 1
CPU 1 interrupt level 0x0 pending 0x0 depth 0

So, if I got it right, CPU0 isn't at splipi() at this point, only splhigh().
So the spl level isn't the reason why it doesn't get the IPI. Maybe interrupts
are completely disabled (via cli()) here?
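
To make the pending mask in the output easier to read, here is a minimal
sketch that lists which bits are set in a 32-bit value. The helper name
pending_bits() is hypothetical; it is not part of the kernel or of the
attached patch, and it says nothing about which interrupt source each bit
belongs to.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not kernel code): store the index of every set
 * bit in `mask` into `out`, lowest bit first, and return how many
 * indices were stored. `out` must have room for 32 entries.
 */
static size_t
pending_bits(uint32_t mask, int out[32])
{
	size_t n = 0;

	for (int bit = 0; bit < 32; bit++) {
		if (mask & (UINT32_C(1) << bit))
			out[n++] = bit;
	}
	return n;
}
```

Called on the CPU 0 pending value 0x2000c400, this stores bits 10, 14, 15
and 29. Mapping those bits back to interrupt sources depends on the
kernel's interrupt slot assignment and is not attempted here.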

CPU 1 stack trace:
panic
tlb_shootnow
pmap_kremove
uvm_pagermapout
genfs_getpage
ufs_balloc
ffs_write
vn_write
dofilewrite
sys_write

CPU 0 stack trace:
_cpu_simple_lock
_kernel_lock_try
x86_softintlock
Xsoftserial

-- 
Manuel Bouyer <bouyer@antioche.eu.org>
     NetBSD: 26 ans d'experience feront toujours la difference