Subject: Getting "TLB IPI rendezvous failed..."
To: None <tech-kern@NetBSD.org>
From: Frederick Bruckman <fredb@immanent.net>
List: tech-kern
Date: 12/20/2004 16:06:40
Hi,

I'm getting a fairly consistent panic on an SMP host running NetBSD
2.0 while comparing some large directories on two different
fileservers. (One is NetBSD 2.0, one is FreeBSD 5.3, but I don't
suppose that matters.) It looks something like this:


panic: TLB IPI rendezvous failed (mask 1)
  Stopped pid 1820 (diff)
db{6}> t
cpu_Debugger ...
pmap_tlb_shootdown ...
pmap_do_remove ... +0xc0
pmap_remove ... +0x27
ubc_alloc ... +0x38b
nfs_bioread ... +0x45f
VOP_READ ... +0x36
vn_read ... +0x9d
do_fileread ... +0x92
sys_read  ... +0x80
syscall_plain ... +0x17c
--- syscall (number 3) ---
db{6}> c
syncing disks... panic spinlock_switchcheck: CPU 6 has 1 spin locks


...so I don't get a core dump. ("sync" gives the same error.) I'm
running several diffs on a handful of directories which contain some
large files. The panic is always in the same place, in pmap.c, though
it's not always in the same "diff" process, and not always on the same
directories. This does *not* happen with current.

I'm experimenting with selected pull-ups in the call chain. There are
a lot of differences in "pmap.c", "sys/uvm" and "sys/nfs" between
current and netbsd-2-0, and I don't understand most of them, so any
suggestions would be appreciated.


Frederick