Subject: Re: kern/25285: i386 MP panic: TLB IPI rendezvous failed (mask 1)
To: None <dokas@cs.umn.edu>
From: Erik E. Fair <fair@netbsd.org>
List: current-users
Date: 06/10/2004 10:46:34
We need to know:
1. Which interrupts had each CPU masked off while it was spinning
on a lock? (offhand, one or more of them would appear to have
masked off the TLB IPI...)
2. What data structures was each of those CPUs attempting to
manipulate behind each lock request? Those structures might be
protected by separate locks, so that the CPUs don't all contend for
the same big lock.
This looks like a deadlock caused by an interaction between
interrupt masking and our mutex subsystem. At least one of those
spinning CPUs masked off the TLB IPI, then attempted to acquire the
kernel biglock, and is now spinning. Another CPU attempts a TLB
shootdown (probably while holding the kernel biglock), and the
shootdown fails because the spinning CPU cannot take the TLB IPI
while it waits for the biglock.
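
To make the suspected condition concrete, here is a minimal sketch in
C. It is a toy model, not NetBSD code: `struct cpu_state` and
`stuck_cpus()` are hypothetical names, and the assumption that the
initiating CPU holds the biglock is the guess made above, not a
confirmed fact. It marks a CPU as unable to acknowledge the shootdown
when it both has the TLB IPI masked and is spinning for the biglock.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-CPU snapshot; not a NetBSD structure. */
struct cpu_state {
    bool tlb_ipi_masked;      /* interrupt level blocks the TLB IPI */
    bool spinning_on_biglock; /* busy-waiting for the big lock */
};

/*
 * Return a bitmask of the CPUs that cannot acknowledge a TLB shootdown
 * started by `initiator`. Assumption: the initiator holds the big lock
 * for the whole rendezvous, so a spinning CPU makes no progress, and a
 * CPU with the IPI masked never sees the request.
 */
static uint32_t
stuck_cpus(const struct cpu_state *cpus, int ncpu, int initiator)
{
    uint32_t mask = 0;

    for (int i = 0; i < ncpu; i++) {
        if (i == initiator)
            continue; /* the initiator does not wait on itself */
        if (cpus[i].tlb_ipi_masked && cpus[i].spinning_on_biglock)
            mask |= UINT32_C(1) << i;
    }
    return mask;
}
```

A nonzero result in this model corresponds to the deadlock described
above; clearing either flag for every CPU brings the result to zero.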
At least the system didn't silently hang.
It's also instructive that every CPU other than the one attempting
the shootdown is waiting for the kernel lock. We need finer-grained
locking than this to prevent this level of contention.
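
As a rough illustration of what finer-grained locking means here, the
sketch below gives two unrelated structures their own spin locks, so
that holding one does not block a CPU that only needs the other. It
uses C11 atomics in user space. The structure names (`run_queue`,
`page_table`) and the `spin_*` helpers are hypothetical, chosen only
for illustration; which kernel structures could actually be separated
is exactly the open question above.

```c
#include <stdatomic.h>

/* Hypothetical minimal spin lock built on a C11 atomic flag. */
struct spinlock {
    atomic_flag held;
};

static void
spin_init(struct spinlock *l)
{
    atomic_flag_clear(&l->held); /* start unlocked */
}

/* Try once; return 1 if the lock was taken, 0 if it was already held. */
static int
spin_try(struct spinlock *l)
{
    return !atomic_flag_test_and_set_explicit(&l->held,
        memory_order_acquire);
}

static void
spin_acquire(struct spinlock *l)
{
    while (!spin_try(l))
        ; /* busy-wait until the holder releases */
}

static void
spin_release(struct spinlock *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/* Two illustrative structures, each guarded by its own lock. */
struct run_queue {
    struct spinlock lock;
    int length;
};

struct page_table {
    struct spinlock lock;
    int mappings;
};
```

With one lock per structure, a CPU holding `run_queue.lock` does not
stop another CPU from taking `page_table.lock`; with a single big lock
both would serialize.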
Erik <fair@netbsd.org>