NetBSD-Bugs archive
Re: port-alpha/38335 (kernel freeze on alpha MP system)
The following reply was made to PR port-alpha/38335; it has been noted by GNATS.
From: "Michael L. Hitch" <mhitch%lightning.msu.montana.edu@localhost>
To: Jarle Greipsland <jarle%uninett.no@localhost>
Cc: gnats-bugs%NetBSD.org@localhost
Subject: Re: port-alpha/38335 (kernel freeze on alpha MP system)
Date: Thu, 18 Feb 2010 15:04:28 -0700 (MST)
On Fri, 5 Feb 2010, Jarle Greipsland wrote:
>> Did the committed fix take care of the issue you're seeing?
> I don't think so. I have been running with a LOCKDEBUG-kernel,
> and yesterday, while trying to do a 'build.sh release', it
> panicked.
>
> login: Mutex error: lockdebug_wantlock: locking against myself
>
> lock address : 0xfffffc0000882f68 type : spin
> initialized : 0xfffffc00006735d8
> shared holds : 0 exclusive: 1
> shares wanted: 0 exclusive: 1
> current cpu : 0 last held: 0
> current l : 0xfffffc003fab2400 last held: 0xfffffc003fab2400
> last locked : 0xfffffc0000674690 unlocked : 0xfffffc0000672478
> owner field : 0x0000000000000400 wait/sn: 0/1
>
> panic: LOCKDEBUG
> Stopped in pid 0.35 (system) at netbsd:cpu_Debugger+0x4: ret zero,(ra)
> db{0}>
> db{0}> trace
> cpu_Debugger() at netbsd:cpu_Debugger+0x4
> panic() at netbsd:panic+0x278
> lockdebug_abort1() at netbsd:lockdebug_abort1+0x150
> mutex_enter() at netbsd:mutex_enter+0x38c
> pool_get() at netbsd:pool_get+0x58
> pool_cache_put_slow() at netbsd:pool_cache_put_slow+0x2b4
> pool_cache_put_paddr() at netbsd:pool_cache_put_paddr+0x1e0
> pmap_do_tlb_shootdown() at netbsd:pmap_do_tlb_shootdown+0x178
> alpha_ipi_process() at netbsd:alpha_ipi_process+0xb8
> interrupt() at netbsd:interrupt+0x88
> XentInt() at netbsd:XentInt+0x1c
> --- interrupt (from ipl 4) ---
> lockdebug_mem_check() at netbsd:lockdebug_mem_check+0x1b4
> pool_put() at netbsd:pool_put+0x7c
> pool_cache_invalidate_groups() at netbsd:pool_cache_invalidate_groups+0xf4
> pool_cache_invalidate() at netbsd:pool_cache_invalidate+0x8c
> pool_reclaim() at netbsd:pool_reclaim+0x68
> pool_drain_end() at netbsd:pool_drain_end+0x60
> uvm_pageout() at netbsd:uvm_pageout+0x880
> exception_return() at netbsd:exception_return
This looks like the tlb shootdown IPI interrupted a pool_put()
operation, which likely holds a mutex related to some pool. The
tlb shootdown is attempting to return an entry to the shootdown
job pool, and tries to acquire the same lock.
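
To make that sequence concrete, here is a small userland model of it. This
is only a sketch: struct model_mutex, model_mutex_try() and
model_replay_trace() are made-up names for illustration, not kernel
interfaces, and it assumes the interrupted code and the IPI handler really
do want the same lock, which is my guess from the trace.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a spin mutex that records its owner.
 * This is NOT NetBSD's kmutex_t; it only mimics the owner check. */
struct model_mutex {
	bool held;
	int  owner_cpu;		/* meaningful only while held */
};

enum model_result {
	MODEL_ACQUIRED,		/* lock was free and is now ours */
	MODEL_WOULD_SPIN,	/* held by another CPU; spinning can succeed */
	MODEL_SELF_DEADLOCK	/* held by this CPU; spinning never succeeds */
};

static enum model_result
model_mutex_try(struct model_mutex *m, int cpu)
{
	if (!m->held) {
		m->held = true;
		m->owner_cpu = cpu;
		return MODEL_ACQUIRED;
	}
	/* The holder is the code we interrupted on this same CPU, so it
	 * cannot run to release the lock until we return. */
	if (m->owner_cpu == cpu)
		return MODEL_SELF_DEADLOCK;
	return MODEL_WOULD_SPIN;
}

/* Replay the order of events suggested by the trace on one CPU. */
static enum model_result
model_replay_trace(int cpu)
{
	struct model_mutex pool_lock = { false, 0 };
	enum model_result r;

	/* Thread context: the pool_put() path takes the pool's lock. */
	r = model_mutex_try(&pool_lock, cpu);
	assert(r == MODEL_ACQUIRED);

	/* The IPI arrives here on the same CPU, and the shootdown
	 * handler asks for the same lock before it was released. */
	return model_mutex_try(&pool_lock, cpu);
}
```

In this model model_replay_trace(0) returns MODEL_SELF_DEADLOCK, which
corresponds to the "locking against myself" report from LOCKDEBUG. Without
the owner check, the second acquisition would simply spin forever.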
I hadn't seen this type of deadlock (which is what it would have been
without LOCKDEBUG) before, but I recently started seeing several
deadlocks just grepping a number of large files. Trying to build
a current LOCKDEBUG kernel resulted in a hard hang when booting the
kernel, requiring a power cycle of my CS20, and I had trouble recovering
after that. I switched to a 4 CPU ES40 to debug the LOCKDEBUG problem,
eventually found the cause, and with a workaround was finally able
to get a working LOCKDEBUG kernel. My grepping was able to cause
deadlocks fairly easily, and I had been trying to keep the CS20 from
hanging in the deadlock by enabling the SPINLOCK_SPINOUT part of
LOCKDEBUG. Some of the deadlocks looked very similar to the above.
Try the following change, which will use IPL_SCHED for the pool cache
used by the tlb shootdown code. I've done this on my CS20, and have been
able to run numerous greps that would previously deadlock. In the above
traceback, I can't quite see how that would work, unless the
pool_drain_end() is going to be locking the tlb shootdown job pool.
Index: sys/arch/alpha/alpha/pmap.c
===================================================================
RCS file: /cvsroot/src/sys/arch/alpha/alpha/pmap.c,v
retrieving revision 1.252
diff -u -p -r1.252 pmap.c
--- sys/arch/alpha/alpha/pmap.c 26 Nov 2009 00:19:11 -0000 1.252
+++ sys/arch/alpha/alpha/pmap.c 18 Feb 2010 21:52:16 -0000
@@ -937,7 +937,7 @@ pmap_bootstrap(paddr_t ptaddr, u_int max
*/
pool_cache_bootstrap(&pmap_tlb_shootdown_job_cache,
sizeof(struct pmap_tlb_shootdown_job), CACHE_LINE_SIZE,
- 0, PR_LARGECACHE, "pmaptlb", NULL, IPL_VM, NULL, NULL, NULL);
+ 0, PR_LARGECACHE, "pmaptlb", NULL, IPL_SCHED, NULL, NULL, NULL);
for (i = 0; i < ALPHA_MAXPROCS; i++) {
TAILQ_INIT(&pmap_tlb_shootdown_q[i].pq_head);
mutex_init(&pmap_tlb_shootdown_q[i].pq_lock, MUTEX_DEFAULT,
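
As a rough illustration of why raising the IPL might close the window, here
is a small userland model. It is only a sketch under assumptions: the
model_* names and the numeric levels are invented for this example, and it
assumes the IPI is delivered at a level above IPL_VM but not above
IPL_SCHED, which I have not verified against the alpha headers.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical priority levels for this model only. The ordering
 * (VM below SCHED) and the IPI level are assumptions. */
enum model_ipl { MODEL_IPL_NONE = 0, MODEL_IPL_VM = 1, MODEL_IPL_SCHED = 2 };
#define MODEL_IPI_LEVEL	MODEL_IPL_SCHED

struct model_cpu  { enum model_ipl cur_ipl; };
struct model_lock { enum model_ipl ipl; bool held; };

/* Take the lock the way a spin mutex would: raise the CPU's IPL to the
 * lock's level first, then mark it held. Returns the previous IPL. */
static enum model_ipl
model_lock_enter(struct model_cpu *cpu, struct model_lock *l)
{
	enum model_ipl old = cpu->cur_ipl;

	if (l->ipl > cpu->cur_ipl)
		cpu->cur_ipl = l->ipl;
	l->held = true;
	return old;
}

/* Release the lock and restore the IPL saved by model_lock_enter(). */
static void
model_lock_exit(struct model_cpu *cpu, struct model_lock *l,
    enum model_ipl old)
{
	l->held = false;
	cpu->cur_ipl = old;
}

/* An interrupt is taken only if its level is above the current IPL. */
static bool
model_ipi_deliverable(const struct model_cpu *cpu)
{
	return cpu->cur_ipl < MODEL_IPI_LEVEL;
}

/* True if the IPI handler could run while a lock initialized at
 * lock_ipl is held, i.e. the self-deadlock window exists. */
static bool
model_ipi_can_interrupt_holder(enum model_ipl lock_ipl)
{
	struct model_cpu cpu = { MODEL_IPL_NONE };
	struct model_lock l = { lock_ipl, false };
	enum model_ipl old;
	bool window;

	old = model_lock_enter(&cpu, &l);
	window = l.held && model_ipi_deliverable(&cpu);
	model_lock_exit(&cpu, &l, old);
	return window;
}
```

In this model, model_ipi_can_interrupt_holder(MODEL_IPL_VM) is true and
model_ipi_can_interrupt_holder(MODEL_IPL_SCHED) is false: with the lock at
the higher level the IPI stays blocked until the holder releases it. The
model covers only the shootdown job cache's own lock and says nothing about
other pool locks on the interrupted path.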
--
Michael L. Hitch mhitch%montana.edu@localhost
Computer Consultant
Information Technology Center
Montana State University Bozeman, MT USA