NetBSD-Bugs archive
Re: port-alpha/38335 (kernel freeze on alpha MP system)
The following reply was made to PR port-alpha/38335; it has been noted by GNATS.
From: "Michael L. Hitch" <mhitch%lightning.msu.montana.edu@localhost>
To: gnats-bugs%NetBSD.org@localhost
Cc: jarle%uninett.no@localhost
Subject: Re: port-alpha/38335 (kernel freeze on alpha MP system)
Date: Fri, 19 Feb 2010 11:10:17 -0700 (MST)
On Thu, 18 Feb 2010, Michael L. Hitch wrote:
> > db{0}> trace
> > cpu_Debugger() at netbsd:cpu_Debugger+0x4
> > panic() at netbsd:panic+0x278
> > lockdebug_abort1() at netbsd:lockdebug_abort1+0x150
> > mutex_enter() at netbsd:mutex_enter+0x38c
> > pool_get() at netbsd:pool_get+0x58
> > pool_cache_put_slow() at netbsd:pool_cache_put_slow+0x2b4
> > pool_cache_put_paddr() at netbsd:pool_cache_put_paddr+0x1e0
> > pmap_do_tlb_shootdown() at netbsd:pmap_do_tlb_shootdown+0x178
> > alpha_ipi_process() at netbsd:alpha_ipi_process+0xb8
> > interrupt() at netbsd:interrupt+0x88
> > XentInt() at netbsd:XentInt+0x1c
> > --- interrupt (from ipl 4) ---
> > lockdebug_mem_check() at netbsd:lockdebug_mem_check+0x1b4
> > pool_put() at netbsd:pool_put+0x7c
> > pool_cache_invalidate_groups() at netbsd:pool_cache_invalidate_groups+0xf4
> > pool_cache_invalidate() at netbsd:pool_cache_invalidate+0x8c
> > pool_reclaim() at netbsd:pool_reclaim+0x68
> > pool_drain_end() at netbsd:pool_drain_end+0x60
> > uvm_pageout() at netbsd:uvm_pageout+0x880
> > exception_return() at netbsd:exception_return
...
> Try the following change, which will use IPL_SCHED for the pool cache
> used by the tlb shootdown code. I've done this on my CS20, and have been
> able to run numerous greps that would previously deadlock. In the above
> traceback, I can't quite see how that would work, unless the
> pool_drain_end() is going to be locking the tlb shootdown job pool.
The patch I suggested is not going to help. I just ran one more grep
today and got a panic just like the above one.
The problem is that the lock being held belongs to pcg_large_pool, which
uses IPL_VM and can therefore be interrupted by the tlb shootdown IPI. In
this case, the cpu took the lock while doing the pool cache invalidate and
was then interrupted by the IPI on the same cpu. While putting a pool
cache entry back into the cache, the tlb shootdown routine needs to do a
pool_get(), and that also has to lock pcg_large_pool.
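The self-deadlock described above can be reproduced in a small single-CPU model. This is a hypothetical user-space sketch, not kernel code: `toy_mutex`, the `TOY_IPL_*` levels and the other `toy_` names are invented for illustration, and it assumes the shootdown IPI arrives at some level above IPL_VM. Instead of spinning forever, `toy_enter()` reports when a CPU tries to take a spin mutex it already owns, which is presumably the condition that made mutex_enter() call lockdebug_abort1() in the trace above.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy interrupt priority levels; only their ordering matters.
 * Assumption: the TLB shootdown IPI arrives at TOY_IPL_HIGH, which is
 * above TOY_IPL_VM (the level the pcg_large_pool lock blocks). */
enum toy_ipl { TOY_IPL_NONE, TOY_IPL_VM, TOY_IPL_HIGH };

/* Toy non-recursive spin mutex (hypothetical, not the kernel's kmutex). */
struct toy_mutex {
    enum toy_ipl ipl;   /* IPL raised to while the mutex is held */
    int owner;          /* owning CPU, or -1 when free */
};

static enum toy_ipl cur_ipl = TOY_IPL_NONE;  /* IPL of the simulated CPU */

/* Try to take the mutex. Returns false where a real spin mutex would spin
 * forever because this CPU already owns it. */
static bool toy_enter(struct toy_mutex *m, int cpu, enum toy_ipl *saved)
{
    if (m->owner == cpu)
        return false;
    *saved = cur_ipl;
    if (cur_ipl < m->ipl)
        cur_ipl = m->ipl;
    m->owner = cpu;
    return true;
}

static void toy_exit(struct toy_mutex *m, enum toy_ipl saved)
{
    m->owner = -1;
    cur_ipl = saved;
}

/* Shootdown handler: putting an entry back into the pool cache needs a
 * pool_get(), which takes the same pool lock. */
static bool toy_shootdown_ipi(struct toy_mutex *pool_lock, int cpu)
{
    enum toy_ipl saved = TOY_IPL_NONE;

    if (!toy_enter(pool_lock, cpu, &saved))
        return false;                 /* self-deadlock */
    toy_exit(pool_lock, saved);
    return true;
}

/* CPU 0 holds the pool lock (the pool_cache_invalidate() path) when an
 * IPI at ipi_level arrives. Returns true if no deadlock occurs. */
static bool toy_scenario(enum toy_ipl lock_ipl, enum toy_ipl ipi_level)
{
    struct toy_mutex pool_lock = { lock_ipl, -1 };
    enum toy_ipl saved = TOY_IPL_NONE;
    bool ok = true;

    cur_ipl = TOY_IPL_NONE;
    toy_enter(&pool_lock, 0, &saved);
    if (ipi_level > cur_ipl)          /* IPI not masked: handler runs now */
        ok = toy_shootdown_ipi(&pool_lock, 0);
    /* otherwise the IPI stays pending until the lock is released */
    toy_exit(&pool_lock, saved);
    return ok;
}
```

In the model, `toy_scenario(TOY_IPL_VM, TOY_IPL_HIGH)` returns false because the handler runs on top of the lock holder, while a lock whose IPL also masks the IPI, `toy_scenario(TOY_IPL_HIGH, TOY_IPL_HIGH)`, only defers the handler. That contrast is a property of the toy model, not a proposed fix for pcg_large_pool.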
I don't really know all that much about the pool cache or pools, and
don't see any easy solution to this.
A quick and dirty workaround would be to change the maximum number of
'jobs' the tlb shootdown does to 0, which effectively forces it to always
invalidate the entire TLB and never use any pool cache entries.
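The effect of that workaround can be sketched as a queue that falls back to a full TLB invalidation once a job limit is reached. The structure and names below (`toy_shootdown_q`, `maxjobs`, `tbia`, `allocations`) are hypothetical and only assume that the real shootdown code has a comparable per-queue job limit. With `maxjobs` set to 0, every request takes the full-flush path, so no pool cache entry is ever allocated or put back.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a per-CPU shootdown queue. */
struct toy_shootdown_q {
    size_t maxjobs;      /* tunable limit: 0 means "never queue a job" */
    size_t njobs;        /* queued per-VA jobs */
    bool tbia;           /* pending full TLB invalidation */
    size_t allocations;  /* pool entries this queue would have taken */
};

/* Queue a shootdown for one VA, or fall back to a full invalidation. */
static void toy_shootdown(struct toy_shootdown_q *q, unsigned long va)
{
    (void)va;                    /* a real job would record the VA */
    if (q->tbia)
        return;                  /* full flush already pending */
    if (q->njobs >= q->maxjobs) {
        q->tbia = true;          /* limit reached: flush everything */
        q->njobs = 0;            /* queued entries would be returned here */
        return;
    }
    q->allocations++;            /* stands in for taking a pool cache entry */
    q->njobs++;
}
```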
An alternative that I looked at quite some time ago (2 years maybe?) was
to replace the pool cache usage with statically allocated entries in the
pmap_tlb_shootdown_q[].
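A rough sketch of the statically allocated alternative follows. It assumes a fixed number of job slots per CPU stored directly in a per-CPU array analogous to pmap_tlb_shootdown_q[]; `TOY_NCPUS`, `TOY_NJOBS` and the `toy_` functions are invented for illustration. Locking between the CPU that queues a job and the CPU that processes the IPI is omitted; in a kernel that lock would itself need an IPL high enough to block the shootdown IPI. Because neither side calls into a pool, the IPI handler cannot end up re-entering pcg_large_pool.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sizes; storage is static, so nothing is ever taken from
 * a pool in interrupt context. */
#define TOY_NCPUS 4
#define TOY_NJOBS 8

struct toy_job {
    unsigned long va;
};

struct toy_static_q {
    struct toy_job jobs[TOY_NJOBS];  /* fixed storage, no pool_get() */
    size_t njobs;
    bool tbia;                       /* overflow: flush the whole TLB */
};

static struct toy_static_q toy_q[TOY_NCPUS];  /* cf. pmap_tlb_shootdown_q[] */

/* Enqueue on the target CPU's queue; on overflow, degrade to a full flush. */
static void toy_enqueue(int cpu, unsigned long va)
{
    struct toy_static_q *q = &toy_q[cpu];

    if (q->tbia)
        return;                      /* full flush already pending */
    if (q->njobs == TOY_NJOBS) {
        q->tbia = true;
        q->njobs = 0;
        return;
    }
    q->jobs[q->njobs++].va = va;
}

/* IPI handler side: drain the queue without touching any allocator.
 * Returns the number of per-VA invalidations that would be performed. */
static size_t toy_process(int cpu, bool *full_flush)
{
    struct toy_static_q *q = &toy_q[cpu];
    size_t n = q->njobs;

    *full_flush = q->tbia;
    q->njobs = 0;
    q->tbia = false;
    return n;
}
```

On overflow the queue degrades to a full TLB invalidation, so the fixed size bounds memory use without losing correctness.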
--
Michael L. Hitch mhitch%montana.edu@localhost
Computer Consultant
Information Technology Center
Montana State University Bozeman, MT USA