NetBSD-Bugs archive


Re: port-alpha/38335 (kernel freeze on alpha MP system)



The following reply was made to PR port-alpha/38335; it has been noted by GNATS.

From: "Michael L. Hitch" <mhitch%lightning.msu.montana.edu@localhost>
To: gnats-bugs%NetBSD.org@localhost
Cc: jarle%uninett.no@localhost
Subject: Re: port-alpha/38335 (kernel freeze on alpha MP system)
Date: Fri, 19 Feb 2010 11:10:17 -0700 (MST)

 On Thu, 18 Feb 2010, Michael L. Hitch wrote:
 
 > > db{0}> trace
 > > cpu_Debugger() at netbsd:cpu_Debugger+0x4
 > > panic() at netbsd:panic+0x278
 > > lockdebug_abort1() at netbsd:lockdebug_abort1+0x150
 > > mutex_enter() at netbsd:mutex_enter+0x38c
 > > pool_get() at netbsd:pool_get+0x58
 > > pool_cache_put_slow() at netbsd:pool_cache_put_slow+0x2b4
 > > pool_cache_put_paddr() at netbsd:pool_cache_put_paddr+0x1e0
 > > pmap_do_tlb_shootdown() at netbsd:pmap_do_tlb_shootdown+0x178
 > > alpha_ipi_process() at netbsd:alpha_ipi_process+0xb8
 > > interrupt() at netbsd:interrupt+0x88
 > > XentInt() at netbsd:XentInt+0x1c
 > > --- interrupt (from ipl 4) ---
 > > lockdebug_mem_check() at netbsd:lockdebug_mem_check+0x1b4
 > > pool_put() at netbsd:pool_put+0x7c
 > > pool_cache_invalidate_groups() at netbsd:pool_cache_invalidate_groups+0xf4
 > > pool_cache_invalidate() at netbsd:pool_cache_invalidate+0x8c
 > > pool_reclaim() at netbsd:pool_reclaim+0x68
 > > pool_drain_end() at netbsd:pool_drain_end+0x60
 > > uvm_pageout() at netbsd:uvm_pageout+0x880
 > > exception_return() at netbsd:exception_return
 ...
 >    Try the following change, which will use IPL_SCHED for the pool cache
 > used by the tlb shootdown code.  I've done this on my CS20, and have been
 > able to run numerous greps that would previously deadlock.  In the above
 > traceback, I can't quite see how that would work, unless the
 > pool_drain_end() is going to be locking the tlb shootdown job pool.
 
    The patch I suggested is not going to help.  I just ran one more grep 
 today and got a panic just like the above one.
 
    The problem is that the lock being held is for the pcg_large_pool, which
 uses IPL_VM, and IPL_VM does not block the tlb shootdown IPI.  In this
 case, the cpu took the lock while doing the pool cache invalidate, and was
 then interrupted by the IPI on the same cpu.  When the tlb shootdown
 routine tries to put a pool cache entry back into the cache, it needs to
 do a pool_get(), and that also needs to lock pcg_large_pool, which the
 interrupted code on that cpu still holds.
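 
    The sequence above can be modelled in user space.  This is only a
 sketch under stated assumptions, not kernel code: the model_* names and
 the numeric levels are hypothetical, and only the ordering of the two
 levels matters.  It shows that a lock taken at a level that does not
 block the IPI can be re-entered by the IPI handler on the same cpu.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical levels: only the ordering LOW < HIGH matters in this model. */
enum { MODEL_IPL_NONE = 0, MODEL_IPL_LOW = 1, MODEL_IPL_HIGH = 2 };

struct model_cpu {
	int cur_ipl;	/* interrupts at or below this level are held off */
};

struct model_spin {
	int lock_ipl;			/* level the CPU is raised to while held */
	int saved_ipl;			/* level to restore on release */
	const struct model_cpu *owner;	/* NULL when the lock is free */
};

/* An interrupt is delivered only if its level is above the current one. */
static bool
model_intr_deliverable(const struct model_cpu *c, int intr_ipl)
{
	return intr_ipl > c->cur_ipl;
}

/* Returns false for "locking against myself"; a real spin mutex would hang. */
static bool
model_spin_enter(struct model_spin *l, struct model_cpu *c)
{
	if (l->owner == c)
		return false;
	l->owner = c;
	l->saved_ipl = c->cur_ipl;
	if (l->lock_ipl > c->cur_ipl)
		c->cur_ipl = l->lock_ipl;
	return true;
}

static void
model_spin_exit(struct model_spin *l, struct model_cpu *c)
{
	assert(l->owner == c);
	l->owner = NULL;
	c->cur_ipl = l->saved_ipl;
}

/*
 * One CPU takes the lock (the pool cache invalidate path), then an IPI at
 * ipi_ipl arrives whose handler wants the same lock (the pool_get() path).
 * Returns true if the handler ends up locking against itself.
 */
static bool
model_same_cpu_recursion(int lock_ipl, int ipi_ipl)
{
	struct model_cpu cpu = { MODEL_IPL_NONE };
	struct model_spin lock = { lock_ipl, MODEL_IPL_NONE, NULL };
	bool recursed = false;

	if (!model_spin_enter(&lock, &cpu))
		return false;	/* not reachable: the lock starts free */
	if (model_intr_deliverable(&cpu, ipi_ipl)) {
		int before = cpu.cur_ipl;

		cpu.cur_ipl = ipi_ipl;	/* handler runs at the IPI level */
		if (model_spin_enter(&lock, &cpu))
			model_spin_exit(&lock, &cpu);
		else
			recursed = true;
		cpu.cur_ipl = before;
	}
	model_spin_exit(&lock, &cpu);
	return recursed;
}
```

    In this model, model_same_cpu_recursion(MODEL_IPL_LOW, MODEL_IPL_HIGH)
 reports the recursion, while a lock whose level is not below the IPI level
 never sees the handler run while it is held.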
 
    I don't really know all that much about the pool cache or pools, and 
 don't see any easy solution to this.
 
    A quick and dirty workaround would be to change the maximum number of
 'jobs' the tlb shootdown does to 0, which effectively forces it to always
 invalidate the entire TLB and never use any pool cache entries.
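 
    A rough illustration of why a limit of 0 avoids the allocator.  This
 is a sketch, not the pmap code: struct sketch_q, sketch_enqueue() and the
 maxjobs parameter are hypothetical names, and a counter stands in for
 taking an entry from the pool cache.

```c
#include <assert.h>
#include <stdbool.h>

struct sketch_q {
	int jobs_queued;	/* per-address invalidations pending */
	int entries_allocated;	/* stands in for pool cache allocations */
	bool flush_all;		/* invalidate the whole TLB instead */
};

/* Queue one invalidation, falling back to a full flush past the limit. */
static void
sketch_enqueue(struct sketch_q *q, unsigned long va, int maxjobs)
{
	assert(maxjobs >= 0);
	(void)va;	/* a real queue would record the address in the entry */

	if (q->flush_all)
		return;		/* a full invalidate already covers this address */
	if (q->jobs_queued >= maxjobs) {
		/* At the limit: give up on per-address work. */
		q->flush_all = true;
		q->jobs_queued = 0;
		return;
	}
	q->entries_allocated++;
	q->jobs_queued++;
}
```

    With maxjobs set to 0 the first request already takes the full-flush
 branch, so entries_allocated stays at 0.  The cost is that every
 shootdown invalidates the whole TLB.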
 
    An alternative that I looked at quite some time ago (2 years maybe?) was 
 to replace the pool cache usage with statically allocated entries in the 
 pmap_tlb_shootdown_q[].
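 
    One possible shape for that alternative, as a sketch only.  The
 static_* names and the capacity of 4 are hypothetical and are not taken
 from the existing pmap code.  The job storage lives inside the queue
 itself, so neither queueing nor draining touches a pool.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define STATIC_NJOBS 4	/* hypothetical capacity per queue */

struct static_job {
	unsigned long va;	/* address to invalidate */
};

struct static_q {
	struct static_job jobs[STATIC_NJOBS];	/* storage lives in the queue */
	size_t count;
	bool flush_all;
};

/* Queue one address; overflow turns into a full invalidate. */
static void
static_enqueue(struct static_q *q, unsigned long va)
{
	assert(q->count <= STATIC_NJOBS);
	if (q->flush_all)
		return;
	if (q->count == STATIC_NJOBS) {
		q->flush_all = true;
		q->count = 0;
		return;
	}
	q->jobs[q->count++].va = va;
}

/*
 * Copy the pending addresses into out[] (room for STATIC_NJOBS entries),
 * report whether a full invalidate was requested, and reset the queue.
 */
static size_t
static_drain(struct static_q *q, unsigned long *out, bool *flush_all)
{
	size_t n = q->count;

	for (size_t i = 0; i < n; i++)
		out[i] = q->jobs[i].va;
	*flush_all = q->flush_all;
	q->count = 0;
	q->flush_all = false;
	return n;
}
```

    A real version would still need whatever locking the per-cpu queue
 already uses; the sketch leaves that out.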
 
 --
 Michael L. Hitch                       mhitch%montana.edu@localhost
 Computer Consultant
 Information Technology Center
 Montana State University       Bozeman, MT     USA
 

