Re: port-alpha/38335 (kernel freeze on alpha MP system)

To: port-alpha-maintainer%netbsd.org@localhost,gnats-admin%netbsd.org@localhost,netbsd-bugs%netbsd.org@localhost,jarle%uninett.no@localhost
Subject: Re: port-alpha/38335 (kernel freeze on alpha MP system)
From: "Michael L. Hitch" <mhitch%lightning.msu.montana.edu@localhost>
Date: Thu, 18 Feb 2010 22:20:05 +0000 (UTC)

The following reply was made to PR port-alpha/38335; it has been noted by GNATS.

From: "Michael L. Hitch" <mhitch%lightning.msu.montana.edu@localhost>
To: Jarle Greipsland <jarle%uninett.no@localhost>
Cc: gnats-bugs%NetBSD.org@localhost
Subject: Re: port-alpha/38335 (kernel freeze on alpha MP system)
Date: Thu, 18 Feb 2010 15:04:28 -0700 (MST)

 On Fri, 5 Feb 2010, Jarle Greipsland wrote:
 
 >> Did the committed fix take care of the issue you're seeing?
 > I don't think so.  I have been running with a LOCKDEBUG-kernel,
 > and yesterday, while trying to do a 'build.sh release', it
 > panicked.
 >
 > login: Mutex error: lockdebug_wantlock: locking against myself
 >
 > lock address : 0xfffffc0000882f68 type     :               spin
 > initialized  : 0xfffffc00006735d8
 > shared holds :                  0 exclusive:                  1
 > shares wanted:                  0 exclusive:                  1
 > current cpu  :                  0 last held:                  0
 > current l  : 0xfffffc003fab2400 last held: 0xfffffc003fab2400
 > last locked  : 0xfffffc0000674690 unlocked : 0xfffffc0000672478
 > owner field  : 0x0000000000000400 wait/sn:                0/1
 >
 > panic: LOCKDEBUG
 > Stopped in pid 0.35 (system) at netbsd:cpu_Debugger+0x4:        ret     zero,(ra)
 > db{0}>
 > db{0}> trace
 > cpu_Debugger() at netbsd:cpu_Debugger+0x4
 > panic() at netbsd:panic+0x278
 > lockdebug_abort1() at netbsd:lockdebug_abort1+0x150
 > mutex_enter() at netbsd:mutex_enter+0x38c
 > pool_get() at netbsd:pool_get+0x58
 > pool_cache_put_slow() at netbsd:pool_cache_put_slow+0x2b4
 > pool_cache_put_paddr() at netbsd:pool_cache_put_paddr+0x1e0
 > pmap_do_tlb_shootdown() at netbsd:pmap_do_tlb_shootdown+0x178
 > alpha_ipi_process() at netbsd:alpha_ipi_process+0xb8
 > interrupt() at netbsd:interrupt+0x88
 > XentInt() at netbsd:XentInt+0x1c
 > --- interrupt (from ipl 4) ---
 > lockdebug_mem_check() at netbsd:lockdebug_mem_check+0x1b4
 > pool_put() at netbsd:pool_put+0x7c
 > pool_cache_invalidate_groups() at netbsd:pool_cache_invalidate_groups+0xf4
 > pool_cache_invalidate() at netbsd:pool_cache_invalidate+0x8c
 > pool_reclaim() at netbsd:pool_reclaim+0x68
 > pool_drain_end() at netbsd:pool_drain_end+0x60
 > uvm_pageout() at netbsd:uvm_pageout+0x880
 > exception_return() at netbsd:exception_return
 
    This looks like the tlb shootdown IPI interrupted a pool_put()
 operation, which likely holds a mutex related to some pool.  The
 tlb shootdown is attempting to return an entry to the shootdown
 job pool, and tries to acquire the same lock.
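 
    As a rough illustration of why this is fatal on a single CPU, here
 is a toy model of a spin mutex that records its owning CPU.  It is
 only a sketch: the toy_* names are made up for this example and are
 not the kernel's mutex or LOCKDEBUG code, and it assumes, as guessed
 above, that the interrupt handler wants the very lock the interrupted
 code holds.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a spin mutex; NOT the NetBSD kmutex_t implementation. */
struct toy_spin_mutex {
	bool held;      /* true while some CPU owns the lock */
	int  owner_cpu; /* CPU index of the holder, valid only if held */
};

enum toy_result {
	TOY_ACQUIRED,
	TOY_WOULD_SPIN,
	TOY_LOCKING_AGAINST_MYSELF
};

/* Try to take the lock on behalf of `cpu` without actually spinning. */
static enum toy_result
toy_mutex_enter(struct toy_spin_mutex *m, int cpu)
{
	if (!m->held) {
		m->held = true;
		m->owner_cpu = cpu;
		return TOY_ACQUIRED;
	}
	/* The same CPU already holds it: spinning here can never succeed. */
	if (m->owner_cpu == cpu)
		return TOY_LOCKING_AGAINST_MYSELF;
	/* Another CPU holds it: spinning ends once that CPU releases. */
	return TOY_WOULD_SPIN;
}

static void
toy_mutex_exit(struct toy_spin_mutex *m)
{
	assert(m->held);
	m->held = false;
}

/*
 * Replay the suspected sequence: code on `holder_cpu` holds the pool
 * lock when a handler running on `intr_cpu` wants the same lock.
 */
static enum toy_result
toy_replay(int holder_cpu, int intr_cpu)
{
	struct toy_spin_mutex pool_lock = { false, -1 };
	enum toy_result r;

	(void)toy_mutex_enter(&pool_lock, holder_cpu); /* pool_put() side */
	r = toy_mutex_enter(&pool_lock, intr_cpu);     /* IPI handler side */
	toy_mutex_exit(&pool_lock);
	return r;
}
```

    In this model toy_replay(0, 0) reports the "locking against myself"
 case, while toy_replay(0, 1) is an ordinary wait that ends when CPU 0
 releases the lock.  Without a check like this the first case simply
 spins forever, since the holder cannot run again until the handler
 returns.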
 
    I hadn't seen this type of deadlock (which is what it would have been
 without LOCKDEBUG) before, but I recently started seeing several
 deadlocks just grepping a number of large files.  Trying to build
 a current LOCKDEBUG kernel resulted in a hard hang when booting the
 kernel, requiring a power cycle of my CS20, and I had trouble recovering
 after that.  I switched to a 4 CPU ES40 to debug the LOCKDEBUG problem,
 eventually found what that was, and with a workaround was finally able
 to get a working LOCKDEBUG kernel.  My grepping was able to cause
 deadlocks fairly easily, and I had been trying to keep the CS20 from
 hanging in the deadlock by enabling the SPINLOCK_SPINOUT part of
 LOCKDEBUG.  Some of the deadlocks looked very similar to the above.
 
    Try the following change, which will use IPL_SCHED for the pool cache
 used by the tlb shootdown code.  I've done this on my CS20, and have been
 able to run numerous greps that would previously deadlock.  In the above
 traceback, I can't quite see how that would work, unless the
 pool_drain_end() is going to be locking the tlb shootdown job pool.
 
 Index: sys/arch/alpha/alpha/pmap.c
 ===================================================================
 RCS file: /cvsroot/src/sys/arch/alpha/alpha/pmap.c,v
 retrieving revision 1.252
 diff -u -p -r1.252 pmap.c
 --- sys/arch/alpha/alpha/pmap.c 26 Nov 2009 00:19:11 -0000      1.252
 +++ sys/arch/alpha/alpha/pmap.c 18 Feb 2010 21:52:16 -0000
 @@ -937,7 +937,7 @@ pmap_bootstrap(paddr_t ptaddr, u_int max
           */
          pool_cache_bootstrap(&pmap_tlb_shootdown_job_cache,
              sizeof(struct pmap_tlb_shootdown_job), CACHE_LINE_SIZE,
 -            0, PR_LARGECACHE, "pmaptlb", NULL, IPL_VM, NULL, NULL, NULL);
 +            0, PR_LARGECACHE, "pmaptlb", NULL, IPL_SCHED, NULL, NULL, NULL);
          for (i = 0; i < ALPHA_MAXPROCS; i++) {
                  TAILQ_INIT(&pmap_tlb_shootdown_q[i].pq_head);
                  mutex_init(&pmap_tlb_shootdown_q[i].pq_lock, MUTEX_DEFAULT,
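 
    The intent of raising the level can be sketched with a small model
 of interrupt priority levels.  This is only an illustration: the
 toy_* names and the numeric levels are assumptions made for the
 example, and it assumes the shootdown IPI arrives at a level above
 IPL_VM but not above IPL_SCHED.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative priority levels; the numeric values are assumptions. */
enum toy_ipl { TOY_IPL_NONE = 0, TOY_IPL_VM = 4, TOY_IPL_SCHED = 5 };

struct toy_cpu {
	int cur_ipl; /* interrupts at or below this level are held off */
};

/* Taking a spin lock raises the CPU to the lock's level. */
static int
toy_spin_enter(struct toy_cpu *ci, int lock_ipl)
{
	int old = ci->cur_ipl;

	if (lock_ipl > ci->cur_ipl)
		ci->cur_ipl = lock_ipl;
	return old; /* caller restores this on exit */
}

static void
toy_spin_exit(struct toy_cpu *ci, int old_ipl)
{
	assert(old_ipl <= ci->cur_ipl);
	ci->cur_ipl = old_ipl;
}

/* An interrupt is taken only if its level is above the current one. */
static bool
toy_intr_deliverable(const struct toy_cpu *ci, int intr_ipl)
{
	return intr_ipl > ci->cur_ipl;
}

/*
 * Can an interrupt at intr_ipl arrive on this CPU while a lock that
 * was created at lock_ipl is held?
 */
static bool
toy_can_interrupt_holder(int lock_ipl, int intr_ipl)
{
	struct toy_cpu ci = { TOY_IPL_NONE };
	int old = toy_spin_enter(&ci, lock_ipl);
	bool hit = toy_intr_deliverable(&ci, intr_ipl);

	toy_spin_exit(&ci, old);
	return hit;
}
```

    Under these assumptions, a lock created at the VM level can still be
 interrupted by something arriving at the SCHED level, while a lock
 created at the SCHED level cannot, so the handler never runs on top
 of a holder of that lock on the same CPU.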
 
 
 --
 Michael L. Hitch                       mhitch%montana.edu@localhost
 Computer Consultant
 Information Technology Center
 Montana State University       Bozeman, MT     USA
