NetBSD-Bugs archive


Re: port-alpha/38335 (kernel freeze on alpha MP system)



The following reply was made to PR port-alpha/38335; it has been noted by GNATS.

From: "Michael L. Hitch" <mhitch%lightning.msu.montana.edu@localhost>
To: Jarle Greipsland <jarle%uninett.no@localhost>
Cc: gnats-bugs%NetBSD.org@localhost, gnats-admin%netbsd.org@localhost
Subject: Re: port-alpha/38335 (kernel freeze on alpha MP system)
Date: Sat, 3 Oct 2009 10:25:43 -0600 (MDT)

 On Fri, 2 Oct 2009, Jarle Greipsland wrote:
 
 > "Michael L. Hitch" <mhitch%lightning.msu.montana.edu@localhost> writes:
 >>    It does show that what I thought had happened did indeed happen.
 >>> 0xfffffc0000736ac0 <pmap_do_tlb_shootdown+224>: ldq     t1,24(t3)
 > [ ... ]
 > Is there any more info I can gather for you, or can you just as
 > easily reproduce this yourself?
 
    I'm quite certain that in this particular case the job queue entry is
 linked to itself, resulting in a hung cpu (which then hangs other cpus
 because it has the shootdown queue locked for that cpu).
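 
    As an illustration only (this is not the actual pmap code): a queue 
 entry whose next pointer refers back to itself makes any plain list walk 
 spin forever.  The sketch below uses a hypothetical struct job and a 
 hypothetical count_jobs() helper that gives up after a fixed number of 
 entries, so a self-linked entry is reported instead of hanging the walker.

```c
#include <stddef.h>

/* Hypothetical job entry; NOT the real NetBSD shootdown job structure. */
struct job {
    struct job *next;   /* next entry in the queue, NULL at the tail */
    int id;
};

/*
 * Walk the queue, giving up after `limit` entries.
 * Returns the number of entries, or -1 if more than `limit` entries were
 * seen, which for a short queue most likely means the list is circular
 * (for example, an entry linked to itself).
 */
static int count_jobs(const struct job *head, int limit)
{
    int n = 0;

    for (const struct job *j = head; j != NULL; j = j->next) {
        if (n == limit)
            return -1;  /* too many entries: assume a cycle */
        n++;
    }
    return n;
}
```

    Without the limit check, the loop above never terminates once an entry 
 points at itself, and a walker that holds the queue lock keeps holding it.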
 
    I haven't been able to easily reproduce this.  I can complete full 
 builds fairly often, although a lot of the time I will get a segmentation 
 fault in one of the tools (typically grotty or install).
 
    I also got another deadlock situation yesterday:  cpu 0 had acquired the 
 lock for the shootdown queue (presumably for a different cpu - it should 
 be skipping the current cpu) and got interrupted by a shootdown IPI.  The 
 IPI routine was trying to acquire the lock for the current cpu's queue, 
 which was currently locked (can't tell what held the lock though).
 
    One thing I haven't tried yet is a LOCKDEBUG kernel, which should do 
 some additional checking on locking, and should be able to provide 
 information on what holds the lock.  The last time I tried LOCKDEBUG, I 
 ran into problems and quickly got lost in the locking morass.
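 
    To show the kind of information that helps here, the following is a 
 minimal sketch of a lock that records which cpu holds it.  It is not how 
 LOCKDEBUG is implemented; struct dbg_lock and the dbg_lock_*() functions 
 are hypothetical names, and C11 atomics stand in for the kernel's own 
 primitives.

```c
#include <stdatomic.h>

#define NO_OWNER (-1)

/* Hypothetical lock that remembers its holder; not a NetBSD kernel type. */
struct dbg_lock {
    atomic_int owner;   /* cpu id of the holder, or NO_OWNER when free */
};

static void dbg_lock_init(struct dbg_lock *l)
{
    atomic_init(&l->owner, NO_OWNER);
}

/* Try once to take the lock for `cpu`; returns 1 on success, 0 if held. */
static int dbg_lock_try(struct dbg_lock *l, int cpu)
{
    int expected = NO_OWNER;

    return atomic_compare_exchange_strong(&l->owner, &expected, cpu);
}

/* Report the current holder, so a stuck waiter can say who blocks it. */
static int dbg_lock_holder(struct dbg_lock *l)
{
    return atomic_load(&l->owner);
}

static void dbg_lock_release(struct dbg_lock *l)
{
    atomic_store(&l->owner, NO_OWNER);
}
```

    With an owner field like this, a cpu spinning on a queue lock could 
 report the holder's id instead of leaving it unknown.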
 
 >>    This is the problem I'm still trying to understand and fix.  The
 >> patch I posted previously is a workaround to detect this particular
 >> problem, and will display a message if it occurs.
 > It _does_ occur.  I applied the patch, and a 'build.sh -j4'
 > triggered the panic after a while.
 
    What was the panic you got?
 
 
 --
 Michael L. Hitch                       mhitch%montana.edu@localhost
 Computer Consultant
 Information Technology Center
 Montana State University       Bozeman, MT     USA
 

