Subject: Re: Continued problems with SMP on NetBSD/alpha
To: Tonnerre LOMBARD <tonnerre@thebsh.sygroup.ch>
From: Michael L. Hitch <mhitch@lightning.msu.montana.edu>
List: port-alpha
Date: 12/20/2006 11:17:11
On Tue, 19 Dec 2006, Tonnerre LOMBARD wrote:

>> I.e. everything comes to a grinding halt from all external appearances
>> and the only way you've been able to get it going again is to push the
>> reset button and reboot (or the halt button and get to the SRM prompt)?
>
> Yes. A complete hang, and no ddb is triggered, even though it is set to
> trigger on panic. (Which suggests that no panic is taking place.)
>
>> If you have DDB in your kernel ("options DDB"), and you have the sysctl
>> ddb.onpanic and ddb.fromconsole settings both turned on (i.e. equal to
>> one) (perhaps by default with "options DDB_ONPANIC=1"), then the
>> question would be whether or not you can force the kernel into the
>> debugger (send a BREAK signal on a serial console
>
> I'm going to try to reproduce this - but I need a non-production SMP
> alpha first. I'll try to get one to work, but this won't be before January.

   I had some problems with my CS20 getting into a deadlock with interrupts
disabled, so I couldn't break into DDB from the console, and the only 
recourse I had was to cycle power.  Later, I finally located the halt 
switch, and using that was able to halt to the SRM, which also displayed 
the program counter of one of the CPUs.  From the SRM, I could use the 
'continue' command to get back into the kernel and DDB and poke around 
from there.  I was finally able to determine where it was deadlocked (I 
think I had to eventually enable LOCKDEBUG to get more information) and 
was able to come up with a fix.  After that deadlock got fixed, and the 
problem with the FP IPP stuff was fixed, both my CS20 systems have been 
happily running MP.  One has been running 3.0_STABLE for over 307 days.
The other only has 5 days of uptime because I upgraded the disk in it a 
few weeks back, and have been testing more recent kernels on it (it's 
currently running a 4.0_BETA kernel from the original 4.0 branch - I 
haven't gotten around to updating it to the latest 4.0 branch yet).
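
   For anyone trying to reproduce this kind of hang, the kernel options
mentioned in this thread can be collected into a config fragment roughly
like the one below.  This is only a sketch: it assumes a test kernel
config derived from an existing MP config, and the comments are a summary
rather than anything taken from the actual configs on the CS20 systems.

```
# Sketch of a debugging kernel config fragment (assumed, not the CS20 config)
options 	MULTIPROCESSOR		# build an SMP kernel
options 	DDB			# include the in-kernel debugger
options 	DDB_ONPANIC=1		# default ddb.onpanic to 1 (enter DDB on panic)
options 	LOCKDEBUG		# extra lock checking, helps locate deadlocks
```

   LOCKDEBUG adds overhead, so it is probably best kept to a test kernel.
At run time, the ddb.onpanic and ddb.fromconsole sysctl settings quoted
above can be checked or changed with sysctl(8).  None of this helps if the
deadlock happens with interrupts disabled, which is where the halt switch
and the SRM 'continue' command come in.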

--
Michael L. Hitch			mhitch@montana.edu
Computer Consultant
Information Technology Center
Montana State University	Bozeman, MT	USA