Subject: port-alpha/26159: Paused CPUs fail to halt when DDB reboot command executed
To: None <gnats-bugs@gnats.NetBSD.org>
From: None <mhitch@NetBSD.org>
List: netbsd-bugs
Date: 07/03/2004 22:27:33
>Number:         26159
>Category:       port-alpha
>Synopsis:       Paused CPUs fail to halt when DDB reboot command executed
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    port-alpha-maintainer
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sun Jul 04 04:44:00 UTC 2004
>Closed-Date:
>Last-Modified:
>Originator:     Michael L. Hitch
>Release:        NetBSD 2.0F
>Organization:
	
>Environment:
	
	
System: NetBSD gemini2.msu.montana.edu 2.0F NetBSD 2.0F (GENERIC.MP) #35: Wed Jun 30 20:14:00 MDT 2004 mhitch@netbsd0:/usr/NetBSD-current/OBJ/alphaev56/sys/arch/alpha/compile.alphaev56/GENERIC.MP alpha
Architecture: alpha
Machine: alpha
>Description:
	When running an SMP kernel, a reboot (or sync) command issued from DDB
	is unable to halt any paused CPUs.  If the primary CPU is one of the
	paused CPUs, the machine will hang, since the primary CPU is the one
	that halts back to SRM.  Attempts to halt any paused secondary CPUs
	will time out, leaving those CPUs running when SRM is entered.

	When one CPU enters DDB, that CPU will attempt to pause all the other
	CPUs.  When the other CPUs pause, they set a flag bit in cpus_paused
	and spin at splhigh() until their cpus_paused bit is cleared.  When DDB
	returns, it resumes the paused CPUs.  The reboot (or sync) command,
	however, does not return control to the kernel; it calls kernel
	functions, including cpu_reboot(), directly without resuming the
	other CPUs.  Since those CPUs are still looping at splhigh(), they
	never receive the halt IPI and continue looping.
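
	The pause and halt interaction described above can be modeled in user
	space.  What follows is a minimal single-threaded sketch, not the
	kernel code: struct sim, cpu_step(), simulate_reboot(), NCPU and the
	halt_pending field are hypothetical names introduced for illustration.
	Only cpus_paused, cpus_running, wait_mask and the 10000-iteration wait
	loop are taken from the patch in this report, and the sketch assumes
	that wait_mask holds just the bit of the CPU performing the reboot.

```c
#include <assert.h>
#include <stdint.h>

#define NCPU 4	/* hypothetical CPU count for the model */

/* Hypothetical model state; field names mirror the kernel variables. */
struct sim {
	uint64_t cpus_running;	/* bit set until that CPU has halted */
	uint64_t cpus_paused;	/* bit set while that CPU spins in the pause loop */
	uint64_t halt_pending;	/* halt IPI posted but not yet serviced */
};

/* Advance one modeled CPU by a single step. */
static void
cpu_step(struct sim *s, int cpu)
{
	uint64_t bit = UINT64_C(1) << cpu;

	if ((s->cpus_running & bit) == 0)
		return;		/* already halted */
	if (s->cpus_paused & bit)
		return;		/* spinning with interrupts blocked: IPI not seen */
	if (s->halt_pending & bit) {
		s->halt_pending &= ~bit;
		s->cpus_running &= ~bit;	/* service the halt IPI */
	}
}

/*
 * Model a reboot issued from DDB on ddb_cpu.
 * Returns 1 if every other CPU halts before the wait loop expires, else 0.
 */
static int
simulate_reboot(int ddb_cpu, int apply_fix)
{
	const uint64_t all = (UINT64_C(1) << NCPU) - 1;
	uint64_t wait_mask;
	struct sim s;
	int i, cpu;

	assert(ddb_cpu >= 0 && ddb_cpu < NCPU);
	wait_mask = UINT64_C(1) << ddb_cpu;	/* assumption: only the rebooting CPU */

	s.cpus_running = all;
	s.cpus_paused = all & ~wait_mask;	/* DDB entry paused every other CPU */
	s.halt_pending = all & ~wait_mask;	/* models the halt IPI broadcast */

	if (apply_fix)
		s.cpus_paused = 0;		/* the proposed fix: release paused CPUs */

	for (i = 0; i < 10000; i++) {
		for (cpu = 0; cpu < NCPU; cpu++)
			cpu_step(&s, cpu);
		if (s.cpus_running == wait_mask)
			return 1;		/* all other CPUs have halted */
	}
	return 0;				/* timed out, CPUs still running */
}
```

	In this model simulate_reboot(0, 0) returns 0 because the paused CPUs
	never service the pending halt, while simulate_reboot(0, 1) returns 1
	once cpus_paused is cleared.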
>How-To-Repeat:
	Run an MP kernel on a multi-CPU system and attempt to reboot it from
	the DDB prompt (after a panic or other entry into DDB).  Frequently
	either a secondary CPU cannot be halted, or the secondary CPU halts
	but the system hangs at that point.
>Fix:
	After sending the halt IPI to the other CPUs, clear all the bits in
	cpus_paused to allow the other CPUs to continue and process the halt
	IPI.

Index: sys/arch/alpha/alpha/machdep.c
===================================================================
RCS file: /cvsroot/src/sys/arch/alpha/alpha/machdep.c,v
retrieving revision 1.282
diff -u -r1.282 machdep.c
--- sys/arch/alpha/alpha/machdep.c	24 Mar 2004 15:34:46 -0000	1.282
+++ sys/arch/alpha/alpha/machdep.c	4 Jul 2004 04:27:19 -0000
@@ -1023,6 +1023,9 @@
 	 */
 	alpha_broadcast_ipi(ALPHA_IPI_HALT);
 
+	/* Ensure any CPUs paused by DDB resume execution so they can halt */
+	cpus_paused = 0;
+
 	for (i = 0; i < 10000; i++) {
 		alpha_mb();
 		if (cpus_running == wait_mask)
>Release-Note:
>Audit-Trail:
>Unformatted: