Subject: kern/23095: uvm_swapout_threads swapout LWPs on other CPU
To: None <gnats-bugs@gnats.netbsd.org>
From: None <chris@pin.lu>
List: netbsd-bugs
Date: 10/08/2003 13:54:49
>Number:         23095
>Category:       kern
>Synopsis:       uvm_swapout_threads swapout LWPs on other CPU
>Confidential:   no
>Severity:       critical
>Priority:       low
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Wed Oct 08 11:55:00 UTC 2003
>Closed-Date:
>Last-Modified:
>Originator:     Christian Limpach <chris@pin.lu>
>Release:        NetBSD current as of 2003/08/10
>Organization:
	
>Environment:
MP systems
>Description:

uvm_swapout_threads will swapout LWPs which are running on another CPU:
- uvm_swapout_threads considers LWPs running on another CPU for swapout
  if their l_swtime is high
- uvm_swapout_threads considers LWPs on the runqueue for swapout if their
  l_swtime is high but these LWPs might be running by the time uvm_swapout
  is called

>How-To-Repeat:
A kernel with added debugging output paniced like this:
uvm_swapout(96.1) LSONPROC and l->l_cpu != curcpu()
panic: kernel debugging assertion "l->l_flag & L_INMEM" failed: file
"../../../../arch/i386/i386/trap.c", line 549
Stopped in pid 96.1 (xmms) at   netbsd:cpu_Debugger+0x4:        leave
db{1}>

The added KDASSERT is in the i386 trap handler which uses l->l_addr->u_pcb.

>Fix:
- make uvm_swapout_threads not consider LWPs which are running on another
  CPU

- make uvm_swapout check the status of the LWP it's swapping out before
  doing so:
* This requires moving the cpu_swapout call after checking and updating
  the L_INMEM flag (since the check/update has to be done atomically wrt
  the scheduler lock).
* This in turn requires removing several L_INMEM asserts from code called
  from cpu_swapout.  It might be good to add the asserts to all other
  callers of the changed code.

Caveat:
- there would seem to be a race condition where uvm_swapin and uvm_swapout
  can run concurrently (on MP machines) and end up with the L_INMEM flag
  and the user struct wiring out of sync
- vax pmap_rmproc/rmspace probably needs the same change

Patch:

Index: sys/uvm/uvm_glue.c
===================================================================
RCS file: /cvs/netbsd/src/sys/uvm/uvm_glue.c,v
retrieving revision 1.66
diff -u -r1.66 uvm_glue.c
--- sys/uvm/uvm_glue.c	29 Jun 2003 22:32:50 -0000	1.66
+++ sys/uvm/uvm_glue.c	8 Oct 2003 11:03:28 -0000
@@ -610,8 +610,10 @@
 		if (!swappable(l))
 			continue;
 		switch (l->l_stat) {
-		case LSRUN:
 		case LSONPROC:
+			if (l->l_cpu != curcpu())
+				continue;
+		case LSRUN:
 			if (l->l_swtime > outpri2) {
 				outl2 = l;
 				outpri2 = l->l_swtime;
@@ -674,15 +676,13 @@
 #endif
 
 	/*
-	 * Do any machine-specific actions necessary before swapout.
-	 * This can include saving floating point state, etc.
-	 */
-	cpu_swapout(l);
-
-	/*
 	 * Mark it as (potentially) swapped out.
 	 */
 	SCHED_LOCK(s);
+	if (l->l_stat == LSONPROC && l->l_cpu != curcpu()) {
+		SCHED_UNLOCK(s);
+		return;
+	}
 	l->l_flag &= ~L_INMEM;
 	if (l->l_stat == LSRUN)
 		remrunqueue(l);
@@ -692,6 +692,12 @@
 	++uvmexp.swapouts;
 
 	/*
+	 * Do any machine-specific actions necessary before swapout.
+	 * This can include saving floating point state, etc.
+	 */
+	cpu_swapout(l);
+
+	/*
 	 * Unwire the to-be-swapped process's user struct and kernel stack.
 	 */
 	addr = (vaddr_t)l->l_addr;
Index: sys/arch/i386/isa/npx.c
===================================================================
RCS file: /cvs/netbsd/src/sys/arch/i386/isa/npx.c,v
retrieving revision 1.95
diff -u -r1.95 npx.c
--- sys/arch/i386/isa/npx.c	6 Sep 2003 23:15:35 -0000	1.95
+++ sys/arch/i386/isa/npx.c	8 Oct 2003 10:33:29 -0000
@@ -696,7 +696,6 @@
 	struct cpu_info *oci;
 
 	KDASSERT(l->l_addr != NULL);
-	KDASSERT(l->l_flag & L_INMEM);
 
 	oci = l->l_addr->u_pcb.pcb_fpcpu;
 	if (oci == NULL)
Index: sys/arch/alpha/alpha/machdep.c
===================================================================
RCS file: /cvs/netbsd/src/sys/arch/alpha/alpha/machdep.c,v
retrieving revision 1.274
diff -u -r1.274 machdep.c
--- sys/arch/alpha/alpha/machdep.c	7 Oct 2003 17:04:18 -0000	1.274
+++ sys/arch/alpha/alpha/machdep.c	8 Oct 2003 11:15:52 -0000
@@ -1835,7 +1835,6 @@
 #endif
 
 	KDASSERT(l->l_addr != NULL);
-	KDASSERT(l->l_flag & L_INMEM);
 
 	FPCPU_LOCK(&l->l_addr->u_pcb, s);
 
Index: sys/arch/amd64/amd64/fpu.c
===================================================================
RCS file: /cvs/netbsd/src/sys/arch/amd64/amd64/fpu.c,v
retrieving revision 1.5
diff -u -r1.5 fpu.c
--- sys/arch/amd64/amd64/fpu.c	6 Oct 2003 22:53:47 -0000	1.5
+++ sys/arch/amd64/amd64/fpu.c	8 Oct 2003 11:14:03 -0000
@@ -296,7 +296,6 @@
 	struct cpu_info *oci;
 
 	KDASSERT(l->l_addr != NULL);
-	KDASSERT(l->l_flag & L_INMEM);
 
 	oci = l->l_addr->u_pcb.pcb_fpcpu;
 	if (oci == NULL)
Index: sys/arch/pc532/pc532/vm_machdep.c
===================================================================
RCS file: /cvs/netbsd/src/sys/arch/pc532/pc532/vm_machdep.c,v
retrieving revision 1.57
diff -u -r1.57 vm_machdep.c
--- sys/arch/pc532/pc532/vm_machdep.c	7 Aug 2003 16:29:03 -0000	1.57
+++ sys/arch/pc532/pc532/vm_machdep.c	8 Oct 2003 11:18:04 -0000
@@ -195,7 +195,7 @@
 /*
  * cpu_swapout is called immediately before a process's 'struct user'
  * and kernel stack are unwired (which are in turn done immediately
- * before it's P_INMEM flag is cleared).  If the process is the
+ * after its P_INMEM flag is cleared).  If the process is the
  * current owner of the floating point unit, the FP state has to be
  * saved, so that it goes out with the pcb, which is in the user area.
  */
>Release-Note:
>Audit-Trail:
>Unformatted: