tech-kern archive

Re: Anomalies while handling p_nstopchild count



While investigating my problem with the zombie-that-wouldn't-die (see
thread on current-users), a couple of anomalies have appeared.

The first of these is reported by Robert Elz in his PR kern/50298 where
it appears that the wrong process's p_nstopchild count is being updated.

The second I have just reported myself in PR kern/50300:

Shortly after setting the current (exiting) process's p_stat to SDEAD,
sys_exit() may reparent the process because its (current) parent is
unwilling to wait() for its children to exit.  Unfortunately,
proc_reparent() only adjusts the old and new parents' p_nstopchild
counts for SZOMB (or, in some cases, SSTOP) processes;  since our
process is still SDEAD, the parents' counts don't get updated.
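
To make the mismatch concrete, here is a minimal user-space sketch of
the accounting, not the kernel code itself.  The model_* names are
hypothetical, and the sketch assumes that the exiting process has
already been counted in its parent's p_nstopchild by the time it is
marked SDEAD.  The count_sdead flag switches between the current test
in proc_reparent() and the one proposed in kern/50300.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified process states; names echo the kernel's, values are arbitrary. */
enum model_stat { MODEL_SACTIVE, MODEL_SSTOP, MODEL_SDEAD, MODEL_SZOMB };

/* Hypothetical stand-in for the few struct proc fields that matter here. */
struct model_proc {
	struct model_proc *p_pptr;	/* current parent */
	enum model_stat p_stat;		/* process state */
	int p_waited;			/* stop already reported to parent */
	int p_nstopchild;		/* children the parent should wait for */
};

/* Should this child be included in its parent's p_nstopchild? */
static int
model_counts_as_waitable(const struct model_proc *child, int count_sdead)
{
	if (child->p_stat == MODEL_SZOMB)
		return 1;
	if (count_sdead && child->p_stat == MODEL_SDEAD)
		return 1;	/* the extra case proposed in kern/50300 */
	return child->p_stat == MODEL_SSTOP && !child->p_waited;
}

/* Move child to a new parent, carrying its contribution to the count. */
static void
model_reparent(struct model_proc *child, struct model_proc *parent,
    int count_sdead)
{
	assert(child->p_pptr != NULL);
	if (child->p_pptr == parent)
		return;
	if (model_counts_as_waitable(child, count_sdead)) {
		child->p_pptr->p_nstopchild--;
		parent->p_nstopchild++;
	}
	child->p_pptr = parent;
}

/* Mark a child SDEAD, reparent it, and report both parents' counts. */
void
model_exit_and_reparent(int count_sdead, int *old_count, int *new_count)
{
	struct model_proc old_parent = { NULL, MODEL_SACTIVE, 0, 0 };
	struct model_proc new_parent = { NULL, MODEL_SACTIVE, 0, 0 };
	struct model_proc child = { &old_parent, MODEL_SACTIVE, 0, 0 };

	child.p_stat = MODEL_SDEAD;
	old_parent.p_nstopchild++;	/* ASSUMPTION: counted when marked dead */
	model_reparent(&child, &new_parent, count_sdead);

	*old_count = old_parent.p_nstopchild;
	*new_count = new_parent.p_nstopchild;
}
```

In this model, calling model_exit_and_reparent() with count_sdead set to
0 leaves the old parent still holding the count for a child it no longer
owns, while the new parent has none; with count_sdead set to 1 the count
moves along with the child.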

I'm just now studying and learning this part of the kernel, so I don't
want to commit the changes suggested in these two PRs without having
someone else review and approve.  Any volunteers?   :)

While continuing to track down the zombie-that-would-not-die I managed
to find two more places where a process's p_stat and its parent's count
of children to wait for (p_nstopchild) get out of sync.  The additional
issues are documented in PR kern/50308 and kern/50318.

With fixes for all four of these PRs in my local kernel, the zombie
problem seems to have disappeared, and no other ill effects have been
seen.  I have confirmed that at least kern/50300 was being seen in my
local system, and correlated with the appearance of the long-lived
zombie;  kern/50298 and kern/50308 have not been specifically observed.

And kern/50318 only occurs during the late stages of system shutdown,
and was detected only because of some debugging code that I had added
temporarily.
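
For the shutdown case, the shape of the change can be sketched in user
space as follows.  This is only an illustration of the guard used in the
kern_synch.c diff attached to this message: the sketch_* names are
hypothetical, and the comments on why dead and zombie processes are
skipped reflect my reading of the patch rather than anything verified
in the kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified process states; names echo the kernel's, values are arbitrary. */
enum sketch_stat { SK_SACTIVE, SK_SSTOP, SK_SDEAD, SK_SZOMB };

/* Hypothetical stand-in for the struct proc fields used below. */
struct sketch_proc {
	struct sketch_proc *p_pptr;	/* current parent */
	enum sketch_stat p_stat;	/* process state */
	int p_nstopchild;		/* children the parent should wait for */
};

/* Force p into the stopped state, keeping the parent's count in step. */
void
sketch_force_stop(struct sketch_proc *p)
{
	assert(p->p_pptr != NULL);

	if (p->p_stat == SK_SSTOP)
		return;		/* already stopped; leave the count alone */

	/*
	 * ASSUMPTION: dead and zombie processes have been counted in the
	 * parent's p_nstopchild already, so only other states add to it.
	 */
	if (p->p_stat != SK_SZOMB && p->p_stat != SK_SDEAD)
		p->p_pptr->p_nstopchild++;

	p->p_stat = SK_SSTOP;
}
```

Calling sketch_force_stop() twice on the same process bumps the parent's
count only once, which is the property the unconditional assignment in
the original code did not maintain.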

I have attached diffs to this email with fixes for all four PRs.  I'm
planning on waiting a couple of weeks or so in case anyone wants to
discuss the problem or the fixes.  And, given the time of year, it may
be best to wait until early next month before trying to manipulate any
zombies.  :)


+------------------+--------------------------+-------------------------+
| Paul Goyette     | PGP Key fingerprint:     | E-mail addresses:       |
| (Retired)        | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com    |
| Kernel Developer | 0786 F758 55DE 53BA 7731 | pgoyette at netbsd.org  |
+------------------+--------------------------+-------------------------+
Index: kern_exec.c
===================================================================
RCS file: /cvsroot/src/sys/kern/kern_exec.c,v
retrieving revision 1.418
diff -u -p -r1.418 kern_exec.c
--- kern_exec.c	2 Oct 2015 16:54:15 -0000	1.418
+++ kern_exec.c	10 Oct 2015 08:34:58 -0000
@@ -1282,7 +1282,7 @@ execve_runproc(struct lwp *l, struct exe
 
 		KERNEL_UNLOCK_ALL(l, &l->l_biglocks);
 		p->p_pptr->p_nstopchild++;
-		p->p_pptr->p_waited = 0;
+		p->p_waited = 0;
 		mutex_enter(p->p_lock);
 		ksiginfo_queue_init(&kq);
 		sigclearall(p, &contsigmask, &kq);
Index: kern_exit.c
===================================================================
RCS file: /cvsroot/src/sys/kern/kern_exit.c,v
retrieving revision 1.245
diff -u -p -r1.245 kern_exit.c
--- kern_exit.c	2 Oct 2015 16:54:15 -0000	1.245
+++ kern_exit.c	10 Oct 2015 08:34:58 -0000
@@ -227,7 +227,15 @@ exit1(struct lwp *l, int rv)
 	if (__predict_false(p->p_sflag & PS_STOPEXIT)) {
 		KERNEL_UNLOCK_ALL(l, &l->l_biglocks);
 		sigclearall(p, &contsigmask, &kq);
+
+		if (!mutex_tryenter(proc_lock)) {
+			mutex_exit(p->p_lock);
+			mutex_enter(proc_lock);
+			mutex_enter(p->p_lock);
+		}
 		p->p_waited = 0;
+		p->p_pptr->p_nstopchild++;
+		mutex_exit(proc_lock);
 		membar_producer();
 		p->p_stat = SSTOP;
 		lwp_lock(l);
@@ -959,7 +967,7 @@ proc_reparent(struct proc *child, struct
 	if (child->p_pptr == parent)
 		return;
 
-	if (child->p_stat == SZOMB ||
+	if (child->p_stat == SZOMB || child->p_stat == SDEAD ||
 	    (child->p_stat == SSTOP && !child->p_waited)) {
 		child->p_pptr->p_nstopchild--;
 		parent->p_nstopchild++;
Index: kern_synch.c
===================================================================
RCS file: /cvsroot/src/sys/kern/kern_synch.c,v
retrieving revision 1.308
diff -u -p -r1.308 kern_synch.c
--- kern_synch.c	28 Feb 2014 10:16:51 -0000	1.308
+++ kern_synch.c	10 Oct 2015 08:34:58 -0000
@@ -985,7 +985,11 @@ suspendsched(void)
 			continue;
 		}
 
-		p->p_stat = SSTOP;
+		if (p->p_stat != SSTOP) {
+			if (p->p_stat != SZOMB && p->p_stat != SDEAD)
+				p->p_pptr->p_nstopchild++;
+			p->p_stat = SSTOP;
+		}
 
 		LIST_FOREACH(l, &p->p_lwps, l_sibling) {
 			if (l == curlwp)