NetBSD-Bugs archive


Re: kern/50300: Issue with updating p_nstopchild count



The following reply was made to PR kern/50300; it has been noted by GNATS.

From: Robert Elz <kre%munnari.OZ.AU@localhost>
To: gnats-bugs%NetBSD.org@localhost
Cc: 
Subject: Re: kern/50300: Issue with updating p_nstopchild count
Date: Tue, 06 Oct 2015 06:02:32 +0700

     Date:        Mon,  5 Oct 2015 02:25:00 +0000 (UTC)
     From:        paul%whooppee.com@localhost
     Message-ID:  <20151005022500.9D035A5B2E%mollari.NetBSD.org@localhost>
 
 A better fix might be ...
 
   | -	if (child->p_stat == SZOMB ||
   | +	if (P_ZOMBIE(child) ||
   |  	    (child->p_stat == SSTOP && !child->p_waited)) {
   |  		child->p_pptr->p_nstopchild--;
   |  		parent->p_nstopchild++;
 
 The case Paul found, where an SDEAD process is reparented, is most likely
 not the cause of the problem he's seeing.
 
 But a process in SDYING state that gets reparented might very easily
 be the problem; that's a much more likely scenario for his workload.
 
 Such a process won't reparent itself (unlike the SDEAD case), but if its
 parent dies, it will be reparented to init.
 
 Because the code above ignores SDYING processes, just as it did SDEAD ones,
 the effect will be that init will gain dying (and eventually, zombie)
 children without having its p_nstopchild count incremented.  Then, after it
 has waited a few times, p_nstopchild will drop to 0, and any future
 wait() calls will fail (that is, just hang, overlooking zombie children
 that should have been found).
 
 Given a sufficiently busy workload, like the one Paul uses to trigger the
 problem, it is entirely possible that processes will sometimes exit while
 having children that are temporarily stalled in SDYING state.
 
 kre
 

