NetBSD-Bugs archive
Re: kern/50300: Issue with updating p_nstopchild count
The following reply was made to PR kern/50300; it has been noted by GNATS.
From: Robert Elz <kre%munnari.OZ.AU@localhost>
To: gnats-bugs%NetBSD.org@localhost
Cc:
Subject: Re: kern/50300: Issue with updating p_nstopchild count
Date: Tue, 06 Oct 2015 06:02:32 +0700
Date: Mon, 5 Oct 2015 02:25:00 +0000 (UTC)
From: paul%whooppee.com@localhost
Message-ID: <20151005022500.9D035A5B2E%mollari.NetBSD.org@localhost>
A better fix might be ...
| - if (child->p_stat == SZOMB ||
| + if (P_ZOMBIE(child) ||
| (child->p_stat == SSTOP && !child->p_waited)) {
| child->p_pptr->p_nstopchild--;
| parent->p_nstopchild++;
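To make the difference between the two tests concrete, here is a small userland model of the condition in the quoted diff. It is a sketch, not kernel code: the enum values are illustrative rather than copied from <sys/proc.h>, and model_p_zombie() assumes, as the argument below relies on, that P_ZOMBIE() is true for SZOMB, SDYING and SDEAD processes. The names counted_old() and counted_new() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for p_stat values: names follow NetBSD,
 * numeric values are illustrative only. */
enum pstat { SIDL, SACTIVE, SDYING, SSTOP, SZOMB, SDEAD };

/* Assumed behaviour of P_ZOMBIE(): true once a process has started
 * exiting, whether it is SDYING, SDEAD or already SZOMB. */
bool model_p_zombie(enum pstat st)
{
	assert(st <= SDEAD);	/* reject values outside the model */
	return st == SZOMB || st == SDYING || st == SDEAD;
}

/* Original test: only SZOMB and not-yet-waited SSTOP children have
 * their p_nstopchild contribution moved to the new parent. */
bool counted_old(enum pstat st, bool waited)
{
	return st == SZOMB || (st == SSTOP && !waited);
}

/* Proposed test: every exiting child (SDYING, SDEAD, SZOMB) is moved too. */
bool counted_new(enum pstat st, bool waited)
{
	return model_p_zombie(st) || (st == SSTOP && !waited);
}
```

The two predicates agree on SZOMB and SSTOP children and differ only for SDYING and SDEAD ones, which is exactly the set of children whose count would otherwise be left behind with the old parent.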
The case Paul found, where an SDEAD process is reparented, is most likely
not the cause of the problem he's seeing.
But a process in SDYING state, which gets reparented, might very easily
be the problem - that's a much more likely scenario for his workload.
Such a process won't reparent itself (unlike the SDEAD case), but if its
parent dies, it will be reparented to init.
Because the code above ignores SDYING processes, just as it did SDEAD ones,
the effect will be that init will gain dying (and eventually, zombie)
children without having its p_nstopchild count incremented. Then, after it
has waited a few times, p_nstopchild will drop to 0, and any future
wait() calls will fail (that is, they will just hang, ignoring other
children that should be found).
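The drift described above can be reproduced with a toy model. Everything in it is hypothetical (struct mproc, reparent(), reap(), init_count_after_one_reap()), and it assumes two things the real kernel paths may not match exactly: that the dying child was already counted in its original parent's p_nstopchild, and that its later SDYING-to-SZOMB transition does not touch the count again.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative state names; values are not taken from <sys/proc.h>. */
enum pstat { SIDL, SACTIVE, SDYING, SSTOP, SZOMB, SDEAD };

/* Hypothetical, stripped-down process with only the fields used here. */
struct mproc {
	enum pstat stat;
	bool waited;
	int nstopchild;		/* children a wait() could collect */
	struct mproc *pptr;
};

/* fixed == false: original test; fixed == true: P_ZOMBIE()-style test. */
static bool counted(const struct mproc *c, bool fixed)
{
	bool exiting = fixed
	    ? (c->stat == SZOMB || c->stat == SDYING || c->stat == SDEAD)
	    : c->stat == SZOMB;
	return exiting || (c->stat == SSTOP && !c->waited);
}

/* Move a child to a new parent, transferring its contribution to
 * p_nstopchild only when the chosen predicate says it is counted. */
static void reparent(struct mproc *c, struct mproc *np, bool fixed)
{
	if (counted(c, fixed)) {
		c->pptr->nstopchild--;
		np->nstopchild++;
	}
	c->pptr = np;
}

/* The parent collects a zombie child and drops its count. */
static void reap(struct mproc *c)
{
	assert(c->stat == SZOMB);
	c->pptr->nstopchild--;
	c->pptr = NULL;
}

/* A parent with one zombie and one SDYING child exits; both children
 * go to init, the dying one then becomes a zombie, and init reaps the
 * first zombie.  Returns init's count, while the second zombie remains. */
int init_count_after_one_reap(bool fixed)
{
	struct mproc init = { .stat = SACTIVE };
	struct mproc parent = { .stat = SDYING, .nstopchild = 2, .pptr = &init };
	struct mproc z = { .stat = SZOMB, .pptr = &parent };
	struct mproc d = { .stat = SDYING, .pptr = &parent };

	reparent(&z, &init, fixed);
	reparent(&d, &init, fixed);
	d.stat = SZOMB;		/* assumed: no count change here */
	reap(&z);
	return init.nstopchild;
}
```

With the original test, init_count_after_one_reap(false) leaves init's count at 0 even though the second child is still an uncollected zombie, which is the state in which wait() stops finding it; with the P_ZOMBIE() test, init_count_after_one_reap(true) returns 1.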
Given a sufficiently busy workload, like the one Paul uses to trigger the
problem, it is entirely possible that processes will sometimes exit while
having children that are temporarily stalled in SDYING state.
kre