tech-kern archive


Re: Anomalies while handling p_nstopchild count



    Date:        Sat, 10 Oct 2015 14:35:16 +0000
    From:        Taylor R Campbell <campbell+netbsd-tech-kern%mumble.net@localhost>
    Message-ID:  <20151010143405.D0991609D4%jupiter.mumble.net@localhost>

  | Based on the analysis I just sent to one of PR 50318 (not noticing
  | until I was done that it applied to all four of them),

Your analysis largely duplicates what Paul and I worked out while
hunting for his bug.

There is however one other case that warrants some examination, as
weird things kind of happen (though we believe/hope without problems).

This is in spawn_return() [kern_exec.c].

There the status of the process is set to SSTOP without incrementing
the parent's p_nstopchild, violating the definition of that field.
But then it is (fairly soon after) set back again, if no error occurred.

If there was an error, spawn_return() calls exit1(), which (eventually)
sets the process state to SDEAD and increments the parent's p_nstopchild
(regardless of the state of the process when exit1() was called).
(Before SDEAD the state is set to SDYING, where p_nstopchild should not
count it.)

Hence if p_nstopchild had been incremented in spawn_return() and not
decremented again (as the state is left at SSTOP in the error case),
then exit1() would cause the process to be counted twice.
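
To make the double counting concrete, here is a toy model of the accounting
described above.  It is only a sketch, not the NetBSD code: the toy_* names,
the reduced struct, and the count_stop switch are all made up for
illustration, and the only behaviour assumed is what is stated in this
message (exit1() increments the parent's count unconditionally, and the
error path leaves the state at SSTOP).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: hypothetical names, not the kernel's definitions. */
enum toy_stat { TOY_SACTIVE, TOY_SSTOP, TOY_SDYING, TOY_SDEAD };

struct toy_proc {
	enum toy_stat    p_stat;
	int              p_nstopchild;	/* children the parent could wait() for */
	struct toy_proc *p_pptr;	/* parent process */
};

/* Exit path as described above: counts the child regardless of prior state. */
static void
toy_exit1(struct toy_proc *p)
{
	assert(p->p_pptr != NULL);
	p->p_stat = TOY_SDYING;		/* not counted in p_nstopchild */
	p->p_stat = TOY_SDEAD;		/* counted from here on */
	p->p_pptr->p_nstopchild++;
}

/*
 * spawn_return() as described above.  count_stop selects the hypothetical
 * variant that does account for the temporary SSTOP.
 */
static void
toy_spawn_return(struct toy_proc *p, bool count_stop, bool error)
{
	p->p_stat = TOY_SSTOP;
	if (count_stop)
		p->p_pptr->p_nstopchild++;

	if (error) {
		/* State is left at SSTOP, nothing is decremented. */
		toy_exit1(p);
		return;
	}

	if (count_stop)
		p->p_pptr->p_nstopchild--;
	p->p_stat = TOY_SACTIVE;
}

/* Run one scenario and return the parent's resulting p_nstopchild. */
static int
toy_run(bool count_stop, bool error)
{
	struct toy_proc parent = { .p_stat = TOY_SACTIVE };
	struct toy_proc child = { .p_stat = TOY_SACTIVE, .p_pptr = &parent };

	toy_spawn_return(&child, count_stop, error);
	return parent.p_nstopchild;
}
```

In this model the current behaviour (count_stop false) leaves the parent
with one waitable child after an error, while the "incremented and never
decremented" variant leaves it with two for the same single child.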

At the moment we don't understand what prevents the parent from performing
a wait() while the child is in the SSTOP state inside spawn_return(), but we
are assuming there must be something.  Actually having the parent notified
of a stopped child at that point would be kind of absurd (I suspect the child
would, all by itself, run again before the parent could do much with it).
So I (at least) believe there must be something that handles this.

In that case, failing to increment p_nstopchild for this SSTOP seems like
a safe enough thing to do.

Also, personally, while the analysis is undoubtedly technically correct, I'm
not sure the bug from PR 50318 (the case occurring during system shutdown) is
worth fixing.  That case is setting *every* process to SSTOP.   In that state,
nothing is ever running again, nothing is ever waiting for children again,
and the status of p_nstopchild (in any process) is really irrelevant.
The system is shutting down, and very shortly, there will be no processes
at all.   [Nb: in this code, even zombies are changed to SSTOP state,
which is a transition nothing else would ever expect to happen - the process
has already released all its resources - that is probably a worse technical
bug than failing to increment p_nstopchild.]

kre


