NetBSD-Bugs archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]

Re: kern/50318: Process p_nstopchild counter not updated in suspendsched()

The following reply was made to PR kern/50318; it has been noted by GNATS.

From: Taylor R Campbell <>
Subject: Re: kern/50318: Process p_nstopchild counter not updated in suspendsched()
Date: Sat, 10 Oct 2015 14:24:53 +0000

 The analysis sounds plausible to me, but it also seems as though we
 ought to
 (a) audit all accounting of p_nstopchild, and
 (b) at least identify in comments, if not formally name, every state
 transition that affects it.
 A cursory examination of the grep of p_nstopchild suggests it counts
 - SSTOP children that have not been waited, and
 - SDEAD/SZOMB children.
 So there are a few state transitions that we must worry about:
 1. Child changes p_stat to {SSTOP, SDEAD, SZOMB}.
 2. Child changes p_stat from {SSTOP, SDEAD, SZOMB}.
 3. Child is waited while stopped.
 4. Child is transferred from one parent to another.
 The following updates of p_nstopchild currently exist:
 kern_exec.c:1289:               p->p_pptr->p_nstopchild++;
    Transition 1, implements proc.12345.stopexec to stop on exec (*),
    subject of PR kern/50289 concerning p_waited.
 kern_exit.c:496:        old_parent->p_nstopchild++;
    Transition 1, when process exits and goes to SDEAD before SZOMB.
    XXX Earlier in exit1, there is logic for proc.12345.stopexit to
    stop on exit (*) which transitions p to SSTOP, but it doesn't
    adjust p_pptr->p_nstopchild.
 kern_exit.c:786:                                        parent->p_nstopchil=
    Transition 3, implements the part of wait that picks a child.
 kern_exit.c:886:        parent->p_nstopchild--;
    Transition 2, when zombie process is freed and disassociated from
    its parent.
 kern_exit.c:964:                child->p_pptr->p_nstopchild--;
 kern_exit.c:965:                parent->p_nstopchild++;
    Transition 4, in proc_reparent.
 kern_fork.c:564:                p1->p_nstopchild++;
    Transition 1, implements proc.12345.stopfork to stop on fork (*).
 kern_lwp.c:514:         p->p_pptr->p_nstopchild--;
    Transition 2, restarts stopped lwp.  Not clear to me what the
    original p_stat is, but it is set to SACTIVE, so none of {SSTOP,
 kern_sig.c:1373:                                p->p_pptr->p_nstopchild--;
    Transition 2, restarts stopped process on SIGCONT.
 kern_sig.c:1474:        p->p_pptr->p_nstopchild++;
    Transition 1, stops process on SIGSTOP.
 kern_sig.c:2192:                p->p_pptr->p_nstopchild--;
    Transition 2, restarts stopped process using ptrace.
 (*) It looks like there is some duplicated logic to stop in various
 circumstances for proc.12345.stop{exec,exit,fork}.  The differences
 are worth scrutinizing further, but are not likely to be relevant to
 your unreapable zombie problem.
 The following updates of p_stat currently exist:
 kern_exec.c:1296:               p->p_stat =3D SSTOP;
    Transition 1, proc.12345.stopexec (PR kern/50298), does
 kern_exec.c:2073:               l->l_proc->p_stat =3D SSTOP;
 kern_exec.c:2144:               l->l_proc->p_stat =3D ostat;
    posix_spawn, no obvious update of l->l_proc->p_pptr->p_nstopchild,
    which looks wrong to me, but it is restored shortly afterward.
    This looks like a sketchy bookkeeping kludge, not actually stopping
    a process.
 kern_exec.c:2552:       p2->p_stat =3D SACTIVE;
    posix_spawn, new process (SIDL), can't be transition 2.
 kern_exit.c:232:                p->p_stat =3D SSTOP;
    Transition 1, proc.12345.stopexit, XXX looks like it's missing
    p->p_pptr->p_nstopchild++ (PR kern/50308).
 kern_exit.c:250:        p->p_stat =3D SDYING;
 kern_exit.c:492:        p->p_stat =3D SDEAD;
 kern_exit.c:556:        p->p_stat =3D SZOMB;
    Transition 1, process exit, does p->p_pptr->p_nstopchild++
    alongside the p->p_stat =3D SDEAD transition.
    XXX I assume it must not be stopped beforehand, so that it is not
    relevant whether p was waited, but I have not proven this
    XXX It does not appear that p->p_stat =3D SDEAD happens with
    p->p_lock held, which looks wrong.  What is the lock order for
    processes p_lock of different processes?  Ancestor first or
    descendant first?
 kern_exit.c:884:        p->p_stat =3D SIDL;               /* not even a zom=
 bie any more */
    Transition 2, proc_free, does p->p_pptr->p_nstopchild--.  Since it
    is always a transition from SZOMB, no need to check whether p is
 kern_fork.c:562:                p2->p_stat =3D SSTOP;
    Transition 1, proc.12345.stopfork, does p2->p_pptr->p_nstopchild++.
    Since it is always a new process (SIDL), no need to check whether
    p2 is waited.
 kern_fork.c:570:                p2->p_stat =3D SACTIVE;
    In fork, new process (SIDL), can't be transition 2.
 kern_lwp.c:510: p->p_stat =3D SACTIVE;
    Transition 2, restart stopped lwp, does p->p_pptr->p_nstopchild--
    if p is not waited.
 kern_proc.c:739:        p->p_stat =3D SIDL;                       /* protec=
 t against others */
    Newly allocated process, can't be transition 2.
 kern_sig.c:1370:                        p->p_stat =3D SACTIVE;
    Transition 2, SIGCONT/SIGKILL to continue a stopped process, does
    p->p_pptr->p_nstopchild-- if p is not waited.
 kern_sig.c:1472:        p->p_stat =3D SSTOP;
    Transition 1, stops process on SIGSTOP, does
 kern_sig.c:2187:        p->p_stat =3D SACTIVE;
    Transition 2, restarts stopped process using ptrace, does
    p->p_pptr->p_nstopchild-- if p is not waited.
 kern_synch.c:988:               p->p_stat =3D SSTOP;
    Transition 1, finally the case you noted here.
 From the above analysis, your patch looks correct to me.  But I think
 we nevertheless ought to write down formal names for all these state

Home | Main Index | Thread Index | Old Index