NetBSD-Bugs archive
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]
Re: kern/50318: Process p_nstopchild counter not updated in suspendsched()
The following reply was made to PR kern/50318; it has been noted by GNATS.
From: Taylor R Campbell <riastradh%NetBSD.org@localhost>
To: gnats-bugs%NetBSD.org@localhost
Cc:
Subject: Re: kern/50318: Process p_nstopchild counter not updated in suspendsched()
Date: Sat, 10 Oct 2015 14:24:53 +0000
The analysis sounds plausible to me, but it also seems as though we
ought to
(a) audit all accounting of p_nstopchild, and
(b) at least identify in comments, if not formally name, every state
transition that affects it.
A cursory examination of the grep of p_nstopchild suggests it counts
- SSTOP children that have not been waited, and
- SDEAD/SZOMB children.
So there are a few state transitions that we must worry about:
1. Child changes p_stat to {SSTOP, SDEAD, SZOMB}.
2. Child changes p_stat from {SSTOP, SDEAD, SZOMB}.
3. Child is waited while stopped.
4. Child is transferred from one parent to another.
The following updates of p_nstopchild currently exist:
kern_exec.c:1289: p->p_pptr->p_nstopchild++;
Transition 1, implements proc.12345.stopexec to stop on exec (*),
subject of PR kern/50289 concerning p_waited.
kern_exit.c:496: old_parent->p_nstopchild++;
Transition 1, when process exits and goes to SDEAD before SZOMB.
XXX Earlier in exit1, there is logic for proc.12345.stopexit to
stop on exit (*) which transitions p to SSTOP, but it doesn't
adjust p_pptr->p_nstopchild.
kern_exit.c:786: parent->p_nstopchil=
d--;
Transition 3, implements the part of wait that picks a child.
kern_exit.c:886: parent->p_nstopchild--;
Transition 2, when zombie process is freed and disassociated from
its parent.
kern_exit.c:964: child->p_pptr->p_nstopchild--;
kern_exit.c:965: parent->p_nstopchild++;
Transition 4, in proc_reparent.
kern_fork.c:564: p1->p_nstopchild++;
Transition 1, implements proc.12345.stopfork to stop on fork (*).
kern_lwp.c:514: p->p_pptr->p_nstopchild--;
Transition 2, restarts stopped lwp. Not clear to me what the
original p_stat is, but it is set to SACTIVE, so none of {SSTOP,
SDEAD, SZOMB}.
kern_sig.c:1373: p->p_pptr->p_nstopchild--;
Transition 2, restarts stopped process on SIGCONT.
kern_sig.c:1474: p->p_pptr->p_nstopchild++;
Transition 1, stops process on SIGSTOP.
kern_sig.c:2192: p->p_pptr->p_nstopchild--;
Transition 2, restarts stopped process using ptrace.
(*) It looks like there is some duplicated logic to stop in various
circumstances for proc.12345.stop{exec,exit,fork}. The differences
are worth scrutinizing further, but are not likely to be relevant to
your unreapable zombie problem.
The following updates of p_stat currently exist:
kern_exec.c:1296: p->p_stat =3D SSTOP;
Transition 1, proc.12345.stopexec (PR kern/50298), does
p->p_pptr->p_nstopchild++.
kern_exec.c:2073: l->l_proc->p_stat =3D SSTOP;
kern_exec.c:2144: l->l_proc->p_stat =3D ostat;
posix_spawn, no obvious update of l->l_proc->p_pptr->p_nstopchild,
which looks wrong to me, but it is restored shortly afterward.
This looks like a sketchy bookkeeping kludge, not actually stopping
a process.
kern_exec.c:2552: p2->p_stat =3D SACTIVE;
posix_spawn, new process (SIDL), can't be transition 2.
kern_exit.c:232: p->p_stat =3D SSTOP;
Transition 1, proc.12345.stopexit, XXX looks like it's missing
p->p_pptr->p_nstopchild++ (PR kern/50308).
kern_exit.c:250: p->p_stat =3D SDYING;
kern_exit.c:492: p->p_stat =3D SDEAD;
kern_exit.c:556: p->p_stat =3D SZOMB;
Transition 1, process exit, does p->p_pptr->p_nstopchild++
alongside the p->p_stat =3D SDEAD transition.
XXX I assume it must not be stopped beforehand, so that it is not
relevant whether p was waited, but I have not proven this
assumption.
XXX It does not appear that p->p_stat =3D SDEAD happens with
p->p_lock held, which looks wrong. What is the lock order for
processes p_lock of different processes? Ancestor first or
descendant first?
kern_exit.c:884: p->p_stat =3D SIDL; /* not even a zom=
bie any more */
Transition 2, proc_free, does p->p_pptr->p_nstopchild--. Since it
is always a transition from SZOMB, no need to check whether p is
waited.
kern_fork.c:562: p2->p_stat =3D SSTOP;
Transition 1, proc.12345.stopfork, does p2->p_pptr->p_nstopchild++.
Since it is always a new process (SIDL), no need to check whether
p2 is waited.
kern_fork.c:570: p2->p_stat =3D SACTIVE;
In fork, new process (SIDL), can't be transition 2.
kern_lwp.c:510: p->p_stat =3D SACTIVE;
Transition 2, restart stopped lwp, does p->p_pptr->p_nstopchild--
if p is not waited.
kern_proc.c:739: p->p_stat =3D SIDL; /* protec=
t against others */
Newly allocated process, can't be transition 2.
kern_sig.c:1370: p->p_stat =3D SACTIVE;
Transition 2, SIGCONT/SIGKILL to continue a stopped process, does
p->p_pptr->p_nstopchild-- if p is not waited.
kern_sig.c:1472: p->p_stat =3D SSTOP;
Transition 1, stops process on SIGSTOP, does
p->p_pptr->p_nstopchild++.
kern_sig.c:2187: p->p_stat =3D SACTIVE;
Transition 2, restarts stopped process using ptrace, does
p->p_pptr->p_nstopchild-- if p is not waited.
kern_synch.c:988: p->p_stat =3D SSTOP;
Transition 1, finally the case you noted here.
From the above analysis, your patch looks correct to me. But I think
we nevertheless ought to write down formal names for all these state
transitions.
Home |
Main Index |
Thread Index |
Old Index