Current-Users archive


Re: Killing a zombie process?



On Sun, 4 Oct 2015, Paul Goyette wrote:

 | 1. Is it correct for init's p_nstopchild to be zero when it has several
 |     children whose p_state is SSTOP?

Depends whether those children have previously been waited for or not.
Stopped children don't go away when they're waited for, so there needs
to be something to prevent wait() returning the same stopped child
over and over again.   That's p_waited, so you need to check that
value for each of the stopped children: if it is 0, then something is
broken. If it is 1 (for all of them) then they're irrelevant, and
matter not at all.
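
For anyone following along, the rule described above can be written down as a small consistency check. This is only a rough Python model, not kernel code: the Proc class and both function names are made up for illustration, and it assumes p_nstopchild is meant to count exactly those children that are stopped and have not yet been reported to wait(). SSTOP is taken as 4, matching the p_stat value in the crash output in this message.

```python
from dataclasses import dataclass

SSTOP = 4  # p_stat value for a stopped process (matches the crash output in this message)


@dataclass
class Proc:
    """Hypothetical stand-in for the few struct proc fields discussed here."""
    p_pid: int
    p_stat: int
    p_waited: int


def expected_nstopchild(children):
    """Count children that are stopped and not yet reported to wait()."""
    return sum(1 for c in children if c.p_stat == SSTOP and c.p_waited == 0)


def is_consistent(p_nstopchild, children):
    """True when the parent's counter matches what its children imply."""
    return p_nstopchild == expected_nstopchild(children)
```

Under this model, a parent whose p_nstopchild is 0 is consistent only if every stopped child already has p_waited set; a single stopped child with p_waited == 0 makes the check fail.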


Here's another instance of the problem. (Note that I'm limping along with crash(8) here since gdb isn't cooperating at the moment.)

crash> show proc 1
init: pid 1 proc fffffe810f46ecd0 vmspace/map fffffe810f483e60 flags 4001
  lwp 1 fffffe810f476a60 pcb fffffe810f464000
    stat 2 flags 8020000 cpu 0 pri 43
crash> x/x 0xfffffe810f46ecd0+0x130
fffffe810f46ee00:       0			p_nstopchild == 0
crash> x/x 0xfffffe810f46ecd0+0x100,2
fffffe810f46edd0:       7b5f5800    fffffe80	p_children listhead

Looking at the first child...

crash> x/x 0xfffffe807b5f5800+0xd0
fffffe807b5f58d0:       4			p_stat == SSTOP
crash>
fffffe807b5f58d4:       6f68			p_pid
crash> show proc 0x6f68
init: pid 28520 proc fffffe807b5f5800 vmspace/map fffffe807e7be480 flags 0
  lwp 1 fffffe811e636300 pcb fffffe81aae19000
    stat 2 flags 8020000 cpu 3 pri 43
crash> x/x 0xfffffe807b5f5800+0x134
fffffe807b5f5934:       0			p_waited == 0
crash> x/x 0xfffffe807b5f5800+0xf0,2
fffffe807b5f58f0:       f46e520     fffffe81	p_sibling.le_next
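
In case the pointer values above look odd: crash prints the two 32-bit halves of each 64-bit pointer with the low word first (this is amd64, so little-endian). A minimal Python sketch of the arithmetic follows; the helper name join_words is made up, and the numbers are copied from the session above.

```python
def join_words(low, high):
    """Combine two 32-bit words (low word first, as printed) into a 64-bit value."""
    return (high << 32) | low


# p_children listhead of pid 1: "7b5f5800 fffffe80"
first_child = join_words(0x7b5f5800, 0xfffffe80)

# p_sibling.le_next of the first child: "f46e520 fffffe81"
next_sibling = join_words(0x0f46e520, 0xfffffe81)

print(hex(first_child))   # 0xfffffe807b5f5800
print(hex(next_sibling))  # 0xfffffe810f46e520
print(0x6f68)             # 28520, the pid that "show proc" reports
```

The first value is the proc address that "show proc 0x6f68" reports for pid 28520, which is how I know the first entry on init's child list is that process.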

So, the first child of init appears to be another instance of init, and its state is SSTOP. It has not been waited for, yet its parent (the "real" init, pid=1) has a zero count for p_nstopchild.


This problem is easily reproduced, but only under heavy load. On an amd64 system (Intel i5-4460 @ 3.20GHz) running 7.99.21, I've been running 'build.sh -j3 release' in parallel with a series of pkgsrc builds using MAKE_JOBS=3; it takes 30 to 60 minutes of this before the zombie appears. (The pkgsrc builds run in a chroot created by pkgsrc/sysutils/mksandbox.)


+------------------+--------------------------+-------------------------+
| Paul Goyette     | PGP Key fingerprint:     | E-mail addresses:       |
| (Retired)        | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com    |
| Kernel Developer | 0786 F758 55DE 53BA 7731 | pgoyette at netbsd.org  |
+------------------+--------------------------+-------------------------+

