Current-Users archive


Re: Killing a zombie process?



    Date:        Sun, 4 Oct 2015 17:25:21 +0800 (PHT)
    From:        Paul Goyette <paul%vps1.whooppee.com@localhost>
    Message-ID:  <Pine.NEB.4.64.1510041715370.15041%vps1.whooppee.com@localhost>

  | I'm pretty much convinced that the p_nstopchild accounting is screwed up 
  | somewhere.

I think I agree.

  | I'm planning on adding the following code in "optimization" 
  | in kern_exit so I can catch it as soon as it happens.

Sooner, but unfortunately, most probably not soon enough.

It is most likely a locking problem or race condition, with multiple
processes dying at (approximately) the same time, that is causing some
of the increments to be lost.  Making them all use atomic ops, instead
of just ++, might fix the problem, at the cost of never discovering where
the issue actually occurs.  There should be locks around all manipulations
of this stuff; possibly one of them is missing or misplaced.
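
To illustrate how an unlocked ++ loses increments while an atomic op does
not, here is a minimal user-space sketch.  It is not NetBSD kernel code
and does not touch p_nstopchild; the names split_counter, atomic_counter,
worker and run_counters are hypothetical, invented for this example.  The
unlocked ++ is modelled as a separate load and store, so that another
thread can slip in between the two steps.

```c
/* User-space illustration only; all names here are hypothetical. */
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define MAX_THREADS 64

/* Updated as load, add, store: models a ++ done without holding a lock. */
static atomic_long split_counter;
/* Updated with a single atomic read-modify-write. */
static atomic_long atomic_counter;

struct worker_arg {
    long iters;
};

static void *worker(void *p)
{
    const struct worker_arg *arg = p;

    for (long i = 0; i < arg->iters; i++) {
        /* Another thread may store between this load and the store
         * below; its increment is then overwritten and lost. */
        long seen = atomic_load(&split_counter);
        atomic_store(&split_counter, seen + 1);

        /* Indivisible increment: no update can be lost here. */
        atomic_fetch_add(&atomic_counter, 1);
    }
    return NULL;
}

/* Run nthreads workers doing iters increments each on both counters.
 * Returns 0 on success, -1 on a bad argument or thread creation failure. */
int run_counters(int nthreads, long iters, long *split_out, long *atomic_out)
{
    pthread_t tids[MAX_THREADS];
    struct worker_arg arg = { iters };
    int started;
    int rc = 0;

    if (nthreads < 1 || nthreads > MAX_THREADS)
        return -1;

    atomic_store(&split_counter, 0);
    atomic_store(&atomic_counter, 0);

    for (started = 0; started < nthreads; started++) {
        if (pthread_create(&tids[started], NULL, worker, &arg) != 0) {
            rc = -1;
            break;
        }
    }
    /* Join only the threads that were actually created. */
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);

    *split_out = atomic_load(&split_counter);
    *atomic_out = atomic_load(&atomic_counter);
    return rc;
}
```

Calling run_counters(4, 100000, &a, &b) always leaves b at 400000, while
a can never exceed that and on a multiprocessor will typically fall short
by a varying amount.  As noted above, switching to the atomic form hides
this kind of symptom without showing which lock is missing.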

It is unlikely to be in the wait processing (at least not this one): as
there's just one process doing the waiting, there would be no contention
for the accesses here.  It could be a combination of the two though,
with wait() happening at the same instant a process is dying.

I'm also puzzled by your observations of forked init processes having
exited - after rc is finished, init generally only forks when one of the
console/terminal sessions ends, and a new getty needs to be started.
On most modern systems, that's a very rare event - though if you use
console switching (ctl-alt-Fn or whatever it is), and log in and out
of those (virtual) terminals, it would happen.  Is there anything like
that in your environment?

kre


