Re: Killing a zombie process?

To: Robert Elz <kre%munnari.OZ.AU@localhost>
Subject: Re: Killing a zombie process?
From: Paul Goyette <paul%vps1.whooppee.com@localhost>
Date: Sun, 4 Oct 2015 20:52:43 +0800 (PHT)

On Sun, 4 Oct 2015, Robert Elz wrote:

   Date:        Sun, 4 Oct 2015 17:25:21 +0800 (PHT)
   From:        Paul Goyette <paul%vps1.whooppee.com@localhost>
   Message-ID:  <Pine.NEB.4.64.1510041715370.15041%vps1.whooppee.com@localhost>

 | I'm pretty much convinced that the p_nstopchild accounting is screwed up
 | somewhere.

I think I agree.

 | I'm planning on adding the following code in "optimization"
 | in kern_exit so I can catch it as soon as it happens.

Sooner, but unfortunately, most probably not soon enough.

It is most likely some locking/race condition, with multiple processes
dying at (approximately) the same time, that is causing some of the
increments to be lost.  Making them all use atomic ops, instead of just ++,
might fix the problem, but at the cost of never discovering where the issue
actually occurs.  There should be locks around all manipulations of
this stuff; possibly one of them is missing or misplaced.
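The lost-increment failure mode can be shown in user space. This is a hypothetical illustration, not NetBSD kernel code: `lost_update_demo()` replays by hand the load/add/store steps two CPUs would perform for a plain `++`, interleaved so that one increment vanishes, and `run_counters()` shows the two fixes mentioned above, a lock around every update or an atomic increment. It uses POSIX threads and C11 `<stdatomic.h>` as stand-ins for the kernel's own mutex(9) and atomic_ops(3) interfaces; all function and variable names are invented.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

#define NTHREADS 8
#define NITERS   100000

/*
 * Replays, by hand, two CPUs each executing "count++" as separate
 * load / add / store steps, interleaved so both load before either
 * stores. The result is 1, not 2: one increment is lost.
 */
long lost_update_demo(void)
{
    long count = 0;
    long cpu0 = count;      /* CPU 0 loads 0 */
    long cpu1 = count;      /* CPU 1 loads 0 */
    count = cpu0 + 1;       /* CPU 0 stores 1 */
    count = cpu1 + 1;       /* CPU 1 stores 1, overwriting CPU 0's update */
    return count;
}

static long locked_count;                 /* only touched with count_lock held */
static pthread_mutex_t count_lock = PTHREAD_MUTEX_INITIALIZER;
static atomic_long atomic_count;          /* updated with atomic_fetch_add */

static void *bump(void *arg)
{
    (void)arg;
    for (int i = 0; i < NITERS; i++) {
        pthread_mutex_lock(&count_lock);    /* fix 1: lock every update */
        locked_count++;
        pthread_mutex_unlock(&count_lock);
        atomic_fetch_add(&atomic_count, 1); /* fix 2: indivisible increment */
    }
    return NULL;
}

/*
 * Runs up to NTHREADS threads concurrently, stores both totals and
 * returns the number of threads actually started.
 */
int run_counters(long *locked, long *atomic_total)
{
    pthread_t tid[NTHREADS];
    int started = 0;

    locked_count = 0;
    atomic_store(&atomic_count, 0);
    for (int i = 0; i < NTHREADS; i++)
        if (pthread_create(&tid[started], NULL, bump, NULL) == 0)
            started++;
    for (int i = 0; i < started; i++)
        pthread_join(tid[i], NULL);
    *locked = locked_count;
    *atomic_total = atomic_load(&atomic_count);
    return started;
}
```

Both fixes produce the expected total of started * NITERS. As noted above, switching to atomics alone would hide the symptom without revealing which update path is missing its lock.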

Yeah, I think that there's a basic accounting problem somewhere, and with an extreme load it is more likely for the SSTOPed process to get inserted in the p_children/p_sibling list before the SZOMB process can get reaped. Once the SSTOPed process gets to the front of the line (with the parent's p_nstopchild count at zero), the SZOMB process won't ever get processed. My patch will simply validate this theory.
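The theory can be modelled with a toy scan. Everything here is hypothetical: `struct child`, the `P_*` states and `toy_wait_scan()` are invented for illustration, and the early exit is an assumption about how a counter-based shortcut could behave, not a copy of NetBSD's find_stopped_child(). With an already-reported stopped child ahead of a zombie, a correct count of 1 lets the scan reach the zombie, while a count that lost its increment (0) makes it give up first.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified child states (not the kernel's definitions). */
enum pstate { P_ACTIVE, P_STOPPED, P_ZOMBIE };

struct child {
    int pid;
    enum pstate state;
    int reported;   /* nonzero once a stop has been reported via wait() */
};

/*
 * Toy wait scan over the children in list order. Returns the index of
 * the first child wait() could report, or -1 if none is found.
 * ASSUMPTION: the shortcut trusts nstopchild (count of reportable
 * children) and stops scanning once it is 0 and the current child is
 * not reportable. This models the theory, not NetBSD's actual code.
 */
int toy_wait_scan(const struct child *kids, size_t n, int nstopchild)
{
    for (size_t i = 0; i < n; i++) {
        const struct child *c = &kids[i];

        if (c->state == P_ZOMBIE)
            return (int)i;
        if (c->state == P_STOPPED && !c->reported)
            return (int)i;
        if (nstopchild == 0)
            return -1;          /* early exit: never looks further */
    }
    return -1;
}
```

With `{stopped (reported), zombie}` in that order, `toy_wait_scan(kids, 2, 1)` returns 1 but `toy_wait_scan(kids, 2, 0)` returns -1. If the zombie were first in the list it would be found either way, which is why list order matters in this theory.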

(BTW, the patch is actually wrong, as it would also panic in the case where the wait was for a specific pid. I've modified it in my new kernel, not yet tested.)
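A consistency check in the spirit of the patch, restricted as described, might look like the following. This is a hypothetical sketch, not the actual patch: `nstopchild_suspect()` and the toy `struct child` are invented. It only reports a mismatch when waiting for any child (pid -1), because a wait for a specific pid, or for a process group (pid 0 or less than -1), legitimately skips zombies that do not match.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified child states (not the kernel's definitions). */
enum pstate { P_ACTIVE, P_STOPPED, P_ZOMBIE };

struct child {
    int pid;
    enum pstate state;
    int reported;   /* nonzero once a stop has been reported via wait() */
};

/*
 * Hypothetical sanity check: returns 1 if the parent's counter is lower
 * than the number of children wait() could report, 0 otherwise.
 * Only wait_pid == -1 (any child) is checked; for a specific pid or a
 * process group, non-matching children are skipped on purpose, so
 * checking there would give false alarms (the case the message says
 * the first patch mishandled).
 */
int nstopchild_suspect(const struct child *kids, size_t n,
                       int nstopchild, int wait_pid)
{
    int reportable = 0;

    if (wait_pid != -1)
        return 0;
    for (size_t i = 0; i < n; i++) {
        if (kids[i].state == P_ZOMBIE ||
            (kids[i].state == P_STOPPED && !kids[i].reported))
            reportable++;
    }
    return nstopchild < reportable;
}
```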

It is unlikely to be in the wait processing (at least not this one) as
there's just one process doing the waiting, there would be no contention
for the accesses here (it could be a combination of the two though,
wait() happening at the same instant a process is dying).
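From user space, the behaviour the wait code is supposed to provide can be checked with plain POSIX calls: a parent with one stopped child and one exited (zombie) child should have both reported by waitpid(-1, ..., WUNTRACED), and the stopped child can still be killed and reaped. This is an illustrative sketch only; `reap_stopped_and_zombie()` is a hypothetical helper, and it exercises the wait paths discussed here without reproducing the race.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Forks one child that stops itself and one that exits at once, then
 * collects both with waitpid(-1, ..., WUNTRACED). Counts are returned
 * through the out-parameters. Returns 0 on success, -1 on error.
 */
int reap_stopped_and_zombie(int *nstopped, int *nexited)
{
    pid_t stopper, quitter;
    int status;

    *nstopped = *nexited = 0;

    if ((stopper = fork()) == -1)
        return -1;
    if (stopper == 0) {
        raise(SIGSTOP);         /* stays stopped until signalled */
        _exit(0);
    }

    if ((quitter = fork()) == -1) {
        kill(stopper, SIGKILL);
        waitpid(stopper, &status, 0);
        return -1;
    }
    if (quitter == 0)
        _exit(7);               /* a zombie until the parent reaps it */

    /* Two state changes are pending; their order is not fixed. */
    for (int i = 0; i < 2; i++) {
        if (waitpid(-1, &status, WUNTRACED) == -1)
            return -1;
        if (WIFSTOPPED(status))
            (*nstopped)++;
        else if (WIFEXITED(status))
            (*nexited)++;
    }

    /* SIGKILL terminates a stopped process; then reap it. */
    kill(stopper, SIGKILL);
    if (waitpid(stopper, &status, 0) != stopper || !WIFSIGNALED(status))
        return -1;
    return 0;
}
```

On a correct kernel this reports one stopped and one exited child; a lost p_nstopchild increment of the kind suspected here would show up as a wait that never returns the zombie.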


See above.

I'm also puzzled by your observations of forked init processes having
exited.  After rc is finished, init generally only forks when one of the
console/terminal sessions ends and a new getty needs to be started.
On most modern systems, that's a very rare event, though if you use
console switching (ctl-alt-Fn or whatever it is), and log in and out
of those (virtual) terminals, it would happen.  Is there anything like
that in your environment?

I do occasionally switch to another wsdisplay screen (away from the X one), but not frequently. I definitely do a switch before I use Ctrl/Alt/Esc to get into ddb.

I'm wondering whether some (most? all?) of the SSTOPed processes I see are a result of entering ddb and/or triggering the reboot. Doesn't ddb need to stop whatever is running on "the other CPU cores"?

+------------------+--------------------------+-------------------------+
| Paul Goyette     | PGP Key fingerprint:     | E-mail addresses:       |
| (Retired)        | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com    |
| Kernel Developer | 0786 F758 55DE 53BA 7731 | pgoyette at netbsd.org  |
+------------------+--------------------------+-------------------------+
