Re: Killing a zombie process?

To: current-users%netbsd.org@localhost
Subject: Re: Killing a zombie process?
From: Paul Goyette <paul%vps1.whooppee.com@localhost>
Date: Fri, 2 Oct 2015 15:26:42 +0800 (PHT)

On Fri, 2 Oct 2015, Paul Goyette wrote:

On Fri, 2 Oct 2015, Paul Goyette wrote:

For now, I took a quick look into the zombie's struct proc.

	p_exitsig = 0x14   = SIGCHLD
	p_flag    = 0x0
	p_sflag   = 0x2000 = PS_WEXIT
	p_slflag  = 0x0
	p_lflag   = 0x2    = PL_CONTROLT
	p_stflag  = 0x0
	p_stat    = 0x5    = SZOMB

	p_trace_enabled = 0x0
	p_pid     = 0x5280 = 21120 (the same value shown by ps)

I don't see anything unusual here.
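
The hex values above can be cross-checked with a few lines of C. This is only a small sketch: the TOY_ names and helper functions are hypothetical stand-ins, and their numeric values are copied from the annotations in this message rather than taken from the NetBSD headers.

```c
#include <stdint.h>

/* Hypothetical stand-ins; values copied from the annotations in this
 * message, NOT from the NetBSD headers. */
#define TOY_PS_WEXIT    0x2000u  /* p_sflag annotation */
#define TOY_PL_CONTROLT 0x2u     /* p_lflag annotation */

enum { TOY_SSTOP = 4, TOY_SZOMB = 5 };  /* p_stat values quoted in the thread */

/* Map a p_stat value to the name used in this message. */
static const char *
toy_stat_name(int stat)
{
	switch (stat) {
	case TOY_SSTOP:
		return "SSTOP";
	case TOY_SZOMB:
		return "SZOMB";
	default:
		return "other";
	}
}

/* Return nonzero if every bit in 'bit' is set in 'flags'. */
static int
toy_has_flag(uint32_t flags, uint32_t bit)
{
	return (flags & bit) == bit;
}
```

For example, toy_stat_name(0x5) yields "SZOMB", and toy_has_flag(0x2000, TOY_PS_WEXIT) is nonzero, which matches the annotations in the listing.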

I have attached the hex-dump in case anyone wants to look a little bit closer.

OK, I forced a system crash (using ddb's sync command), and here's what gdb says about the zombie's struct proc (manually inserted line breaks for improved readability, and some flag value annotations):


(gdb) print (struct proc *) 0xfffffe81f578ba70
$1 = (struct proc *) 0xfffffe81f578ba70
(gdb) print *(struct proc *) 0xfffffe81f578ba70
$2 = {
 p_list = {le_next = 0x0, le_prev = 0xffffffff806be700 <zombproc>},
 p_auxlock = {u = {mtxa_owner = 0}},
 p_lock = 0xfffffe81fbb7a840,
 p_stmutex = {u = {mtxa_owner = 2049}},
 p_reflock = {rw_owner = 0},
 p_waitcv = {cv_opaque = {0x0, 0xfffffe81f578baa0, 0xffffffff804d542e}},
 p_lwpcv = {cv_opaque = {0x0, 0xfffffe81f578bab8, 0xffffffff804e7f9a}},
 p_cred = 0xfffffe81ef0106c0,
 p_fd = 0xfffffe810f46f680,
 p_cwdi = 0x0,
 p_stats = 0xfffffe81e00b5700,
 p_limit = 0xfffffe8155fe8de8,
 p_vmspace = 0xffffffff80722de0 <vmspace0>,
 p_sigacts = 0xfffffe803be9b258,
 p_aio = 0x0,
 p_mqueue_cnt = 0,
 p_specdataref = {
   specdataref_container = 0x0,
   specdataref_lock = {u = {mtxa_owner = 18446744073709551600}}},
 p_exitsig = 20,
 p_flag = 0,
 p_sflag = 8192 <PS_WEXIT>,
 p_slflag = 0,
 p_lflag = 2 <PL_CONTROLT>,
 p_stflag = 0,
 p_stat = 5 '\005' <SZOMB>,
 p_trace_enabled = 0 '\000',
 p_pad1 = "\203",
 p_pid = 21120,
 p_pglist = {
   le_next = 0x0,
   le_prev = 0xfffffe81eab655b0},
 p_pptr = 0xfffffe810f45ecd0,
 p_sibling = {
   le_next = 0xfffffe81f7618d20, le_prev = 0xfffffe81fc805108},
 p_children = {lh_first = 0x0},
 p_lwps = {lh_first = 0xfffffe8021ccb560},
 p_raslist = 0x0,
 p_nlwps = 1,
 p_nzlwps = 1,
 p_nrlwps = 0,
 p_nlwpwait = 0,
 p_ndlwps = 0,
 p_nlwpid = 1,
 p_nstopchild = 0,
 p_waited = 0,
 p_zomblwp = 0x0,
 p_vforklwp = 0x0,
 p_sched_info = 0x0,
 p_estcpu = 0,
 p_estcpu_inherited = 36864,
 p_forktime = 17842,
 p_pctcpu = 0,
 p_opptr = 0x0,
 p_timers = 0x0,
 p_rtime = {sec = 0, frac = 0},
 p_uticks = 0,
 p_sticks = 0,
 p_iticks = 0,
 p_traceflag = 0,
 p_tracep = 0x0,
 p_textvp = 0xfffffe81e6023190,
 p_emul = 0xffffffff806b6300 <emul_netbsd>,
 p_emuldata = 0x0,
 p_execsw = 0xffffffff808be0e0,
 p_klist = { slh_first = 0x0},
 p_sigwaiters = {lh_first = 0x0},
 p_sigpend = {
   sp_info = {tqh_first = 0x0, tqh_last = 0xfffffe81f578bc48},
   sp_set = {__bits = {0, 0, 0, 0}}},
 p_lwpctl = 0x0,
 p_ppid = 1,
 p_fpid = 0,
 p_sigctx = {
   ps_signo = 0, ps_code = 0, ps_lwp = 0, ps_sigcode = 0x0,
   ps_sigignore = {__bits = {4294967295, 4294967295, 4294967295, 4294967295}},
   ps_sigcatch = {__bits = {0, 0, 0, 0}}},
 p_nice = 20 '\024',
 p_comm = "sh\000ke", '\000' <repeats 11 times>,
 p_pgrp = 0xfffffe81eab655b0,
 p_psstrp = 140187732541408,
 p_pax = 0,
 p_xstat = 0,
 p_acflag = 1,
 p_md = {md_flags = 0, md_syscall = 0xffffffff8012f010 <syscall>},
 p_stackbase = 140187732541440,
 p_dtrace = 0x7f7ff683b8e6}

As far as I can tell, everything looks normal. Yet the process never gets reaped by init.

The one thing that surprises me here is that the zombie still has a pointer to p_textvp which would point to /bin/sh _within_ the chroot() sandbox (consistent with the p_comm = "sh" entry). I'm guessing that this reference is what's preventing me from unmounting this nullfs mount. (I previously expected the inability to unmount to be the result of a reference from the zombie's cwd.)


Still investigating, but I think I may have found something...

Using the p_pptr value 0xfffffe810f45ecd0 from the zombie's struct proc, I examined the struct proc for init. I followed the code from the find_stopped_child() routine in src/sys/kern/kern_exit.c, and walked through the loop for each of init's children. The first several processes are all in p_stat=4 (SSTOP), yet init's p_nstopchild count is zero!

This seems to cause the loop in find_stopped_child() to exit early (at line 790):


                 if (parent->p_nstopchild == 0 || child->p_pid == pid) {
                         child = NULL;
                         break;

(Here, parent points to init's struct proc, child is the struct proc obtained from walking the p_children list, and pid is the argument passed to the wait4() syscall. init passes the value WAIT_ANY, i.e. -1.)
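
To make the early exit concrete, here is a toy model of the loop. It is not the kernel source. Only the p_nstopchild/pid test is taken from the snippet above; the struct toy_proc type, the zombie check placed ahead of that test, and pid 100 for the stopped child are assumptions made purely for illustration.

```c
#include <stddef.h>

#define TOY_WAIT_ANY (-1)
enum { TOY_SSTOP = 4, TOY_SZOMB = 5 };   /* p_stat values quoted in the thread */

/* Hypothetical, heavily reduced stand-in for struct proc. */
struct toy_proc {
	int pid;
	int stat;
	int nstopchild;                  /* models p_nstopchild */
	struct toy_proc *first_child;    /* models p_children */
	struct toy_proc *next_sibling;   /* models p_sibling */
};

/* Walk the parent's children and return the first zombie, or NULL. */
static struct toy_proc *
toy_find_child(struct toy_proc *parent, int pid)
{
	struct toy_proc *child;

	for (child = parent->first_child; child != NULL;
	    child = child->next_sibling) {
		/* Assumption: a zombie is accepted before the test below. */
		if (child->stat == TOY_SZOMB)
			return child;
		/* The condition quoted from find_stopped_child(). */
		if (parent->nstopchild == 0 || child->pid == pid)
			return NULL;
	}
	return NULL;
}

/* One stopped child (hypothetical pid 100) listed ahead of the zombie.
 * Returns the pid found, or -1 if the search gave up. */
static int
toy_scenario(int nstopchild)
{
	struct toy_proc zombie = { .pid = 21120, .stat = TOY_SZOMB };
	struct toy_proc stopped = { .pid = 100, .stat = TOY_SSTOP,
	    .next_sibling = &zombie };
	struct toy_proc parent = { .pid = 1, .nstopchild = nstopchild,
	    .first_child = &stopped };
	struct toy_proc *found = toy_find_child(&parent, TOY_WAIT_ANY);

	return found != NULL ? found->pid : -1;
}
```

In this model, toy_scenario(0) gives -1 because the walk stops at the first stopped child and never reaches the zombie, while toy_scenario(1) gives 21120. That is the behaviour a zero counter would produce if the real loop is shaped like this.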


Questions:

1. Is it correct for init's p_nstopchild to be zero when it has several
   children whose p_stat is SSTOP?

2. Is the above code, as executed for init, correct?  Should we really be
   leaving the loop when there are more children to examine?





+------------------+--------------------------+-------------------------+
| Paul Goyette     | PGP Key fingerprint:     | E-mail addresses:       |
| (Retired)        | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com    |
| Kernel Developer | 0786 F758 55DE 53BA 7731 | pgoyette at netbsd.org  |
+------------------+--------------------------+-------------------------+
