process mystery

To: tech-userlevel%netbsd.org@localhost
Subject: process mystery
From: "James K. Lowden" <jklowden%schemamania.org@localhost>
Date: Fri, 25 Mar 2016 21:47:55 -0400

I'm trying to understand how to unwedge my box without simply rebooting
it.  

After I got "cannot vfork" quite a few times, I noticed ttyp9 had dozens
of processes running utmp_update.  

$ ps -t ttyp9  | head -4
  PID TTY   STAT    TIME COMMAND
   16 ttyp9 I    0:00.01 utmp_update jklowden\\000\\000\\000\\000\\...
   21 ttyp9 I    0:00.01 utmp_update jklowden\\000\\000\\000\\000\\...
  416 ttyp9 I    0:00.01 utmp_update jklowden\\000\\000\\000\\000\\...

Each utmp_update had as its ppid another utmp_update on the same tty.
I don't know why that would be, so I just killed the whole lot of
them.  I noticed ttyp8 owned 27548, so I tried to log out of it, and
the xterm didn't close.  This leaves the following state:

$ ps -ax -o ppid,pid,tty,stat,command | sed -Ene '1p; /ttyp[89]|27548/p'
 PPID   PID TTY   STAT COMMAND
    1  5974 ttyp8 I    xless xless.man 
    1 27548 ttyp8 I    xterm -e emacs callbacks.c 
    1  3419 ttyp9 DE   (utmp_update)
27548  5078 ttyp9 DEs  (xterm)
 5078 28264 ttyp9 Z    (utmp_update)

What's the right way to clean this up?  There's no ttyp9 visible to log
out of (meaning, in no xterm does tty(1) print "ttyp9").  What does
state 'D' imply, other than, as the manual says, that the process is in
an uninterruptible state?  SIGTERM has (as expected) no effect on 3419,
5078, or 28264.  

I guess my only option is to send SIGKILL to the processes in DE state,
let init reap zombie 28264, and then send SIGTERM to the ttyp8
processes.  If you found things this way, would you investigate
anything else?  
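
Before signalling anything, it may help to list exactly which processes
are stuck.  A minimal sketch, assuming the same ps column order used
above (ppid,pid,tty,stat,command); the function name stuck_pids is made
up for illustration:

```shell
# Hypothetical helper: read `ps -ax -o ppid,pid,tty,stat,command` output
# on stdin and print "PID STAT" for every process whose state starts
# with D (uninterruptible) or Z (zombie).  The header line is skipped.
stuck_pids() {
    awk 'NR > 1 && $4 ~ /^[DZ]/ { print $2, $4 }'
}

# Usage: review the list first, then decide what to signal.
#   ps -ax -o ppid,pid,tty,stat,command | stuck_pids
```

A zombie cannot be killed by any signal; it goes away only when its
parent (or init, once the parent is gone) waits for it.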

Does the utmp_update cascade suggest anything?  
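
If the cascade shows up again, one way to capture it before killing
anything is to print every process whose parent runs the same command.
This is only a sketch: it assumes ps output with the columns
ppid,pid,comm, and the function name same_cmd_parent is hypothetical.

```shell
# Hypothetical helper: read `ps -ax -o ppid,pid,comm` output on stdin
# and print "PID PPID COMM" for each process whose parent is running
# the same command (e.g. utmp_update spawned by utmp_update).
same_cmd_parent() {
    awk 'NR > 1 { ppid[$2] = $1; comm[$2] = $3; order[++n] = $2 }
         END {
             for (i = 1; i <= n; i++) {
                 p = order[i]
                 # Parent must be in the listing and share the command name.
                 if ((ppid[p] in comm) && comm[ppid[p]] == comm[p])
                     print p, ppid[p], comm[p]
             }
         }'
}

# Usage:
#   ps -ax -o ppid,pid,comm | same_cmd_parent
```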

--jkl

Follow-Ups:
- Re: process mystery
  - From: Robert Elz
