tech-userlevel archive


process mystery



I'm trying to understand how to unwedge my box without simply rebooting
it.  

After I got "cannot vfork" quite a few times, I noticed ttyp9 had dozens
of processes running utmp_update.  

$ ps -t ttyp9  | head -4
  PID TTY   STAT    TIME COMMAND
   16 ttyp9 I    0:00.01 utmp_update jklowden\\000\\000\\000\\000\\...
   21 ttyp9 I    0:00.01 utmp_update jklowden\\000\\000\\000\\000\\...
  416 ttyp9 I    0:00.01 utmp_update jklowden\\000\\000\\000\\000\\...

Each utmp_update had as its ppid another utmp_update on the same tty.
I don't know why that would be, so I just killed the whole lot of
them.  I noticed that pid 27548 belonged to ttyp8, so I tried to log
out of that terminal, but the xterm didn't close.  This leaves the
following state:

$ ps -ax -o ppid,pid,tty,stat,command | sed -Ene '1p; /ttyp[89]|27548/p'
 PPID   PID TTY   STAT COMMAND
    1  5974 ttyp8 I    xless xless.man 
    1 27548 ttyp8 I    xterm -e emacs callbacks.c 
    1  3419 ttyp9 DE   (utmp_update)
27548  5078 ttyp9 DEs  (xterm)
 5078 28264 ttyp9 Z    (utmp_update)

What's the right way to clean this up?  There's no ttyp9 visible to log
out of (meaning that tty(1) does not print "ttyp9" in any xterm).  What
does state 'D' imply, other than that the process is, as the manual
says, in an uninterruptible state?  SIGTERM has (as expected) no effect
on 3419, 5078, or 28264.
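
One way to look further at the 'D' processes is to list only those rows
and include the wait channel, which names what each process is sleeping
on.  The following is a minimal sketch, not a definitive recipe: the
helper name dstate is made up for illustration, and it assumes this
ps(1) accepts the wchan keyword and that STAT is the second column of
the output fed to it.

```shell
# dstate: hypothetical helper, not part of any system.
# Reads ps output on stdin, laid out as "PID STAT WCHAN COMMAND...".
# Keeps the header line plus every row whose STAT field starts with D.
dstate() {
    awk 'NR == 1 || $2 ~ /^D/'
}

# Possible use (assumes the wchan keyword is supported by this ps):
#   ps -ax -o pid,stat,wchan,command | dstate
```

The column order matters only because the filter tests field 2; with a
different -o list, the field number would need to change to match.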

I guess my only option is to send SIGKILL to the processes in DE state,
let init reap zombie 28264, and then send SIGTERM to the ttyp8
processes.  If you found things this way, would you investigate
anything else?  

Does the utmp_update cascade suggest anything?  
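
To see how deep such a cascade goes, the parent links can be followed
from any one pid upward.  This is a rough sketch under stated
assumptions: the function name chain is hypothetical, and it expects
input in the "ppid pid command" column order, as produced by the -o
keywords already used above.

```shell
# chain PID: hypothetical helper, not part of any system.
# Reads "PPID PID COMMAND..." rows on stdin (a header line is ignored)
# and prints the ancestry of PID, child first.  It stops when a parent
# is not in the table, and the seen[] guard prevents endless loops.
chain() {
    awk -v pid="$1" '
        $1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+$/ {
            parent[$2] = $1      # pid -> ppid
            name[$2] = $3        # pid -> first word of the command
        }
        END {
            while ((pid in parent) && !(pid in seen)) {
                seen[pid] = 1
                print pid, name[pid]
                pid = parent[pid]
            }
        }'
}

# Possible use:
#   ps -ax -o ppid,pid,command | chain 28264
```

Counting the lines of output (for example with wc -l) gives the length
of the chain from the chosen pid up to the last ancestor listed.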

--jkl

