Subject: Re: killing the unkillable
To: Lord Isildur <mrfusion@uranium.vaxpower.org>
From: Andrew Brown <atatat@atatdot.net>
List: tech-kern
Date: 11/08/2002 12:21:11
>A common reason for unkillable processes is that the processes are 
>waiting on a syscall stuck in something that to userland must appear 
>atomic. consequently signal delivery is suspended under some 
>circumstances, and a process is essentially locked waiting on something 
>which likely will never happen. Normally this is harmless: the process will
>likely never do anything again. It might be holding locks or some state which
>other things are waiting for in turn, but given that the unkillable is 
>effectively stopped forever, you can usually safely just remove anything
>that that process is holding or controlling. In some cases it is easier to
>reboot, but i can't think of any cases offhand where it would really be
>so bad to just leave the unkillable one running.

more precisely, a common reason for processes to be unkillable is that
they are stuck in disk wait (STAT "D" in ps).

>This is not the same as a zombie, those are processes which are in some 
>stages of exiting and have lost their parents along the way. Those are 
>eventually picked up by init and allowed to finish exiting. 

if their actual parent process simply exits, yes.  if it doesn't, they
will linger until the parent collects them with wait(2), or forever if
it never does.

>If you _Really_ want to wake these processes up, you can find the 
>specific address they're sleeping on, cross your fingers, and write a 
>small program to wake them up. No guarantee that they will actually wake up
>since they have to check if the wakeup is for them, but it might jolt
>them loose again. Finding the channel they're waiting on, though, might take
>some work, likely with a debugger on /dev/kmem. 

not so.

% ps -T -opid,wchan,nwchan,stat,command
  PID WCHAN    WCHAN STAT COMMAND
  310 pause cf798084 Ss   -csh (tcsh)
  528 -            0 R+   ps -T -opid 
29968 -            0 T    ssh -A cafebabe 

the numeric wait channel (the nwchan column) is what you want.  waking
up the sleeping process, though, *will* require the use of a debugger,
or an lkm.  you can't simply call wakeup() from userland.

-- 
|-----< "CODE WARRIOR" >-----|
codewarrior@daemon.org             * "ah!  i see you have the internet
twofsonet@graffiti.com (Andrew Brown)                that goes *ping*!"
werdna@squooshy.com       * "information is power -- share the wealth."