tech-kern: Re: Unkillable processes

Subject: Re: Unkillable processes
To: None <tech-kern@NetBSD.ORG, jon@oaktree.co.uk>
From: Wolfgang Solfrank <ws@kurt.tools.de>
List: tech-kern
Date: 11/15/1996 15:27:00
> As I understand it you can't kill a process if it's currently
> executing a syscall. Maybe the syscall is in the middle of
> updating something important, and if you killed it it might
> leave kernel data structures corrupt. To be able to kill
> a process at any point you'd need to make every syscall
> be able to tidy-up from itself at any point during its
> execution.
> 
> The above may be entirely wrong - I know nothing and
> I'm guessing ;-).

It is more or less wrong.

First off, a small bit of background:

Like most other U*X implementations, the kernel in the (current) NetBSD
implementation is not preemptible.  I.e. if a process is executing kernel code
(e.g. in a syscall), the processor doesn't switch to another process at will,
but only if the running process lets it (typically because it has to wait for
some resource, e.g. for the I/O on a disk block to complete).

The call to suspend the current process (the function is called tsleep) has an
argument (actually a flag within the priority parameter) that tells it whether
or not to hold any signals that might get sent to the process during the
suspension (similar to what sigblock does for signals in userland).

Processes that are unkillable are waiting for an event that never happens,
and have called tsleep with the argument to hold signals, so a kill signal
cannot end the wait.
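
To make this concrete, here is a minimal user-space sketch in C.  It is a toy
model, not the kernel's tsleep: the function, the enum and the TOY_ names are
all made up for illustration.  The only thing borrowed from the real kernel is
the idea of a flag OR'ed into the priority argument (BSD kernels call that
flag PCATCH); the value used here is an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only.  Hypothetical flag, modelled on the kernel's PCATCH:
 * when set, a pending signal is allowed to end the sleep. */
#define TOY_PCATCH 0x100

enum toy_wakeup {
	TOY_STILL_ASLEEP,	/* nothing can wake the process yet */
	TOY_WOKEN_BY_EVENT,	/* the awaited event happened */
	TOY_WOKEN_BY_SIGNAL	/* a signal ended the sleep early */
};

/* Decide what happens to a sleeping process, given its sleep priority
 * (with optional flag) and what has occurred since it went to sleep. */
enum toy_wakeup
toy_sleep_outcome(int priority, bool event_happened, bool signal_pending)
{
	assert(priority >= 0);

	if (event_happened)
		return TOY_WOKEN_BY_EVENT;
	if (signal_pending && (priority & TOY_PCATCH) != 0)
		return TOY_WOKEN_BY_SIGNAL;
	/* Without the flag the signal stays pending and the sleep goes on. */
	return TOY_STILL_ASLEEP;
}
```

In this model, toy_sleep_outcome(4, false, true) stays TOY_STILL_ASLEEP no
matter how often you send the signal; that is the unkillable process.  With
4 | TOY_PCATCH the same call returns TOY_WOKEN_BY_SIGNAL.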

This is in fact bad programming practice.  One should call tsleep with the
signal-hold option only if one is absolutely sure that the event will happen,
and will happen in the not too distant future.  Of course, if you call the
function without this option, you must be aware that it might return due to
a signal, not due to the availability of the resource you were waiting for.
So you have to recover in an orderly way from the current state of the routine
you're in, free any other resources you claimed, and return to the user.
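
The recovery described above could look roughly like the following sketch.
It is plain user-space C under stated assumptions, not kernel code: the sleep
is a hypothetical callback that returns 0 when the event arrived and EINTR
when a signal ended the wait, and toy_read_block, toy_live_buffers and the
two stub sleeps are invented names.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for an interruptible sleep: returns 0 if the
 * awaited event arrived, EINTR if a signal ended the wait early. */
typedef int (*toy_sleep_fn)(void);

int toy_live_buffers = 0;	/* buffers currently claimed, for checking */

int toy_sleep_event(void)  { return 0; }	/* event arrived */
int toy_sleep_signal(void) { return EINTR; }	/* signal arrived first */

/* Claim a buffer, wait for the resource, and clean up on every path. */
int
toy_read_block(toy_sleep_fn interruptible_sleep)
{
	char *buf;
	int error;

	assert(interruptible_sleep != NULL);

	buf = malloc(512);
	if (buf == NULL)
		return ENOMEM;
	toy_live_buffers++;

	error = interruptible_sleep();
	if (error != 0) {
		/* Woken by a signal: the resource is NOT available.
		 * Release what we claimed and return to the caller. */
		free(buf);
		toy_live_buffers--;
		return error;
	}

	buf[0] = 'x';		/* pretend to use the resource */
	free(buf);
	toy_live_buffers--;
	return 0;
}
```

Calling toy_read_block(toy_sleep_signal) returns EINTR and leaves
toy_live_buffers at 0, i.e. nothing is leaked on the interrupted path.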

(And for the nit-pickers :-), I'm fully aware that the option is to catch
signals, not to hold them).

> I managed to get NetBSD 1.1 to deadlock the other day. I was
> copying a file to disk, and tried to 'ls' the disk at the
> same time (stupid me). Both processes got stuck in the 'D'
> state and I had to reboot. Ho hum.

There is nothing wrong with doing an 'ls' while copying a file to the same
directory.  If this results in a deadlock, there is a bug somewhere.  If you
can reproduce this with a more current system, I'm sure we'd all like to hear
about it (probably "like" is not the correct word here :-)).

> Oh yes, while I'm here - if I 'ping -f', my (EIDE) hard drive
> locks up. (something about lost interrupts) Anybody know
> why? (Intel Atlantis motherboard)

Sorry, no idea here.

Ciao,
Wolfgang
--
ws@TooLs.DE     (Wolfgang Solfrank, TooLs GmbH) 	+49-228-985800