Subject: Re: is there anything i can do on a daemon not willing to die? (ignoreskill
To: None <Timothy.Musson@zin-tech.com>
From: Timo Schoeler <timo.schoeler@macfinity.net>
List: netbsd-help
Date: 05/04/2005 07:53:37
Timothy A. Musson wrote:
> Timo Schoeler wrote:
> 
>> Perry E. Metzger wrote:
>>
>>> Timo Schoeler <timo.schoeler@macfinity.net> writes:
>>>
>>>
>>>> Perry E. Metzger wrote:
>>>>
>>>>
>>>>> Timo Schoeler <timo.schoeler@macfinity.net> writes:
>>>>>
>>>>>
>>>>>
>>>>>> hi list,
>>>>>>
>>>>>> what can i do about a daemon that crashed in a way (clamsmtpd on
>>>>>> NetBSD 2.0.2-RELEASE/SPARC64) that not even a kill -9 `PID` got me
>>>>>> rid of it?
>>>>>
>>>>>
>>>>>
>>>>> If not even a kill -9 killed it, you have a kernel bug. What was the
>>>>> process sleeping in?
>>>>
>>>>
>>>> hm, it's running plain vanilla 2.0.2-RELEASE...
>>>>
>>>> the process was not sleeping, IIRC it was in state 'CPU' -- at least
>>>> (according to top) it consumed nearly 100% CPU, the system's load was
>>>> between 4.5 and 5.2 for several hours...
>>>
>>>
>>>
>>> That doesn't make any sense at all.
>>>
>>> If a process is running in userland, you can always kill it. If a
>>> kill -9 does not work, then you are almost certainly stuck in the
>>> kernel.
>>
>>
>>
>> yes, that's what i thought, too. however, this scenario was very annoying :(
>>
>> the machine was responsive after all (sshd wasn't shut down yet, but it
>> didn't take any new logins of course), but it could not reboot because
>> of the process not willing to die. the machine was not stuck. that's
>> what makes it weird (at least to me).
>>
>>
>>> Can you show a ps of the process?
>>
>>
>>
>> i'll try to build up the same environment here on another machine so
>> that i can repeat it.
>>
>> thanks so far & cheers,
>>
> 
> Does the machine mount an NFS shared partition? (And, is it possible 
> that clamsmtpd was trying to access that NFS mount point but that mount 
> point had gone away without being unmounted?)

hi, no NFS, nothing remote mounted. the machine boots from a CF card 
(attached to the primary, on board IDE channel) and then RAIDframe comes 
in (consisting of two relatively big HDs attached to a PCI UATA133 
controller to circumvent the not very speedy on board controller).

but this runs very well; i don't think it has anything to do with
the crashing clamsmtpd, especially as clamsmtpd is surely one of the apps
that consumes very little I/O (to/from HD)...
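
for the next time it happens, a minimal sketch of how the information asked for earlier in the thread (a ps of the process, and what it is sleeping in) could be captured. the helper name show_proc is made up for illustration, and the column list is an assumption about which ps keywords are useful here; check ps(1) on the machine in question:

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): print state, wait channel and
# accumulated CPU time for one PID.
# STAT is the process state; WCHAN names what a sleeping process is
# waiting on and is typically "-" when the process is not sleeping.
show_proc() {
    ps -o pid,stat,wchan,time,command -p "$1"
}

# Example run against the current shell; replace $$ with the PID of the
# daemon that refuses to die.
show_proc "$$"
```

a state that stays at running with an empty wait channel would fit the "nearly 100% CPU" observation, while a named wait channel would point at where in the kernel the process is blocked.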

cheers,

timo