Subject: Re: Unkillable process, stalled socket write()
To: Christos Zoulas <firstname.lastname@example.org>
From: Jorgen Lundman <email@example.com>
Date: 03/10/2005 19:17:54
Sorry for the delay, it does not happen that often.
pkill -INT lundftpd
0 4021 6423 0 -22 0 0 0 - ZW ?? 0:00.00 (lundftpd)
0 6423 1 0 -18 0 2520 2684 sokva Ds ?? 0:16.06 ./lundftpd
0 9550 6423 0 -22 0 0 0 - ZW ?? 0:00.00 (lundftpd)
The middle process (pid 6423) is the stuck one, sleeping in state D on wait
channel sokva. The other two are its children, zombies just waiting to be
wait4()ed.
#0 0x48187dc3 in write () from /usr/lib/libc.so.12
#1 0x080cbc76 in BIO_sock_non_fatal_error ()
(hmm? a clue? called via SSL_write if that makes any difference)
I cannot get write() to return.
kill -9 6423
0 6423 1 0 -18 0 2520 2680 sokva Ds ?? 0:16.06 ./lundftpd
0x48187dc3 in write () from /usr/lib/libc.so.12
The socket is supposed to be nonblocking, but even if it wasn't, should this
happen? (Dead disks I have seen before, but this is a network socket.)
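To rule out the socket having silently lost its nonblocking flag, something
like the following could be called just before the write. This is only a
minimal sketch: the helper names fd_is_nonblocking() and fd_set_nonblocking()
are made up for illustration and are not taken from lundftpd or OpenSSL. It
relies on nothing beyond the standard fcntl(2) F_GETFL/F_SETFL calls.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: return 1 if O_NONBLOCK is set on fd,
 * 0 if it is not, and -1 if fcntl() itself fails. */
int fd_is_nonblocking(int fd)
{
    assert(fd >= 0);
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1)
        return -1;
    return (flags & O_NONBLOCK) ? 1 : 0;
}

/* Hypothetical helper: set O_NONBLOCK on fd while preserving the
 * other status flags. Returns 0 on success, -1 on error. */
int fd_set_nonblocking(int fd)
{
    assert(fd >= 0);
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}
```

If fd_is_nonblocking() ever returned 0 on the data socket, that would point
at the application rather than the kernel. If it returns 1 and write() still
never comes back, the problem is presumably below userland.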
Dump of assembler code for function write:
0x48187dbc <write>: mov $0x4,%eax
0x48187dc1 <write+5>: int $0x80
0x48187dc3 <write+7>: jb 0x48187da4 <writev+12>
0x48187dc5 <write+9>: ret
0x48187dc6 <write+10>: nop
0x48187dc7 <write+11>: nop
0x48187dc8 <write+12>: push %ebx
0x48187dc9 <write+13>: call 0x48187dce <write+18>
0x48187dce <write+18>: pop %ebx
0x48187dcf <write+19>: add $0x861fa,%ebx
0x48187dd5 <write+25>: mov 0xc24(%ebx),%ecx
0x48187ddb <write+31>: pop %ebx
0x48187ddc <write+32>: jmp *%ecx
0x48187dde <write+34>: mov %esi,%esi
End of assembler dump.
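As far as I can read the dump, the stub only loads syscall number 4 (write on
NetBSD/i386) into %eax and traps with int $0x80. The jb afterwards is the
error path, taken when the kernel sets the carry flag. The PC gdb reports,
0x48187dc3, is write+7, the instruction right after the trap, so the process
has entered the kernel and not come back. A rough C equivalent of the stub,
assuming the generic syscall(2) entry point is available (raw_write() is a
made-up name for illustration):

```c
#define _DEFAULT_SOURCE   /* exposes syscall() on glibc; harmless elsewhere */
#include <assert.h>
#include <sys/types.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: issue write(2) through syscall(), which is
 * roughly what the libc stub does with "mov $0x4,%eax; int $0x80".
 * Returns the byte count written, or -1 with errno set on error. */
long raw_write(int fd, const void *buf, size_t len)
{
    assert(buf != NULL);
    return (long)syscall(SYS_write, fd, buf, len);
}
```

There is no userland code between the trap and the return, so nothing in
libc itself can be what is looping or blocking here.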
I rebooted with reboot -d this time, to get a crash dump.
Christos Zoulas wrote:
> In article <422BAC46.firstname.lastname@example.org>,
> Jorgen Lundman <email@example.com> wrote:
>>I can now confirm this happens in NetBSD-2.0 as well. Nonblocking listen(2)
>>socket, that stalls forever. kill -9 does not terminate process. If I want that
>>port back, I have to reboot. (Unless there is some way I can hack/claim it back
>>with gdb+kernel?). The rest of the machine is just fine.
>>deadlocked due to IO when its a disk I understand, but the same based on
>>Still only happens every 40 days or so, so it is hard to track down. I can not
>>run ktrace that long since it logs all the data as well. (Can one tell
>>only record the call, not the data?)
>>Could I force a dump at this time, since the machine itself is working fine, to
>>further investigate the trouble?
> What wait channel is the process stuck on (ps -axl)?
Jorgen Lundman | <firstname.lastname@example.org>
Unix Administrator | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo | +81 (0)90-5578-8500 (cell)
Japan | +81 (0)3 -3375-1767 (home)