Subject: Unkillable process, stalled socket write()
To: None <netbsd-users@netbsd.org>
From: Jorgen Lundman <lundman@lundman.net>
List: netbsd-users
Date: 05/27/2004 10:55:30
NetBSD mirror 1.6ZF NetBSD 1.6ZF (mirror) #6: Fri Apr  2 04:06:49 CEST 2004 
root@mirror:/usr/src/sys-current/src/sys/arch/i386/compile/mirror i386

This is most likely something I have done in my own software, but the behaviour 
is unusual. An FTPD process of mine is now hung. kill -9 does nothing to it, and 
naturally I cannot release it.

When I see this, it is usually because a disk or tape is going bad and the 
kernel blocks forever. What is unusual this time is that the blocked fd is a 
socket that the FTPD is sending to.

However, gdb tells me:

#0  0x481d36d7 in write () from /usr/lib/libc.so.12
#1  0x808bb2b in sockets_write (fd=275, [cut]

Inspecting my structures, I can confirm that fd 275 is a socket. We have already 
read 4072 bytes from the file on disk and are now trying to send them.


0x481d36d7 in write () from /usr/lib/libc.so.12
(gdb) disass
Dump of assembler code for function write:
0x481d36d0 <write>:     mov    $0x4,%eax
0x481d36d5 <write+5>:   int    $0x80
0x481d36d7 <write+7>:   jb     0x481d36b8 <getpid+8>
0x481d36d9 <write+9>:   ret

At a guess, int 0x80 is just the syscall trap, and 0x4 would be sys_write().
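
For reference, the stub in the disassembly (load the syscall number, trap, 
branch to an error path if the kernel flags a failure) does roughly what the 
generic syscall() wrapper does. Below is a minimal sketch, assuming a system 
that provides syscall() and SYS_write in <sys/syscall.h>. The names raw_write 
and raw_write_roundtrip are hypothetical helpers made up for illustration, not 
anything from lundftpd or libc.

```c
/* glibc: expose syscall() even under strict -std= modes; harmless elsewhere. */
#define _DEFAULT_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: same job as libc's write(), but issued through the
 * generic syscall() wrapper with the syscall number passed explicitly.
 * Returns the byte count, or -1 with errno set on failure.
 */
static ssize_t raw_write(int fd, const void *buf, size_t len)
{
    assert(buf != NULL);
    return (ssize_t)syscall(SYS_write, fd, buf, len);
}

/*
 * Hypothetical self-check: push msg through a pipe with raw_write() and
 * read it back. Returns 1 if the bytes arrive intact, 0 otherwise.
 */
static int raw_write_roundtrip(const char *msg)
{
    int p[2];
    char back[64];
    size_t len = strlen(msg);
    ssize_t n;
    int ok;

    if (len > sizeof back || pipe(p) == -1)
        return 0;

    ok = raw_write(p[1], msg, len) == (ssize_t)len;
    n = ok ? read(p[0], back, sizeof back) : -1;
    ok = ok && n == (ssize_t)len && memcmp(back, msg, len) == 0;

    close(p[0]);
    close(p[1]);
    return ok;
}
```

The value of SYS_write differs between systems, so the sketch never hardcodes 
the 4 seen in the disassembly.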

fd 275 is also in non-blocking mode, so even if the kernel were out of mbufs or 
memory, should write() not always return, even if only with a failure?
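
To show the behaviour I am expecting, here is a minimal sketch, assuming an 
AF_UNIX socketpair stands in for the real TCP data connection. It puts one end 
into non-blocking mode, never reads from the other end, and writes 4072-byte 
chunks until write() fails. The names set_nonblocking and 
errno_when_socket_full are hypothetical, made up for this example.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: put fd into non-blocking mode. Returns 0 or -1. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);

    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/*
 * Hypothetical demo: fill a socket that nobody reads from.
 * Returns the errno of the first failing write(), 0 if no write failed
 * within the loop limit, or -1 if the setup itself failed.
 */
static int errno_when_socket_full(void)
{
    int sv[2];
    char buf[4072];            /* same chunk size as in the report */
    int saved = 0;
    int i;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return -1;
    if (set_nonblocking(sv[0]) == -1) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    /* Sanity check: the flag really is set on the writing end. */
    assert(fcntl(sv[0], F_GETFL, 0) & O_NONBLOCK);

    memset(buf, 'x', sizeof buf);

    /* The peer sv[1] never reads, so the send buffer has to fill up. */
    for (i = 0; i < 100000; i++) {
        if (write(sv[0], buf, sizeof buf) == -1) {
            saved = errno;     /* expected: returns at once, no sleep */
            break;
        }
    }

    close(sv[0]);
    close(sv[1]);
    return saved;
}
```

On a healthy system I would expect this to come back with EAGAIN (or 
EWOULDBLOCK) rather than sleep inside write(), which is what the hung process 
appears to be doing.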

Memory: 278M Act, 39M Inact, 608K Wired, 10M Exec, 291M File, 476M Free 
Swap: 10G Total, 10G Free

USER   PID %CPU %MEM  VSZ   RSS TT STAT STARTED      TIME COMMAND
root 15529  0.0  0.0 8216     4 ?? DXs  11:03AM  67:08.89 ./lundftpd

sysstat mbufs
           /0   /5   /10  /15  /20  /25  /30  /35  /40  /45  /50  /55  /60
data      XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX 44023
headers   XXXXXXXXXXXXXXXXXXXXXXXXXX

Alas, netstat and vmstat do not run, since userland is 1.6.2 and the kernel is 
-current (to support the NIC). Sigh.


Lund


-- 
Jorgen Lundman       | <lundman@lundman.net>
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo    | +81 (0)90-5578-8500          (cell)
Japan                | +81 (0)3 -3375-1767          (home)