Subject: Re: Unkillable process, stalled socket write()
To: Christos Zoulas <christos@zoulas.com>
From: Jorgen Lundman <lundman@lundman.net>
List: netbsd-users
Date: 03/10/2005 20:59:00
> So you have run out of SOMAXKVA which is 16M and it is defined in
> uipc_socket.c Something is not cleaning up after itself, so you
> are running out socket space...  Or you are legitimately running
> out. The two children are zombies, and nobody has waited for them
> yet... What do you mean just wait4()ing? The parent should wait4()
> them, but it is stuck in D state. What is the parent writing to?
> 


Thanks for the input.  I'm not concerned about the zombies; if the parent weren't
locked in the write() call, they would have been wait4()ed on. But SIGCHLD, heck,
even SIGKILL, is just "ignored" while it is stuck, so that isn't happening.
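
For reference, the reaping the parent would normally do looks roughly like the
sketch below. It is only an illustration, not the application's actual code:
reap_children() is a hypothetical helper name, and it uses waitpid() with
WNOHANG, which behaves like wait4() with a NULL rusage pointer.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <errno.h>

/*
 * Reap every child that has already exited, without blocking.
 * Returns the number of children reaped.
 * (Hypothetical helper, shown only to illustrate the usual pattern.)
 */
static int reap_children(void)
{
    int reaped = 0;
    int status;
    pid_t pid;

    for (;;) {
        pid = waitpid(-1, &status, WNOHANG);
        if (pid > 0) {
            reaped++;           /* one zombie collected, look for more */
            continue;
        }
        if (pid == -1 && errno == EINTR)
            continue;           /* interrupted by a signal, retry */
        break;                  /* 0: no exited child yet; -1: no children */
    }
    return reaped;
}
```

A loop like this is typically run from the main loop after SIGCHLD has been
noted, which is exactly the step a parent stuck inside write() never reaches.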

However, what is SOMAXKVA? We do a moderate amount of traffic, but not enough to
justify it running out legitimately, I think. If my program isn't cleaning up
after itself, what would it be failing to release? I do not leak fds or memory
according to Purify etc.

I'm just thinking that if I increase SOMAXKVA, all I do is delay the bug (which
could be OK, if it only happens every ~120 days, as it does at the moment).

I was hunting a different bug in OpenSSL when this happened, so I had restarted
my application many times.

Has anyone else come across SSL_write() turning fds into blocking mode?
When I add printfs before and after the call, I can confirm the fd is nonblocking
before I call SSL_write(), but it is set to _blocking_ after the call! (This
happens repeatedly, but only for one (very slow) user. It is probably better
asked on the OpenSSL mailing lists.)
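
The before/after check described above can be done with fcntl(F_GETFL). The
sketch below is a minimal version of that idea; fd_is_nonblocking() and
fd_set_nonblocking() are hypothetical helper names, not part of OpenSSL or of
the application.

```c
#include <fcntl.h>

/*
 * Return 1 if fd has O_NONBLOCK set, 0 if it is blocking,
 * or -1 if fcntl() fails (for example, on a bad descriptor).
 */
static int fd_is_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);

    if (flags == -1)
        return -1;
    return (flags & O_NONBLOCK) ? 1 : 0;
}

/*
 * Set O_NONBLOCK on fd while preserving its other status flags.
 * Returns 0 on success, -1 on failure.
 */
static int fd_set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);

    if (flags == -1)
        return -1;
    if (fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1)
        return -1;
    return 0;
}
```

Calling fd_is_nonblocking(fd) immediately before and after SSL_write() and
logging both results would show which call the flag changes across. One thing
possibly worth ruling out: O_NONBLOCK belongs to the open file description, so
it is shared with descriptors created by dup() and with children that inherited
the socket across fork(); another holder of the same socket could clear it.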

Thanks for your help,

Lund


-- 
Jorgen Lundman       | <lundman@lundman.net>
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo    | +81 (0)90-5578-8500          (cell)
Japan                | +81 (0)3 -3375-1767          (home)