Subject: Re: problems with nmbcluster (?)
To: None <tech-net@NetBSD.org>
From: David Young <dyoung@pobox.com>
List: tech-net
Date: 01/11/2007 04:22:55

On Sun, Jan 07, 2007 at 05:44:48PM +0100, 6bone@6bone.informatik.uni-leipzig.de wrote:
> hello,
> 
> I have some problems with the network. I have to restart my server 
> continuously, because after some days the server loses all connection to 
> the network. You cannot establish any connections or do any pings. You can 
> only restart the server. After the restart everything works fine for some 
> days.....
> 
> I have tested some kernels (3.0, 3.1, current....) but always the same 
> effect occurs. On the server runs no special service. Only apache2 and 
> postgresql from the pkgsrc. I don't know why the problem only occurs at my 
> system. It is a dual i386/PIII with enabled IPv6 and an intel nic.
> 
> I cannot give you more special hints. Only one output from 'netstat -mss' 
> after the connection was lost:
> 
> 1441 mbufs in use:
>          1150 mbufs allocated to data
>          291 mbufs allocated to packet headers
> 132521 calls to protocol drain routines

I am going to take a wild guess: apache does not read its sockets fast
enough to keep their receive queues from growing long tails of mbufs,
and it then attempts a blocking write(2) on one socket before
read(2)ing the others.  If apache calls write(2) while every mbuf is
sitting either on its receive queues or on wm's receive ring, it seems
to me that the system will deadlock: the write cannot get an mbuf until
apache reads, and apache cannot read until the write returns.
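
To make the pattern concrete, here is a minimal user-space sketch of the
opposite behaviour, in Python: never block in a write, and drain whatever
is readable before sending.  The function name `exchange`, the 64 KB read
size, and the timeout are made up for illustration; this is not apache's
code, and it says nothing about what the kernel does once mbufs run out.

```python
import select
import socket


def exchange(sock, outgoing, timeout=1.0):
    """Send `outgoing` on `sock` without ever blocking in send().

    Hypothetical helper: on every pass, pending input is drained first,
    and data is written only when select() reports the socket writable.
    Returns (bytes received, number of bytes sent).
    """
    sock.setblocking(False)
    received = bytearray()
    offset = 0
    while offset < len(outgoing):
        readable, writable, _ = select.select([sock], [sock], [], timeout)
        if not readable and not writable:
            break  # timed out: give up instead of blocking forever
        if readable:
            try:
                chunk = sock.recv(65536)
            except BlockingIOError:
                chunk = None  # spurious wakeup, nothing to read after all
            if chunk == b"":
                break  # peer closed the connection
            if chunk:
                received += chunk  # empty the receive queue first
        if writable:
            try:
                offset += sock.send(outgoing[offset:])
            except BlockingIOError:
                pass  # buffer filled up meanwhile; retry on the next pass
    return bytes(received), offset
```

Called on one end of a socket.socketpair() whose peer has already sent
some data, `exchange` returns that data together with the number of bytes
it managed to write.  The point is only that reads are never held up
behind a write that cannot make progress.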

Dave

-- 
David Young             OJC Technologies
dyoung@ojctech.com      Urbana, IL * (217) 278-3933