tech-net: Re: problems with nmbcluster (?)

Subject: Re: problems with nmbcluster (?)
To: Manuel Bouyer <bouyer@antioche.eu.org>
From: None <6bone@6bone.informatik.uni-leipzig.de>
List: tech-net
Date: 01/08/2007 07:01:40
On Sun, 7 Jan 2007, Manuel Bouyer wrote:

> Date: Sun, 7 Jan 2007 22:16:59 +0100
> From: Manuel Bouyer <bouyer@antioche.eu.org>
> To: 6bone@6bone.informatik.uni-leipzig.de
> Cc: tech-net@NetBSD.org
> Subject: Re: problems with nmbcluster (?)
> 
> On Sun, Jan 07, 2007 at 08:28:10PM +0100, 6bone@6bone.informatik.uni-leipzig.de wrote:
>> On Sun, 7 Jan 2007, Manuel Bouyer wrote:
>>
>>> Date: Sun, 7 Jan 2007 19:09:59 +0100
>>> From: Manuel Bouyer <bouyer@antioche.eu.org>
>>> To: 6bone@6bone.informatik.uni-leipzig.de
>>> Cc: tech-net@NetBSD.org
>>> Subject: Re: problems with nmbcluster (?)
>>>
>>> On Sun, Jan 07, 2007 at 05:44:48PM +0100,
>>> 6bone@6bone.informatik.uni-leipzig.de wrote:
>>>> hello,
>>>>
>>>> I have some problems with the network. I have to restart my server
>>>> continuously, because after some days the server loses all connection to
>>>> the network. You cannot establish any connections or do any pings. You can
>>>> only restart the server. After the restart everything works fine for some
>>>> days.....
>>>>
>>>> I have tested several kernels (3.0, 3.1, current ...) but the same effect
>>>> always occurs. The server runs no special services, only apache2 and
>>>> postgresql from pkgsrc. I don't know why the problem occurs only on my
>>>> system. It is a dual i386/PIII with IPv6 enabled and an Intel NIC.
>>>>
>>>> I cannot give you more specific hints, only the output of 'netstat -mss'
>>>> after the connection was lost:
>>>>
>>>> 1441 mbufs in use:
>>>>         1150 mbufs allocated to data
>>>>         291 mbufs allocated to packet headers
>>>> 132521 calls to protocol drain routines
>>>>
>>>>
>>>> Can anyone give me a hint for a possible solution or workaround? The
>>>> continuous restarts are no longer possible. I have already replaced all
>>>> of the hardware and software.
>>>
>>> What does 'vmstat -m|grep mclpl' show?
>>>
>>> --
>>> Manuel Bouyer <bouyer@antioche.eu.org>
>>>    NetBSD: 26 ans d'experience feront toujours la difference
>>> --
>>>
>>
>> the uptime at the moment is only 4h, so I can only report the current
>> output:
>>
>> netstat -mss && vmstat -m|grep mclpl
>>
>> 1497 mbufs in use:
>>         1110 mbufs allocated to data
>>         387 mbufs allocated to packet headers
>> 34 calls to protocol drain routines
>>
>> vmstat: Kmem statistics are not being gathered by the kernel.
>> mclpl       2048     1578    0      938   408    74   334   398     4   512
>
> I suspect your system is running out of mclpl on occasion, and this causes the
> network adapter (or the IP stack) to stall. Try bumping nmbclusters.
>
> For example on ftp.fr.netbsd.org I have it set to 8192.
>
> -- 
> Manuel Bouyer <bouyer@antioche.eu.org>
>     NetBSD: 26 ans d'experience feront toujours la difference
> --
>

I have already tested with NMBCLUSTERS=4096. I think the system then runs a
few days longer before the stall occurs. I will now test with 8192, but I do
not think it will solve the problem.
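
To check whether the mclpl pool really fills up before a stall, the
'vmstat -m|grep mclpl' line quoted above could be logged at intervals and
reduced to a few numbers. The following is only a minimal sketch. The
function name mclpl_summary is hypothetical, and the column order (Size,
Requests, Fail, Releases, Pgreq, Pgrel, Npage, Hiwat, Minpg, Maxpg) is an
assumption that should be checked against the header 'vmstat -m' prints on
the system in question.

```shell
#!/bin/sh
# Summarize the mclpl line of 'vmstat -m' output read from stdin.
# ASSUMPTION: fields after the pool name are
#   Size Requests Fail Releases Pgreq Pgrel Npage Hiwat Minpg Maxpg
# Verify this against the header line on your own system.
mclpl_summary() {
    awk '$1 == "mclpl" {
        inuse = $3 - $5              # requests minus releases = clusters still held
        max = $11 + 0                # force a numeric value for the page limit
        pct = (max > 0) ? int($8 * 100 / max) : 0   # pages in use as % of limit
        printf "inuse=%d fail=%d npage=%d maxpg=%d pct=%d\n", inuse, $4, $8, max, pct
    }'
}

# Possible use (the log path is only an example):
#   while :; do date; vmstat -m | mclpl_summary; sleep 300; done >> /var/tmp/mclpl.log
```

Under that assumed column order, the line reported above after 4h of uptime
works out to inuse=640 fail=0 npage=334 maxpg=512 pct=65. A fail count
above zero, or pct approaching 100 shortly before a stall, would support the
suspicion that the pool is being exhausted.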