Subject: Re: NetBSD locking on mclpool error
To: Manuel Bouyer <bouyer@antioche.lip6.fr>
From: Chris G. Demetriou <cgd@netbsd.org>
List: netbsd-help
Date: 09/22/1999 09:52:14
Manuel Bouyer <bouyer@antioche.lip6.fr> writes:
> On Wed, Sep 22, 1999 at 11:21:00AM -0500, John A. Maier wrote:
> > I've got two machines running NetBSD 1.4.1 i386.
> > 
> > The server running apache web server was locked up the other day with the
> > error mclpool limit reached: increase NMBCLUSTERS
> 
> This means there aren't enough resources for the network protocols.
> 
> > 
> > I thought it might be a fluke, then on a completely different machine, I
> > was working under X and it locked up.  I noticed that the same message
> > was on the console of that machine.
> > 
> > What's wrong, what do I need to do?
> 
> Increase NMBCLUSTERS :) see options(4), you need to recompile a kernel.

Uh, it's not just that the machine was under-configured.

There's an annoying problem that showed up going from 1.3.x to 1.4.x
(and from an 'ep' card to an 'ex' card) on, e.g., my server at home.
With 1.3.x it never ran out of mbuf clusters.  With 1.4.x it did,
consistently.

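In a nutshell, this is due to the way clusters are used by many
drivers, especially DMA-using drivers like 'ex': a received packet can
end up occupying a whole cluster no matter how small it is, so lots of
small packets tie up lots of clusters.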

There's been a workaround for it in -current for a while, and I pulled
it up to the 1.4.x branch last night.

The relevant release branch CHANGES-1.4.x file entry is:

> sys/kern/uipc_socket2.c                         1.31-1.33
>
>   Compact mbuf clusters, to help prevent mbuf cluster exhaustion when
>   receiving lots of small packets.  This costs some performance (the
>   compaction copies data), but adds a lot of stability to many systems.

(i.e. the changes in revisions 1.31, 1.32, and 1.33 were pulled up to
the branch.)

It's by no means a perfect fix, but it improves things a lot.  (There
are situations that can still trigger the problems, but they're much
more rare.)
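
To illustrate why compaction helps with small packets, here is a toy
model in C.  It is only a sketch, not the actual uipc_socket2.c change:
the 2048-byte cluster size and the function names are assumptions made
up for illustration.  It counts the clusters held by a queue of received
packets when every packet keeps its own cluster, versus when each
packet's data is copied into the free tail of the previous cluster
whenever it fits.

```c
#include <stddef.h>

/* Assumed cluster size for this sketch; the real value is port-specific. */
#define CLUSTER_BYTES 2048

/*
 * Clusters held when every queued packet keeps the cluster it was
 * received into, regardless of how little data it carries.
 */
size_t clusters_uncompacted(const size_t *pkt_len, size_t n)
{
    (void)pkt_len;              /* packet size does not matter here */
    return n;
}

/*
 * Clusters held when each packet's data is copied into the unused
 * tail of the most recent cluster if it fits there, and a new cluster
 * is taken only when it does not.
 * Assumes every packet length is <= CLUSTER_BYTES.
 */
size_t clusters_compacted(const size_t *pkt_len, size_t n)
{
    size_t used = 0;            /* clusters in use so far */
    size_t space = 0;           /* free bytes left in the last cluster */

    for (size_t i = 0; i < n; i++) {
        if (pkt_len[i] > space) {
            used++;             /* does not fit: start a new cluster */
            space = CLUSTER_BYTES;
        }
        space -= pkt_len[i];    /* the copy is the performance cost */
    }
    return used;
}
```

In this model, 300 queued 64-byte packets hold 300 clusters uncompacted
(more than a 256-cluster pool) but only 10 once compacted, since 32 such
packets fit in one cluster.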


I found that on one server-ish system, which handles some interactive
traffic, some CVS, and some NFS, the machine would overflow 256 mbuf
clusters consistently (like, every few hours since I added the 'ex'
card), and some of those times would end up being fatal -- or at least
seeming fatal.  I upped the count of mbuf clusters to 512, then 1024,
and it did _NOT_ solve the problem.  (The frequency of the occurrences
was reduced, but not eliminated.)  Running with the patch above, there
has been no problem, nor has the machine exceeded even _256_ mbuf
clusters.
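
For reference, raising the limit is a one-line change in the kernel
configuration file, followed by a kernel rebuild.  A sketch, assuming
the options(4) syntax and reusing the 1024 figure mentioned above purely
as an illustration, not as a recommended value:

```
options 	NMBCLUSTERS=1024	# mbuf cluster limit (illustrative value)
```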

Moral of the story: sometimes the diagnostics mean that you should
increase the limits.  Sometimes they mean the system's broken.



cgd
-- 
Chris Demetriou - cgd@netbsd.org - http://www.netbsd.org/People/Pages/cgd.html
Disclaimer: Not speaking for NetBSD, just expressing my own opinion.