Subject: Re: NetBSD locking on mclpool error
To: Manuel Bouyer <bouyer@antioche.lip6.fr>
From: Chris G. Demetriou <cgd@netbsd.org>
List: netbsd-help
Date: 09/22/1999 09:52:14
Manuel Bouyer <bouyer@antioche.lip6.fr> writes:
> On Wed, Sep 22, 1999 at 11:21:00AM -0500, John A. Maier wrote:
> > I've got two machines running NetBSD 1.4.1 i386.
> >
> > The server running apache web server was locked up the other day with the
> > error mclpool limit reached: increase NMBCLUSTERS
>
> This means there aren't enough resources for the network protocols.
>
> >
> > I thought it might be a fluke, then on a completely different machine, I
> > was working under X and it locked up. I noticed that that same message
> > was on the console of that machine.
> >
> > What's wrong, what do I need to do?
>
> Increase NMBCLUSTERS :) see options(4), you need to recompile a kernel.
Uh, it's not just that the machine was under-configured.
There's an annoying problem that showed up going from 1.3.x to 1.4.x
(and ep -> ex) on, e.g., my server at home. With 1.3.x it wouldn't
ever lose for lack of mbuf clusters. With 1.4.x it would,
consistently.
In a nutshell, this is due to the way clusters are used by many
drivers, especially DMA-using drivers like 'ex', etc.
There's been a workaround for it in -current for a while, and i pulled
it up to the 1.4.x branch last night.
The relevant release branch CHANGES-1.4.x file entry is:
> sys/kern/uipc_socket2.c 1.31-1.33
>
> Compact mbuf clusters, to help prevent mbuf cluster exhaustion when
> receiving lots of small packets. This costs some performance (the
> compaction copies data), but adds a lot of stability to many systems.
(i.e. the changes in revisions 1.31, 1.32, and 1.33 were pulled up to
the branch.)
It's by no means a perfect fix, but it improves things a lot. (There
are situations that can still trigger the problems, but they're much
more rare.)
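
To illustrate why compaction helps with lots of small packets, here is a
rough sketch in C. It is a toy model only, not the code from
uipc_socket2.c: the function names are hypothetical, and the 2048-byte
cluster size is an assumption (what I'd expect MCLBYTES to be on i386).
It just counts how many clusters a receive queue holds when every packet
keeps its own cluster, versus when small packets are copied into the
free space at the end of the previous cluster.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed cluster size; stands in for MCLBYTES. */
#define CLUSTER_SIZE 2048

/*
 * Hypothetical model: each received packet arrives in its own cluster
 * (as a DMA-using driver would hand it up) and keeps it while queued.
 */
size_t clusters_without_compaction(const size_t *pkt_len, size_t npkts)
{
    size_t used = 0;

    for (size_t i = 0; i < npkts; i++) {
        assert(pkt_len[i] <= CLUSTER_SIZE);
        used++;                     /* one cluster per packet, however small */
    }
    return used;
}

/*
 * Hypothetical model of compaction: if a packet fits in the unused tail
 * of the last queued cluster, its data is copied there and the cluster
 * it arrived in is released. The copy is the performance cost.
 */
size_t clusters_with_compaction(const size_t *pkt_len, size_t npkts)
{
    size_t used = 0;
    size_t tail_free = 0;           /* free bytes in the last queued cluster */

    for (size_t i = 0; i < npkts; i++) {
        assert(pkt_len[i] <= CLUSTER_SIZE);
        if (pkt_len[i] <= tail_free) {
            tail_free -= pkt_len[i];            /* copied into existing cluster */
        } else {
            used++;                             /* must keep this cluster */
            tail_free = CLUSTER_SIZE - pkt_len[i];
        }
    }
    return used;
}
```

In this model, 32 queued packets of 64 bytes each hold 32 clusters
without compaction and 1 with it. Two 1500-byte packets hold 2 clusters
either way, which matches the point that the fix mostly helps the
small-packet case.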
I found that on one server-ish system, which handles some interactive
traffic, some CVS, and some NFS, the machine would overflow 256 mbuf
clusters consistently (like, every few hours since I added the 'ex'
card), and some of those times would end up being fatal -- or at least
seeming fatal. I upped the count of mbuf clusters to 512, then 1024,
and it did _NOT_ solve the problem. (The frequency of the occurrences
was reduced, but not eliminated.) Running with the patch above, there
has been no problem, nor has the machine exceeded even _256_ mbuf
clusters.
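
For anyone who does want to raise the limit anyway, it is a one-line
change to the kernel config file, followed by a kernel rebuild. A
minimal fragment, with the value 1024 used only because it is one of the
counts mentioned above, not as a recommendation:

```
# Raise the mbuf cluster limit; see options(4).
options 	NMBCLUSTERS=1024
```

"netstat -m" reports mbuf and cluster usage on the running system, which
is one way to see how close the machine actually gets to the limit.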
Moral of the story: sometimes the diagnostics mean that you should
increase the limits. Sometimes they mean the system's broken.
cgd
--
Chris Demetriou - cgd@netbsd.org - http://www.netbsd.org/People/Pages/cgd.html
Disclaimer: Not speaking for NetBSD, just expressing my own opinion.