Subject: kern/35586: apparent NMBCLUSTERS sysctl related problem on netbsd-3 branch
To: None <kern-bug-people@netbsd.org, gnats-admin@netbsd.org,>
From: None <mm_lists@pulsar-zone.net>
List: netbsd-bugs
Date: 02/12/2007 01:40:00
>Number:         35586
>Category:       kern
>Synopsis:       apparent NMBCLUSTERS sysctl related problem on netbsd-3 branch
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Mon Feb 12 01:40:00 +0000 2007
>Originator:     Matthew Mondor
>Release:        NetBSD 3.1_STABLE
>Organization:
Pulsar-Zone
>Environment:
NetBSD hal.xisop 3.1_STABLE NetBSD 3.1_STABLE (GENERIC_MM) #7: Sun Feb 11 08:15:37 EST 2007  root@hal.xisop:/usr/src/sys/arch/i386/compile/GENERIC_MM i386
>Description:
One of my systems, under rather heavy network load (up to 800
concurrent TCP connections, using a remote NFS server for most
storage), often needed to be rebooted before the network would
work again.

All local and remote network activity would stop, and I could
observe processes blocked in the netio and mclpl wait channels.
netstat showed high Send-Q values, and netstat -m reported nearly
1024 mbufs in use.

However, I was not getting the "please raise NMBCLUSTERS" warning
that former releases printed, and I noticed that there was now a
sysctl to raise the limit.  Attempting to raise it via sysctl fails
with a permission denied error, even as the superuser, for no reason
I could determine.  I therefore added NMBCLUSTERS=2048 to my kernel
configuration, then built and booted that kernel, and the network
lockups seem to have stopped.

The network card is an fxp(4).  ifconfig down/up appeared to work
normally but did not allow networking to resume.  Since the problem
also occurs with lo(4), I suspect that it is not an Ethernet device
driver bug.  Apart from networking, the rest of the system appeared
fully functional, and I could log in via the serial tty.

Could this be a new problem, possibly introduced by the pullup of
the nmbclusters sysctl from -current to netbsd-3?  I have not
investigated the code much, other than noticing that the mclpl
wchan is associated with an mbuf allocation pool.

Thanks,
Matt
>How-To-Repeat:
Maintain a large number of active TCP connections on a system with a low NMBCLUSTERS setting.
>Fix:
-