Subject: The kmem_map issue, revisited
To: None <port-sparc@NetBSD.org>
From: John D. Baker <jdbaker@mylinuxisp.com>
List: port-sparc
Date: 10/05/2005 14:52:14
I've been reading everything in the mailing list archives on this topic,
and I'm still trying to wrap my brain around the problem and why it happens
on some machines, but not others.

My understanding, so far, is that on machines with more than some
threshold amount of memory, the 'free list' indicates that there
is memory available, but the kernel has insufficient map entries
to point to it.  Further, there are indications that the map is
to some extent sized dynamically when the kernel is initialized and
that the min and max limits can be adjusted at build time (NKMEMPAGES_MIN
and NKMEMPAGES_MAX).

I haven't really been able to glean what that threshold memory level is,
which machines it will appear on, or what kind of activity is likely
to trigger the panic.  But I'd like to submit my own data points and
observations for discussion and feedback.

I have a pile of SS5s with varying amounts of memory, the largest being
an SS5-110 with 256MB of memory.  These machines have _never_ elicited
an "out of space in kmem_map" panic (or the related scsi-command block
allocation failure).  They run either GENERIC kernels or customized
kernels in which kmem_map size has been left to the defaults.  The smaller
machines are my firewall, mail/network services, and fileserver machines,
respectively.  The "big" SS5 mentioned above typically is pressed into
service to perform CVS updates and build said custom kernels.

I am trying to bring up an SS20 with 512MB memory and dual 150MHz ROSS
HyperSPARC CPUs.  This machine, like many of those mentioned in the
various related threads, is being a serious pain when it comes to the
kmem_map issue.  With the default values, performing a CVS update of a
large repository like 'src' or 'pkgsrc' will panic the machine in a
couple of minutes.  Doubling the kmem_map size (1536->3072) was not
sufficient, although it took much longer for a CVS update to panic
the machine.

(As I write this, using a uniprocessor kernel with NKMEMPAGES_MAX=6144
has just successfully performed a CVS update of pkgsrc while simultaneously
building an MP version of the same kernel.)
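
For a rough sense of scale, the page counts mentioned above can be
converted to megabytes.  This is only a back-of-the-envelope sketch: it
assumes the figures 1536, 3072 and 6144 are counts of kernel memory pages
and that the page size is 4096 bytes (which I believe is the case on sun4m
machines such as the SS5 and SS20).  The function name is made up for
illustration.

```python
# Assumption: 4096-byte pages, as on sun4m (SS5, SS20).
PAGE_SIZE = 4096


def kmem_map_bytes(nkmempages, page_size=PAGE_SIZE):
    """Return the kmem_map size in bytes for a given number of pages."""
    return nkmempages * page_size


# Default, doubled, and the NKMEMPAGES_MAX value tried above.
for pages in (1536, 3072, 6144):
    mb = kmem_map_bytes(pages) / (1024 * 1024)
    print(f"{pages:5d} pages = {mb:.0f} MB")
```

Under those assumptions the three settings work out to 6MB, 12MB and 24MB
respectively, all of which are small next to the 512MB in the machine.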

From what I read in the archives, someone had an SS10 system that had
problems with as little as 128MB of memory with the default kmem_map
size.

The plan is for my SS20 eventually to become my next fileserver/email
host/etc. system (replacing 2 of the SS5s mentioned above), so I want
to make sure it's not going to flake out under load.

-- 
John D. Baker, KN5UKS                    NetBSD     Darwin/MacOS X
jdbaker(at)mylinuxisp(dot)com                 OpenBSD            FreeBSD
BSD -- It just sits there and _works_!