Subject: Re: hang w/ high kern.maxvnodes
To: None <tech-kern@netbsd.org>
From: Andrew Doran <ad@netbsd.org>
List: tech-kern
Date: 09/13/2007 14:26:40

On Thu, Sep 13, 2007 at 09:17:02AM -0400, Allen Briggs wrote:
> I mentioned recently that I saw a hang with high kern.maxvnodes.
> I was trying to get a handle on it, so was increasing it, running
> pstat -T periodically, and running multiple builds.  I have 6GB
> of RAM in this machine, so had kern.maxvnodes cranked up to 2097000
> (just shy of 2M) before it seemed to wedge.  It seemed fine at
> 2M - 128K.  ^t would still work on some shells, but CTL-ALT-ESC
> didn't get me into ddb.  The shell running builds showed:
>  load: 12.00  cmd: sh 24585 [wait] 0.04u 0.00s 0% 1424k
> 
> In the shell running pstat -T, I seemed to be hung up on vm_map:
> load: 12.08  cmd: csh 19094 [vm_map] 0.03u 0.19s 0% 1300k
> load: 12.02  cmd: csh 19094 [vm_map] 0.03u 0.19s 0% 1300k
> load: 12.00  cmd: csh 19094 [vm_map] 0.03u 1.79s 0% 1300k
> 
> One thing that seems odd is that pstat apparently tries to grab
> all of the vnodes (even just for the -T output) and runs out of
> memory easily.

Yuk..
 
> I am guessing that something was locked looking for VM, but I
> don't know what, exactly.
> 
> This is running an amd64 GENERIC.MP kernel with dual Xeon 3.2GHz
> CPUs (with HT, so 4 virtual CPUs):
> cpu0 at mainbus0 apid 0: (boot processor)
> cpu0:                   Intel(R) Xeon(TM) CPU 3.20GHz, 3192.13 MHz
> cpu0: features: bffbfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR>
> cpu0: features: bffbfbff<PGE,MCA,CMOV,PAT,PSE36,CFLUSH,B20,DS,ACPI,MMX>
> cpu0: features: bffbfbff<FXSR,SSE,SSE2,SS,HTT,TM,SBF>
> cpu0: features2: 641d<SSE3,MONITOR,DS-CPL,CID,xTPR>
> cpu0: features3: bffbfbff<SYSCALL/SYSRET,XD,EM64T>
> cpu0: L2 cache 2 MB 64B/line 8-way
> 
> Kernel is NetBSD 4.99.30 (GENERIC.MP) #2: Thu Aug 30 08:54:46 EDT 2007
> 
> Anyone seen anything similar?  Or have any suggestions on where
> to look?

It's basically impossible to debug amd64 GENERIC.MP unless you have compiled
the kernel without -fomit-frame-pointer. Is it a test machine only? If so,
compile the kernel with "COPTS=" and use LOCKDEBUG and DIAGNOSTIC. Can you
get backtraces from the hung threads with ps/l and t/a? What does 'x/Lx
numvnodes' say?

Andrew