Subject: Re: 7GB memtest causes hard system hang.
To: None <tls@rek.tjls.com>
From: Greg Oster <oster@cs.usask.ca>
List: port-amd64
Date: 01/16/2006 11:33:08
Thor Lancelot Simon writes:
> 
> The message you quoted _most of_ described exactly how to reproduce the
> problem.  Boot a 3.0 GENERIC.MP kernel on a dual-Opteron system with 8GB
> of RAM.  Next, unlimit datasize, memoryuse, and memorylocked.  Next,
> run "memtester" from pkgsrc -- I see I accidentally called it "memtest",
> but since that's the only memory tester we have that runs under NetBSD
> and it's also the one everyone uses, I hardly think that's too hard to
> figure out -- specifying a 7GB test size;

So running 'memtester 7500' gave me:
memtester 7500
memtester version 4.0.5 (64-bit)
Copyright (C) 2005 Charles Cazabon.
Licensed under the GNU General Public License version 2 (only).

pagesize is 4096
pagesizemask is 0xfffffffffffff000
want 7500MB (7864320000 bytes)
got  7500MB (7864320000 bytes), trying mlock ...failed for unknown reason.
Continuing with unlocked memory; testing will be slower and less reliable.
Loop 1:
  Stuck Address       : testing   0

and seemed to be working:

  661 root      -5    0  7500M 7176M biowai/3   0:30  4.25%  4.25% memtester

at least for the kernel I was running at the time (more below)...

> it's the only command line option
> memtester takes, and it's mandatory.  Next, watch your system hang so hard
> that you can't get it to drop to DDB.

I initially ran this on a kernel where I had set

options NKMEMPAGES=150000

in the kernel config, and couldn't reproduce the problem at all.

When I paid closer attention to the details and ran a stock
GENERIC.MP kernel, this is the panic I got on the console:

panic: pmap_enter: no pv entries available
syncing disks... 

with no opportunity to drop to ddb or anything.

(Without the "options NKMEMPAGES=150000" tweak, the machine is basically
useless to me -- I can easily run "too much stuff" that gets me to the 
"no pv entries available" panic :( )

The proper fix belongs in the pmap code.  A short-term workaround is to
bump NKMEMPAGES as above...
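
For a sense of scale: as I understand options(4), NKMEMPAGES gives the size of
the kernel malloc arena in pages. Assuming the 4096-byte page size that
memtester reported above, the arithmetic can be sketched as follows. The
helper name pages_to_bytes is purely illustrative and is not a kernel
interface.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes covered by `npages` pages of `pagesize` bytes each. */
static uint64_t pages_to_bytes(uint64_t npages, uint64_t pagesize)
{
    /* Guard against a zero page size and against 64-bit overflow. */
    assert(pagesize != 0 && npages <= UINT64_MAX / pagesize);
    return npages * pagesize;
}
```

pages_to_bytes(150000, 4096) is 614400000 bytes, or a little under 586 MB.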

Later...

Greg Oster