Some of you may remember my PRs 38019 and 38246.  For a long time the problem seemed to be fixed, ever since I started running a kernel built from the "wrstuden-fixsa" branch off of "netbsd-4".  Recently, through some upgrades of my server, I've managed to briefly run a netbsd-4 kernel built from the head of the branch, i.e. one from after the commit done for ticket #1196.  Unfortunately the problems described in the PRs mentioned above reappeared immediately, and with a vengeance -- even a few runs of "cvs update" over the NetBSD source tree could trigger them shortly after a fresh boot.

In fact it had been so long since I built the last wrstuden-fixsa kernel that I didn't remember doing so, and after seeing this problem again I almost believed the wrstuden-fixsa changes had never been merged to netbsd-4.  The original wrstuden-fixsa changes seemed to be working very well, but it looks as though some regression may have crept in when they were merged to netbsd-4.

Perhaps related: I also briefly played a bit more with tuning to try to fix the problem, and managed only to make it worse when I increased vm.nkmempages from 32768 to 65536.  With that change the problem occurs with as little as one run of "cvs update".

So, I'm wondering if anything was possibly missed in the pullup done for ticket #1196.  I'm also wondering if anyone else has experienced any strange lockups on large-memory servers with recent (since 2008/09/16) netbsd-4 kernels on i386.  Of particular interest are lockups where DDB shows processes sitting in "vmmapva".  Can these "vmmapva" lockups even be related in any way to the fixsa changes?

I've done a diff of my relevant local source trees but I can't really see anything critical (my current netbsd-4 tree includes new local changes made since I built the working wrstuden-fixsa kernel, plus of course all pullups done to the netbsd-4 branch since wrstuden-fixsa was merged).  The only major differences I see are in some kernel tuning parameters, which I suppose may be partly responsible for exhausting more KVA space.

I'm going to try reducing BUFCACHE from the default 15% to 10% to see if that makes room for whatever else changed.  I'll also reduce NKMEMPAGES_MAX back down to 128MB worth (I'm guessing the kmem_map whose size it controls lives in KVA space and so squeezes things even more).  Interestingly, I believe the older kernel I was running successfully had BUFCACHE at the default 15%, since sysctl was showing a vm.bufmem_hiwater value of very nearly 15% of RAM.

BTW, I did try setting 'options KERNBASE="0x80000000UL"', but that simply results in an instant reboot of my test machine right after the kernel is loaded, so somehow that doesn't work any more.

Is there any simple way I could write a kernel thread or similar to watch for KVA space exhaustion and at least report it before things grind too much to a halt?  (Kernel console output still works in this state, at least.)  A rough sketch of the sort of thing I have in mind follows below.
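To make that question a bit more concrete, here's roughly what I had in mind.  This is an untested sketch only: it assumes the netbsd-4 kthread_create1() interface, that kernel_map and kmem_map are reachable from the UVM headers (I'm not sure which header declares kmem_map), and that simply comparing each map's mapped size against its total VA range is a good-enough early warning -- it ignores fragmentation entirely, and the 90% threshold, the 10-second poll interval, and all the "kvawatch" names are just made up:

	#include <sys/param.h>
	#include <sys/systm.h>
	#include <sys/kernel.h>		/* hz */
	#include <sys/proc.h>
	#include <sys/kthread.h>

	#include <uvm/uvm_extern.h>	/* kernel_map */
	#include <uvm/uvm_map.h>	/* vm_map_min()/vm_map_max() */
	#include <uvm/uvm_km.h>		/* XXX kmem_map declaration may live elsewhere */

	static struct proc *kvawatch_proc;

	/*
	 * Rough measure of how full a map is: compare the mapped size
	 * against the map's total VA range.  This ignores fragmentation,
	 * so it can only under-report how bad things really are.
	 */
	static void
	kvawatch_check(struct vm_map *map, const char *name)
	{
		vsize_t total, used;

		total = vm_map_max(map) - vm_map_min(map);
		used = map->size;

		/* complain when more than ~90% of the map's VA is in use */
		if (used > total - total / 10)
			printf("kvawatch: %s nearly full: %lu of %lu KB mapped\n",
			    name, (u_long)(used / 1024), (u_long)(total / 1024));
	}

	static void
	kvawatch_thread(void *arg)
	{

		for (;;) {
			kvawatch_check(kernel_map, "kernel_map");
			kvawatch_check(kmem_map, "kmem_map");
			/* poll every 10 seconds or so */
			(void)tsleep(&kvawatch_proc, PWAIT, "kvawatch", 10 * hz);
		}
	}

	/* call this from somewhere late in autoconfiguration */
	void
	kvawatch_init(void)
	{

		if (kthread_create1(kvawatch_thread, NULL, &kvawatch_proc,
		    "kvawatch") != 0)
			printf("kvawatch: could not create kthread\n");
	}

If something like that works it could presumably be made to drop into DDB instead of (or as well as) just printing, but even a single console message before things wedge would tell me whether KVA exhaustion really is what I'm seeing.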
-- 
						Greg A. Woods
						Planix, Inc.

<woods%planix.com@localhost>       +1 416 218-0099        http://www.planix.com/