pools and scarcity

To: tech-kern%NetBSD.org@localhost
Subject: pools and scarcity
From: Taylor R Campbell <campbell+netbsd-tech-kern%mumble.net@localhost>
Date: Sun, 9 Dec 2012 18:16:39 +0000

When pool_get can't service the request from any pooled storage
because it's all in use, it calls out to the pool's allocator.
Sometimes the pool's allocator can't service the request, e.g. because
it has to come from kva and kva is currently too scarce or fragmented
to find space.  In this case, pool_get may hang indefinitely -- even if, a
moment later, someone returns an object to the pooled storage with
pool_put.

I believe this is the cause of the vm_map monster I have found lurking
in my i386 systems, chronicled in PR kern/45718.  I would like to slay
this monster, but it's not clear to me how to proceed.  Here are
several ideas:

1. We could work around this problem for the case of exec_pool --
where all my processes are waiting in vm_map -- by setting a low water
mark equal to the pool's hard limit, in order to get the contiguous
chunks of kva we need up front before kva gets too fragmented, but
that might waste a lot of space, and wouldn't avoid the more general
problem.

2. We could break up execargs into little pieces so that we don't need
a large, contiguous chunk of kva.  That would add a lot of bookkeeping
to exec, which is already a maze of twisty bookkeeping, all alike, and
I'm not sure that exec is the only thing that needs large, contiguous
chunks of kva.

3. We could keep track of the number of pending pool_grows, and use
the pool's hard limit to bound both the number of pages allocated and
the number of pending pool_grows.  Then if someone pool_puts the last
object in a page, and there are pending pool_grows, we can free the
page even if that would put us below the low water mark, because
there's a pending pool_grow about to bring us back up again.  However,
there is, of course, a race here -- someone else might nab the (say)
kva before exec_pool_alloc can get at it, and now we have lost one of
the precious execargs structures from the pool.

4. We could implement an abstraction for multiplexed waits with
negative acknowledgements, like Concurrent ML, in order to let a
caller wait either for someone to free kva or for someone to pool_put,
by passing a token down into the pool allocator and -- in the exec
example -- into the cv_timedwait inside uvm_km_alloc, so that we can
bail out of that (and clear the UVM_MAP_WANTVA flag) if someone does a
pool_put and we decide we'd rather use that than get new kva.
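To make the bookkeeping in idea 3 concrete, here is a minimal user-space model in C.  It is only a sketch under assumptions.  struct pool_acct, grow_begin, grow_end and may_release_empty_page are hypothetical names, not NetBSD's pool(9) internals.  The model also assumes that "bound both" means the hard limit bounds the sum of pages held and pool_grows in flight.  It shows the one rule that changes: an empty page may be released below the low water mark while a grow is pending.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bookkeeping for idea 3; names are illustrative only. */
struct pool_acct {
	unsigned npages;	/* pages currently held by the pool */
	unsigned npending;	/* pool_grows waiting on the allocator */
	unsigned hardlimit;	/* assumed bound on npages + npending */
	unsigned lowat;		/* low water mark, in pages */
};

/* May a caller start a new pool_grow?  Counts it as pending if so. */
static bool
grow_begin(struct pool_acct *pa)
{
	if (pa->npages + pa->npending >= pa->hardlimit)
		return false;
	pa->npending++;
	return true;
}

/* A pending grow finished; success means the allocator gave a page. */
static void
grow_end(struct pool_acct *pa, bool success)
{
	assert(pa->npending > 0);
	pa->npending--;
	if (success)
		pa->npages++;
	assert(pa->npages + pa->npending <= pa->hardlimit);
}

/*
 * pool_put just emptied a page: may it go back to the allocator?
 * Normally only above the low water mark, but also whenever a grow
 * is pending, since that grow is about to bring the count back up.
 */
static bool
may_release_empty_page(struct pool_acct *pa)
{
	assert(pa->npages > 0);
	if (pa->npages > pa->lowat || pa->npending > 0) {
		pa->npages--;
		return true;
	}
	return false;
}
```

The model deliberately leaves out the race described in idea 3: nothing here guarantees that the page released by may_release_empty_page is the memory the pending grow eventually receives.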

The last option, of course, is rather crazy, and unlikely to happen
for various reasons, but it's not clear to me how else to guarantee
that we can make progress after pool_put as we ought to be able to.
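For a rough feel of what a negative acknowledgement buys in idea 4, the following user-space C sketch polls a list of alternatives once and commits to the first one that is ready.  It then runs the nack handler of every alternative it did not choose.  Everything here is hypothetical: struct alt, choose and struct model are illustrative names, and want_va merely stands in for the UVM_MAP_WANTVA flag.  Unlike Concurrent ML's choice, which may pick any ready event and blocks until one is ready, this sketch prefers earlier alternatives and never sleeps.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One alternative: a readiness test, plus a nack run if not chosen. */
struct alt {
	bool	(*ready)(void *arg);
	void	(*nack)(void *arg);	/* may be NULL */
	void	*arg;
};

/*
 * Poll each alternative once, in order, and commit to the first ready
 * one; then run the nack of every alternative not chosen.  Returns the
 * chosen index, or -1 if nothing is ready.  A real implementation
 * would sleep on all the alternatives' wait channels instead.
 */
static int
choose(const struct alt *alts, size_t n)
{
	size_t i, chosen;

	for (chosen = 0; chosen < n; chosen++)
		if (alts[chosen].ready(alts[chosen].arg))
			break;
	if (chosen == n)
		return -1;
	for (i = 0; i < n; i++)
		if (i != chosen && alts[i].nack != NULL)
			alts[i].nack(alts[i].arg);
	return (int)chosen;
}

/* Hypothetical stand-in for the state an exec caller cares about. */
struct model {
	int	pooled_free;	/* objects returned by pool_put */
	bool	kva_free;	/* enough contiguous kva is available */
	bool	want_va;	/* models UVM_MAP_WANTVA: waiting for kva */
};

static bool
pool_ready(void *arg)
{
	return ((struct model *)arg)->pooled_free > 0;
}

static bool
kva_ready(void *arg)
{
	return ((struct model *)arg)->kva_free;
}

/* Negative acknowledgement: we stopped waiting for kva, so withdraw. */
static void
kva_nack(void *arg)
{
	((struct model *)arg)->want_va = false;
}
```

With one object returned by pool_put and no kva free, choose picks the pool alternative, and the kva nack clears want_va; that is the "bail out" step described in idea 4.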

Thoughts?
