On Fri, 9 Oct 2009 00:28:56 -0700, Jason Thorpe
<thorpej%shagadelic.org@localhost> wrote:

> On Oct 8, 2009, at 11:52 PM, Jean-Yves Migeon wrote:
> Consider the case of an mbuf. It is allocated from a per-CPU pcg on
> CPU-A, and subsequently bounces around the networking stack, through
> socket buffers, etc., and is finally freed. But the processor that
> frees it could be CPU-B (for any number of legitimate reasons), thus
> it goes into one of CPU-B's per-CPU pcgs.
In this case, the mbuf is used by CPU-B. You will not find it inside
CPU-A's pool cache. pool_cache_invalidate_local() invoked on CPU-A
will not touch it, as CPU-A's pool_cache does not hold any reference
to it.

The same situation will happen if you pool_cache_destroy() the pool
cache from CPU-B. Except that, in this case, CPU-B _will_ destroy
objects managed by CPU-A's pool_cache, while
pool_cache_invalidate_local() won't. It is the caller's responsibility
to ensure that such a situation never occurs.
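
To make the scenario concrete, here is the sequence as pool_cache(9)
calls (a sketch; "mb_cache" and the function are illustrative, not
the real mbuf code path):

#include <sys/pool.h>
#include <sys/mbuf.h>

static pool_cache_t mb_cache;   /* illustrative stand-in */

void
example_migrating_free(void)
{
        struct mbuf *m;

        /* Running on CPU-A: the get is served from CPU-A's pcg. */
        m = pool_cache_get(mb_cache, PR_NOWAIT);
        if (m == NULL)
                return;

        /* ... the mbuf travels through the stack; the code doing the
         * final free may run on CPU-B ... */

        /* On CPU-B: the put lands in one of CPU-B's pcgs. */
        pool_cache_put(mb_cache, m);

        /*
         * A later pool_cache_invalidate_local() on CPU-A cannot
         * reach this object: it now sits in CPU-B's pc_cpus[] slot.
         */
}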
>> I need a way to invalidate all pool caches, even those that are
>> CPU-bound. pool_cache_invalidate() only does that for the global
>> cache, as it cannot invalidate CPU caches for CPUs other than its
>> own.
> CPU-bound is the wrong term here. Nothing is "bound" to the CPU
> except for a set of pcgs that cache constructed objects. The objects
> are merely kept in CPU-local caches in order to provide a lockless
> allocation path in the common case. The objects themselves are not
> bound to any one CPU.
Depends on the way you see it. When an object is put back in a
CPU-local cache, it is bound to this CPU. No other CPU could allocate
this object (pool_cache will not allow CPU-A to allocate objects
cached in CPU-B's pc_cpus[]).

I am not interested in the owner of objects that are currently in
use, but in those that are currently released but not "freed" (those
cached in the pool_cache).
>> Before invalidate_local(), the only way would be to destroy the
>> pool_cache entirely, just to release the objects found in
>> pc_cpus[]. This would cripple the entire VM system, as it cannot
>> work without the L2 shadow page pool.
> What do you mean "destroy the pool_cache entirely"?
Invoke pool_cache_destroy() on it.
> So you're just working around a bug in pool_cache_invalidate()? If
> pool_cache_invalidate() is not also zapping the per-CPU pcgs for
> that pool_cache, then that is a bug.
It cannot do so safely. pool_cache_invalidate() can be called whenever
you want, as the global cache is protected by a mutex.

Per-CPU caches are not, due to lockless allocation. If CPU-A
invalidates the per-CPU caches of CPU-B while CPU-B is concurrently
running a pool_cache_get(), this could result in unspecified behavior.
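
For reference, the fast path has roughly this shape (a simplified
sketch; field names approximated from sys/kern/subr_pool.c, not the
verbatim code):

#include <sys/pool.h>
#include <sys/cpu.h>

/*
 * Lockless per-CPU allocation fast path. No mutex is taken:
 * kpreempt_disable() pins us to the current CPU, which is assumed
 * to be the only writer of its own pcgs. A remote CPU draining
 * cc_current behind our back would race with the unlocked
 * pcg_avail/pcg_objects[] manipulation below.
 */
void *
pool_cache_get_sketch(pool_cache_t pc)
{
        pool_cache_cpu_t *cc;
        pcg_t *pcg;
        void *object;

        kpreempt_disable();
        cc = pc->pc_cpus[curcpu()->ci_index];
        pcg = cc->cc_current;
        if (pcg != NULL && pcg->pcg_avail > 0) {
                object = pcg->pcg_objects[--pcg->pcg_avail].pcgo_va;
                kpreempt_enable();
                return object;  /* constructed object, no lock held */
        }
        kpreempt_enable();
        return NULL;    /* slow path (mutex-protected refill) elided */
}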
> The main intent of pool_cache_invalidate() is to nuke any cached
> constructed copies of an object if the constructed form were to
> change for some reason. If per-CPU cached copies are not included in
> that, then a subsequent allocation could return an
> incorrectly-constructed object, leading to errant behavior.
pool_cache(9) does not state that:

"""
Destruct and release all objects in the global cache. Per-CPU caches
will not be invalidated by this call, meaning that it is still
possible to allocate "stale" items from the cache. If relevant, the
user must check for this condition when allocating items.
"""
> There are subsystems in the kernel that depend on the
> pool_cache_invalidate() semantics I describe; see
> arch/alpha/alpha/pmap.c:pmap_growkernel() for an example. The L1
> PTPs are cached in constructed form (with the L1 PTEs for the kernel
> portion of the address space already initialized). If PTPs are added
> to the kernel pmap in such a way as to require an additional L1 PTE
> to link them up, then already-constructed-but-free L1 PTPs need to
> be discarded since they will be missing part of the kernel's address
> space.
In the current form, that is correct. pool_cache_invalidate() will not
release _all_ objects, only globally cached ones. Stale objects in
per-CPU pcgs will not be freed, nor updated. In Alpha's
pmap_growkernel() case, yes, you end up with desynchronized PTPs.

This is wrong, and needs to be fixed, through
pool_cache_invalidate_local() for each CPU (or an equivalent way to do
it; I am open to proposals).
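
One "equivalent way" would be to drive the per-CPU invalidation
through xcall(9), so that each CPU empties its own pcgs in its own
context (a sketch assuming the pool_cache_invalidate_local() proposed
in this thread; "pool_cache_invalidate_all" is a hypothetical name):

#include <sys/pool.h>
#include <sys/xcall.h>

/* Runs once on each CPU, in that CPU's own context. */
static void
pc_invalidate_xc(void *arg1, void *arg2)
{
        pool_cache_t pc = arg1;

        pool_cache_invalidate_local(pc);
}

/* Invalidate the global cache plus every CPU's local pcgs. */
void
pool_cache_invalidate_all(pool_cache_t pc)
{
        uint64_t where;

        pool_cache_invalidate(pc);      /* global, mutex-protected */
        where = xc_broadcast(0, pc_invalidate_xc, pc, NULL);
        xc_wait(where);         /* every CPU has drained its pcgs */
}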
> Sigh, really the per-CPU pcgs should not be "lockless", but rather
> should be mutex-protected but "never contended" ... just in case you
> DO need to manipulate them from another CPU (this is how Solaris's
> kmem_cache works, at least it did at one point).
This is more intrusive in pool_cache(9). IMHO, protecting the per-CPU
pcgs with mutexes that will very rarely see contention (the system is
expected to spend a lot more time allocating from these pcgs than
asking for their invalidation from another CPU...) is overkill.

If there is a lockless (or "contention-free mutex") way to do it, why
not. But this will alter a performance-critical code path, and should
be thought out very carefully.
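
For comparison, the "never contended" variant would look something
like this (illustrative names only; this is the Solaris-style design
mentioned above, not existing NetBSD code):

#include <sys/pool.h>
#include <sys/mutex.h>

/*
 * Each CPU's cache carries its own mutex. The owning CPU takes it on
 * every get/put, so the acquisition is almost always uncontended and
 * cheap; a remote CPU may take the same mutex to drain the pcgs
 * safely.
 */
struct pcpu_cache {
        kmutex_t        pcc_lock;       /* almost never contended */
        pcg_t           *pcc_current;
        pcg_t           *pcc_previous;
};

/* Local fast path: lock our own CPU's cache. */
void *
pcpu_cache_get(struct pcpu_cache *pcc)
{
        pcg_t *pcg;
        void *object = NULL;

        mutex_enter(&pcc->pcc_lock);
        pcg = pcc->pcc_current;
        if (pcg != NULL && pcg->pcg_avail > 0)
                object = pcg->pcg_objects[--pcg->pcg_avail].pcgo_va;
        mutex_exit(&pcc->pcc_lock);
        return object;
}

/* Remote invalidation: safe, because we hold the same mutex. */
void
pcpu_cache_invalidate_remote(struct pcpu_cache *pcc)
{
        pcg_t *cur, *prev;

        mutex_enter(&pcc->pcc_lock);
        cur = pcc->pcc_current;
        prev = pcc->pcc_previous;
        pcc->pcc_current = pcc->pcc_previous = NULL;
        mutex_exit(&pcc->pcc_lock);
        /* ... destruct and free the objects held in cur/prev ... */
        (void)cur; (void)prev;
}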