tech-kern archive


Re: Adding pool_cache_invalidate_local() to pool_cache(9) API

On Oct 8, 2009, at 11:52 PM, Jean-Yves Migeon wrote:

>> Maybe I'm missing something ... how is this ever safe to use?  An object can 
>> be allocated from one CPU's cache and freed to another.
> I hardly see how that would be possible. During pool_get/pool_put, the only 
> per-CPU pool cache that is manipulated is the current_cpu() one. If one's CPU 
> is manipulating the pool_cache of another, bad things will happen anyway, as 
> two CPUs could release at the same time in the same pool_cache. Without 
> locks, this sounds wrong.

Consider the case of an mbuf.  It is allocated from a per-CPU pcg on CPU-A, and 
subsequently bounces around the networking stack, through socket buffers, etc. 
and is finally freed.  But the processor that frees it could be CPU-B (for any 
number of legitimate reasons), so it goes into one of CPU-B's per-CPU pcgs.

> The routine that is never safe is the pool_cache_invalidate_cpu(), when 
> called for a CPU different from the current one running. But this function is 
> never exposed to the outside world.
>> The per-CPU cache is simply an optimization, and it seems very wrong to 
>> expose this implementation detail to consumers of the API.
> I need a way to invalidate all pool caches, even those that are CPU-bound. 
> pool_cache_invalidate() only does that for the global cache, as it cannot 
> invalidate CPU caches for CPUs other than its own.

CPU-bound is the wrong term here.  Nothing is "bound" to the CPU except for a 
set of pcgs that cache constructed objects.  The objects are merely kept in 
CPU-local caches in order to provide a lockless allocation path in the common 
case.  The objects themselves are not bound to any one CPU.

> Before invalidate_local(), the only way would be to destroy the pool_cache 
> entirely, just to release the objects found in pc_cpus[]. This would cripple 
> down the entire VM system, as it cannot work without the L2 shadow page pool.

What do you mean "destroy the pool_cache entirely"?  So you're just working 
around a bug in pool_cache_invalidate()?  If pool_cache_invalidate() is not 
also zapping the per-CPU pcgs for that pool_cache, then that is a bug.  The 
main intent of pool_cache_invalidate() is to nuke any cached constructed copies 
of an object if the constructed form were to change for some reason.  If 
per-CPU cached copies are not included in that, then a subsequent allocation 
could return an incorrectly-constructed object, leading to errant behavior.  
There are subsystems in the kernel that depend on the pool_cache_invalidate() 
semantics I describe; see arch/alpha/alpha/pmap.c:pmap_growkernel() for an 
example.  The L1 PTPs are cached in constructed form (with the L1 PTEs for the 
kernel portion of the address space already initialized).  If PTPs are added to 
the kernel pmap in such a way as to require an additional L1 PTE to link them 
up, then already-constructed-but-free L1 PTPs need to be discarded since they 
will be missing part of the kernel's address space.

Sigh, really the per-CPU pcgs should not be "lockless", but rather should be 
mutex-protected but "never contended" ... just in case you DO need to manipulate 
them from another CPU (this is how Solaris's kmem_cache works, at least it did 
at one point).

> In Xen's case, pool_cache_invalidate_local() would be used by each CPU during 
> its detach routine (just before its shutdown), to clear all its associated L2 
> pages that Xen cannot handle.

So, we've added a huge abstraction violation to the pool API to handle this 
special case of Xen shutting down?

I'm sorry, but a different solution needs to be found.  Exposing this internal 
implementation detail of pools is just wrong.

-- thorpej
