Current-Users archive


Re: Fixing pool_cache_invalidate(9), final step



On Tue, 29 May 2012 23:29:10 +0200, Lars Heidieker wrote:
On Tue, May 29, 2012 at 9:03 PM, Jean-Yves Migeon <jym%netbsd.org@localhost> wrote:
-current: I would like to take a more radical step and perform the
xcall(9) directly from pool_cache_invalidate(9). Given that xcall(9)
broadcasts cannot be sent from interrupt or softint context (because
of the use of condvars), this change is likely to expose bugs in code
paths where invalidation is handled from interrupt context (which is
a bad thing IMHO). See current-pcache-invalidate.diff.
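
Concretely, the core of the change would be something like this (a
sketch, not the patch itself; pool_cache_xcall() is the existing
per-CPU transfer handler in subr_pool.c):

#include <sys/types.h>
#include <sys/pool.h>
#include <sys/xcall.h>

static void
pool_cache_invalidate_xcall(pool_cache_t pc)
{
        uint64_t where;

        /* Run pool_cache_xcall() on every CPU so each one pushes
         * its CPU-local objects back to the global cache. */
        where = xc_broadcast(0, (xcfunc_t)pool_cache_xcall, pc, NULL);

        /* xc_wait() sleeps on a condvar, hence the restriction:
         * this must never be reached from interrupt or softint
         * context. */
        xc_wait(where);
}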

While looking at the patch (the current version) I came across one
pitfall (if I'm not mistaken): pool_cache_invalidate_cpu() will
release the per-CPU pool_cache_cpu_t struct, which should only happen
if the cache is about to be destroyed.

Correct! And curiously this never led to a hard failure on my side;
any get/put to the pool should make it crash very early (when
accessing the per-CPU caches).

That confirms that my current testing is limited. Hmm.

The problem is that pool_cache_invalidate_groups() releases pool
cache groups during normal pool_cache_drain activity, while
pool_cache_invalidate_cpu() is intended to get the pool_cache ready
for destruction.
Calling pool_cache_invalidate_cpu() via xcall from pool_drain =>
pool_cache_invalidate is therefore problematic.

Indeed. The issue lies in the last part of pool_cache_invalidate_cpu() though; not putting the pool_cache_cpu back into the available pool could fix this. Perhaps it could be split into two different routines then?

pool_cache_cpu_destroy(...) {
     /* transfer the CPU-local objects, but no longer free the
      * pool_cache_cpu_t itself */
     pool_cache_invalidate_cpu(...);

     /* pc_cpu0 is embedded in the pool_cache; only dynamically
      * allocated per-CPU structs go back to cache_cpu_pool */
     if (cc != &pc->pc_cpu0)
         pool_put(&cache_cpu_pool, cc);
}

A bit like pool_cache_destroy()?
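
Roughly, the destroy path would then walk the per-CPU array and free
through the helper above -- a hypothetical sketch (pc_cpus[] being
the array subr_pool.c uses to track the per-CPU caches):

void
pool_cache_destroy(pool_cache_t pc)
{
        u_int i;

        /* ... global invalidation, unlinking, etc. elided ... */

        /* Tear down every per-CPU structure; plain invalidation
         * would no longer touch them. */
        for (i = 0; i < __arraycount(pc->pc_cpus); i++) {
                if (pc->pc_cpus[i] != NULL)
                        pool_cache_cpu_destroy(pc, pc->pc_cpus[i]);
        }
}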

This should be done for -6 and -current though; the reasoning is valid in both situations.

For the pool_cache destruction case we have to make sure that any
pool_cache_invalidate(), with its pool_cache_invalidate_groups()
calls and possibly the xcall that does the CPU-local cache to global
pool_cache transfer, has finished before pool_cache_invalidate_cpu()
is called.
If pool_cache_xcall() is called from pool_drain, then we could rejoin
pool_drain_start and pool_drain_end....

The synchronous pool_cache_invalidate_<whatever> would fit such a goal, no?

If my grepping is right, pool_reclaim/pool_cache_reclaim are only
used in subr_pool.c (for _HARDKERNEL), so draining should indeed only
happen from the pageout daemon and not from interrupt context at all.
We should therefore (if I'm not mistaken) be able to simplify by
integrating pool_drain_start/end and calling pool_cache_xcall from
pool_cache_invalidate...
Currently, with the split pool_drain_start/end, the time the xcall
runs might on SMP be used to drain the buffer pool; that is the only
difference that comes to my mind quickly.

I came to the same conclusion while looking at the drain_start/_end functions, but they were outside the scope of my patch (I expected opposition, so kept it as small as possible).
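
For illustration, a merged pool_drain() might look roughly like this
(a sketch, under the assumption that pool_cache_invalidate() now
performs the xcall synchronously itself; pool_drain_next() is a
hypothetical helper standing in for the round-robin pool selection
the old pool_drain_start() did):

bool
pool_drain(struct pool **ppp)
{
        struct pool *pp;

        /* Hypothetical: pick the next pool to drain, round-robin. */
        pp = pool_drain_next();
        if (ppp != NULL)
                *ppp = pp;

        /* The synchronous xcall now lives inside
         * pool_cache_invalidate(), so the _start/_end split (and the
         * window in which the buffer pool used to be drained)
         * disappears. */
        if (pp->pr_cache != NULL)
                pool_cache_invalidate(pp->pr_cache);

        return pool_reclaim(pp) != 0;
}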

While we are at it, I would like to make the routine asynchronous for
-current, i.e. it broadcasts the invalidation but does not wait for
all CPUs to finish executing the xcall(9). The reason is that each
CPU is responsible for the invalidation of its objects, and should
not try to feed them back to the global pool before the invalidation
caller may finally invalidate them globally. That would be a waste of
time, especially as these objects can be freed safely (ctor/dtor
shall implement their own internal locking, see pool_cache(9)).
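
In code, the asynchronous variant is simply the broadcast without the
wait (a sketch):

static void
pool_cache_invalidate_async(pool_cache_t pc)
{

        /* Fire and forget: each CPU frees its own objects under the
         * ctor/dtor locking rules of pool_cache(9), so nothing here
         * needs to wait for the xcall to complete. */
        (void)xc_broadcast(0, (xcfunc_t)pool_cache_xcall, pc, NULL);
}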

We might do this async, I think, but we need it synchronous for the
pool_cache destruction case...

Agreed.

Thanks for the review Lars, highly appreciated.

--
Jean-Yves Migeon
jeanyves.migeon%free.fr@localhost


