tech-kern archive
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]
Re: percpu_foreach() does not execute remotely
> On Jan 28, 2020, at 8:40 AM, Taylor R Campbell <campbell+netbsd-tech-kern%mumble.net@localhost> wrote:
>
> Why not just use xc_broadcast? Does a cross-calling version of
> percpu_foreach really help simplify much or clarify anything?
I suppose I could, but:
1- This seems to be a general issue with percpu_foreach(). I mean, one of the main purposes of per-cpu storage is that other CPUs don't touch it, and percpu_foreach() completely violates the POLA in this regard.
2- If I were to use xc_broadcast(), then in the case of network protocol / interface stats, I would need to use either atomic operations or add some other synchronization into the shared "exported copy" of the data that's being gathered up.
For network stats, what I want is:
1- Tally all volatile stats on a per-cpu basis to avoid sharing / contention. (Already done for protocols, in-progress for network interfaces.)
2- When the stats are to be reported to user land (rare operation), safely gather all of the per-cpu counters into an "export" copy of the buffer. (Already done for protocols, in-progress for network interfaces.)
The problem with (2) is that percpu_foreach() does not execute on the CPU that owns the data, meaning that the data I'm fetching may be wrong ... this is not of particular concern on a 64-bit platform, because, while write tearing could occur, it's pretty unlikely. But it could lead to totally wrong values being exported on a 32-bit platform. Sure, this is not CRITICAL, but it's also not CORRECT.
With network stats, I want the percpu_foreach callback to be able to synchronize against IPL_SOFTNET (for protocols) or IPL_NET (for interfaces) interrupts to ensure consistency of the value being read from the cpu-local counters, and I want to be able to add those cpu-local values into the "export" buffer without having to use atomic operations (not possible on 32-bit platforms) or use additional synchronization (like a mutex) (the "foreach" part of the name implies a serial enumeration, rather than parallel operation).
> I count at most two lines of code that it would save (and actually
> only one of them is technically needed because you could safely use
> percpu_getptr_remote in this context):
>
> xc_wait(xc_broadcast(0, foo_xc, percpu, NULL));
>
> foo_xc(void *vpercpu, void *vcookie)
> {
> struct percpu *percpu = vpercpu;
> struct foo_cpu *foo = percpu_getref(percpu); // ---
>
> ...
> percpu_putref(percpu); // ---
> }
-- thorpej
Home |
Main Index |
Thread Index |
Old Index