tech-kern archive
Re: Ubiquitous ucas(9)
> On Mar 29, 2019, at 3:01 PM, Mindaugas Rasiukevicius <rmind%netbsd.org@localhost> wrote:
> I think I might have introduced the ucas_*() API with x86 support back
> in 2009 (and portmasters added it to other architectures, but not all).
> The ustore/ufetch API was supposed to be a follow-up clean up. Thanks
> for finally finishing up the clean up of all these APIs. :)
I'm here to help :-)
> I think you should also add ucas_ptr() which would alias to 32 or 64,
> depending on the architecture.
Yah, I have that... should have mentioned it, but thanks for making sure.
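To illustrate the aliasing idea, here is a minimal userspace sketch, not the actual kernel code. The names stub_ucas_32(), stub_ucas_64() and sketch_ucas_ptr() are hypothetical; the stubs only mimic the argument order of the fixed-width primitives and are neither atomic nor aware of user address spaces. The pointer-sized variant simply forwards to whichever width matches the pointer size.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-ins for the fixed-width primitives.  They store the
 * value found at *uptr into *ret and write "new" only if that value
 * equals "old".  Plain loads and stores: for illustration only.
 */
int
stub_ucas_32(volatile uint32_t *uptr, uint32_t old, uint32_t new,
    uint32_t *ret)
{
	*ret = *uptr;
	if (*ret == old)
		*uptr = new;
	return 0;
}

int
stub_ucas_64(volatile uint64_t *uptr, uint64_t old, uint64_t new,
    uint64_t *ret)
{
	*ret = *uptr;
	if (*ret == old)
		*uptr = new;
	return 0;
}

/* Pointer-sized variant: forwards to the 32-bit or 64-bit stub. */
int
sketch_ucas_ptr(volatile void *uptr, void *old, void *new, void *ret)
{
#if UINTPTR_MAX == UINT64_MAX
	static_assert(sizeof(void *) == sizeof(uint64_t),
	    "64-bit pointers expected");
	return stub_ucas_64(uptr, (uintptr_t)old, (uintptr_t)new, ret);
#else
	static_assert(sizeof(void *) == sizeof(uint32_t),
	    "32-bit pointers expected");
	return stub_ucas_32(uptr, (uintptr_t)old, (uintptr_t)new, ret);
#endif
}
```

The width is picked at compile time here, so callers never need an #ifdef of their own.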
>> -- Platforms that want to choose at run-time between a specialized MP
>> version of _ucas_32 / _ucas_64 and the generic UP version can define
>> __HAVE_UCAS_MP, provide _ucas_32_mp and _ucas_64_mp functions, and the MI
>> code will choose at run-time.
>
> I think we should strive to avoid such MD defines as they just cause #ifdef
> mess in the MI code. Keep it simple and just provide the MP versions for
> SMP supporting architectures. If you really want to optimise a particular
> architecture, then a better approach is to implement run-time patching (see
> sys/arch/x86/x86/patch.c, sys/arch/powerpc/powerpc/fixup.c, etc). At some
> point I was contemplating a MI API to abstract the run-time patching which
> would be very useful for various primitives, including ucas.. but that is
> a separate topic.
The complication is MIPS (and possibly ARM, but to a lesser extent). You can have a single GENERIC kernel that runs on UP or MP systems, for some of our MIPS platforms, and need to make a run-time decision. Some of the UP-only MIPS systems don't have LL/SC, so they can't use the "full" variant like x86 or alpha can. And not all of these platforms have the hot-patch support... adding that seems beyond the scope, so I'd rather save that for a future iteration.
The structure I have coded up doesn't make for #ifdef spaghetti...
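One way a run-time choice can stay free of #ifdef clutter is a function pointer that is set once at boot. The following is a userspace sketch under that assumption, not necessarily the structure described above. All sketch_* names are hypothetical, the UP variant is a plain compare-and-store (a kernel would also have to keep the operation from being preempted), and the MP variant uses the GCC/Clang __atomic_compare_exchange_n builtin as a stand-in for a real LL/SC loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef int (*sketch_ucas32_fn)(volatile uint32_t *, uint32_t, uint32_t,
    uint32_t *);

/* UP variant: no other CPU exists, so a plain compare-and-store is enough. */
int
sketch_ucas_32_up(volatile uint32_t *uptr, uint32_t old, uint32_t new,
    uint32_t *ret)
{
	*ret = *uptr;
	if (*ret == old)
		*uptr = new;
	return 0;
}

/* MP variant: hardware-backed CAS via a compiler builtin (assumption). */
int
sketch_ucas_32_mp(volatile uint32_t *uptr, uint32_t old, uint32_t new,
    uint32_t *ret)
{
	uint32_t expected = old;

	/* On failure the builtin stores the value it found in "expected". */
	(void)__atomic_compare_exchange_n(uptr, &expected, new, false,
	    __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
	*ret = expected;
	return 0;
}

/* Selected once; defaults to the UP variant. */
static sketch_ucas32_fn sketch_ucas_32_impl = sketch_ucas_32_up;

/* Hypothetical boot-time hook: choose the variant from the CPU count. */
void
sketch_ucas_init(int ncpu)
{
	assert(ncpu >= 1);
	sketch_ucas_32_impl = (ncpu > 1) ? sketch_ucas_32_mp : sketch_ucas_32_up;
}

/* MI entry point: callers never see which variant is in use. */
int
sketch_ucas_32(volatile uint32_t *uptr, uint32_t old, uint32_t new,
    uint32_t *ret)
{
	return sketch_ucas_32_impl(uptr, old, new, ret);
}
```

The indirect call costs a little on every operation, which is the trade-off against hot-patching the call site.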
> If you mean xcall(9), then two xcalls would not quite work. You can do
> one xc_broadcast()+xc_wait() with double-synchronisation around condvar(9)
> though. A faster implementation would be to broadcast IPIs (there is a MI
> ipi_trigger_multi(9) function) and spin around a variable (think of a
> pthread_barrier_wait()-like mechanism, just with two waiting paths). In
> the above logic, you would need to splhigh() instead of just disabling the
> preemption. It would still be quite an expensive operation, but I think it's
> fair for the problematic architectures to take an extra hit. Note that on
> UP platforms the IPI broadcast will essentially result in a no-operation,
> so the performance difference between your UP version and the described MP
> version would be very insignificant there (and hence my above point).
Thanks for the pointer -- I'll take a look. The splhigh() should not be a problem because the page will be wired.
So, something like:
-- Broadcast IPI
-- Wait for semaphore variable to reach (ncpuonline - 1)
-- do software CAS
-- set semaphore variable to 0 to release the spinners
Seem reasonable?
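The steps above can be simulated in userspace with threads standing in for CPUs. This is a sketch under several assumptions, not kernel code: every sketch_* name is hypothetical, an "IPI" is just a per-thread flag that the simulated CPU polls, and only one thread ever initiates. It also departs from the outline in one respect: spinners are released by bumping a separate generation counter rather than by the count returning to 0, so that back-to-back operations cannot confuse a spinner that has not yet left the previous round.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_NCPU 4			/* hypothetical online CPU count */

static atomic_bool sketch_ipi[SKETCH_NCPU];	/* pending "IPI" per CPU */
static atomic_int sketch_arrived;	/* CPUs parked in the handler */
static atomic_uint sketch_gen;		/* bumped to release the spinners */
static atomic_bool sketch_stop;		/* tells simulated CPUs to exit */

/* "IPI handler": announce arrival, then spin until the initiator is done. */
static void
sketch_ipi_handler(void)
{
	unsigned gen = atomic_load(&sketch_gen);
	int before = atomic_fetch_add(&sketch_arrived, 1);

	assert(before >= 0 && before < SKETCH_NCPU - 1);
	while (atomic_load(&sketch_gen) == gen)
		sched_yield();
}

/* Body of a simulated remote CPU: poll for an "IPI" until told to stop. */
void *
sketch_cpu(void *arg)
{
	atomic_bool *pending = arg;

	while (!atomic_load(&sketch_stop)) {
		if (atomic_exchange(pending, false))
			sketch_ipi_handler();
		else
			sched_yield();
	}
	return NULL;
}

/* Software CAS run by the initiating CPU (index 0); returns the old value. */
uint32_t
sketch_soft_cas_32(uint32_t *ptr, uint32_t old, uint32_t new)
{
	uint32_t prev;

	/* 1. Broadcast the "IPI" to every other CPU. */
	for (int i = 1; i < SKETCH_NCPU; i++)
		atomic_store(&sketch_ipi[i], true);

	/* 2. Wait until all other CPUs are parked in the handler. */
	while (atomic_load(&sketch_arrived) != SKETCH_NCPU - 1)
		sched_yield();

	/* 3. Nobody else is running: a plain compare-and-store suffices. */
	prev = *ptr;
	if (prev == old)
		*ptr = new;

	/* 4. Reset the count, then release the spinners. */
	atomic_store(&sketch_arrived, 0);
	atomic_fetch_add(&sketch_gen, 1);
	return prev;
}
```

To try it, start SKETCH_NCPU - 1 threads running sketch_cpu(), each passed the address of its own sketch_ipi[] slot, call sketch_soft_cas_32() from the main thread, then set sketch_stop and join the threads. A kernel version would additionally have to serialize concurrent initiators, which this sketch does not attempt.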
-- thorpej