tech-kern archive


Re: Ubiquitous ucas(9)




> On Mar 29, 2019, at 3:01 PM, Mindaugas Rasiukevicius <rmind%netbsd.org@localhost> wrote:

> I think I might have introduced the ucas_*() API with x86 support back
> in 2009 (and portmasters added it to other architectures, but not all).
> The ustore/ufetch API was supposed to be a follow-up clean up.  Thanks
> for finally finishing up the clean up of all these APIs. :)

I'm here to help :-)

> I think you should also add ucas_ptr() which would alias to 32 or 64,
> depending on the architecture.

Yah, I have that... should have mentioned it, but thanks for making sure.

>> -- Platforms that want to choose at run-time between a specialized MP
>> version of _ucas_32 / _ucas_64 and the generic UP version can define
>> __HAVE_UCAS_MP, provide _ucas_32_mp and _ucas_64_mp functions, and the MI
>> code will choose at run-time.
> 
> I think we should strive to avoid such MD defines as they just cause #ifdef
> mess in the MI code.  Keep it simple and just provide the MP versions for
> SMP supporting architectures.  If you really want to optimise a particular
> architecture, then a better approach is to implement run-time patching (see
> sys/arch/x86/x86/patch.c, sys/arch/powerpc/powerpc/fixup.c, etc).  At some
> point I was contemplating a MI API to abstract the run-time patching which
> would be very useful for various primitives, including ucas.. but that is
> a separate topic.

The complication is MIPS (and possibly ARM, but to a lesser extent).  On some of our MIPS platforms a single GENERIC kernel runs on both UP and MP systems, so it needs to make a run-time decision.  Some of the UP-only MIPS systems don't have LL/SC, so they can't use the "full" variant like x86 or alpha can.  And not all of these platforms have the hot-patch support... adding that seems beyond the scope, so I'd rather save that for a future iteration.

The structure I have coded up doesn't make for #ifdef spaghetti...
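To illustrate the kind of run-time choice I mean, here is a minimal userspace sketch, not the code I have in my tree. All sketch_* names are hypothetical, the rule "use the MP flavour when more than one CPU is present" is an assumption of the sketch, and the MP stand-in uses the GCC/Clang __atomic_compare_exchange_n builtin where real MD code would use LL/SC:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef int (*sketch_ucas32_fn)(volatile uint32_t *, uint32_t, uint32_t,
    uint32_t *);

/* Generic UP flavour: plain load/compare/store.  A kernel would keep
 * preemption blocked around this sequence. */
static int
sketch_ucas_32_up(volatile uint32_t *uptr, uint32_t old, uint32_t newval,
    uint32_t *ret)
{
	*ret = *uptr;
	if (*ret == old)
		*uptr = newval;
	return 0;
}

/* Stand-in for an MD-provided MP flavour (hypothetical). */
static int
sketch_ucas_32_mp(volatile uint32_t *uptr, uint32_t old, uint32_t newval,
    uint32_t *ret)
{
	uint32_t expected = old;

	/* On failure the builtin stores the value it found in 'expected'. */
	(void)__atomic_compare_exchange_n(uptr, &expected, newval, false,
	    __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
	*ret = expected;
	return 0;
}

/* Chosen once at "boot"; the call site below needs no #ifdef. */
static sketch_ucas32_fn sketch_ucas_32_impl = sketch_ucas_32_up;

void
sketch_ucas_init(unsigned int ncpu)
{
	assert(ncpu >= 1);
	sketch_ucas_32_impl = (ncpu > 1) ? sketch_ucas_32_mp
	    : sketch_ucas_32_up;
}

int
sketch_ucas_32(volatile uint32_t *uptr, uint32_t old, uint32_t newval,
    uint32_t *ret)
{
	return (*sketch_ucas_32_impl)(uptr, old, newval, ret);
}
```

The point is only that the decision lives in one init routine, so callers see a single entry point either way.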

> If you mean xcall(9), then two xcalls would not quite work.  You can do
> one xc_broadcast()+xc_wait() with double-synchronisation around condvar(9)
> though.  A faster implementation would be to broadcast IPIs (there is a MI
> ipi_trigger_multi(9) function) and spin around a variable (think of a
> pthread_barrier_wait()-like mechanism, just with two waiting paths).  In
> the above logic, you would need to splhigh() instead of just disabling the
> preemption.  It would still be quite expensive operation, but I think it's
> fair for the problematic architectures to take an extra hit.  Note that on
> UP platforms the IPI broadcast will essentially result in a no-operation,
> so the performance difference between your UP version and the described MP
> version would be very insignificant there (and hence my above point).

Thanks for the pointer -- I'll take a look.  The splhigh() should not be a problem because the page will be wired.

So, something like:

	-- Broadcast IPI
	-- Wait for semaphore variable to reach (ncpuonline - 1)
	-- Do software CAS
	-- Set semaphore variable to 0 to release the spinners

Seem reasonable?

-- thorpej


