tech-kern archive


Re: Ubiquitous ucas(9)



Jason Thorpe <thorpej%me.com@localhost> wrote:
> Folks...
> 
> I'm trying to wrap up my ufetch / ustore changes
> (http://mail-index.netbsd.org/tech-kern/2019/02/23/msg024690.html), and I
> realized there's another requirement that the project I'm working on
> has... specifically, ucas_int() ... for those unfamiliar, this is like
> atomic_cas_int(), but for kernel accessing user space.
> 
> There are several classes of ucas implementations:
> 
> -- Platforms that have a CAS primitive.  There are already some
> implementations like this in the tree (e.g. x86).

I think I might have introduced the ucas_*() API with x86 support back
in 2009 (and portmasters added it to other architectures, but not all).
The ustore/ufetch API was supposed to be a follow-up cleanup.  Thanks
for finally finishing the cleanup of all these APIs. :)

> 
> I'm thinking along these lines:
> 
> -- The primitives that have to be implemented are _ucas_32 (and _ucas_64,
> only on _LP64 platforms).  All of the other type variants are strong
> aliases of these two symbols (this is what I did with ufetch / ustore).

I think you should also add ucas_ptr() which would alias to 32 or 64,
depending on the architecture.
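
To illustrate the idea (a userspace sketch only, not the in-tree code):
the model_* names below are hypothetical stand-ins, and the GCC/Clang
__atomic builtins stand in for the MD primitives.  In the kernel this
would presumably be a strong alias rather than a wrapper; the wrapper
just makes the width selection visible.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for _ucas_32: returns 0 and stores the value
 * observed at *uaddr in *ret (equal to 'old' when the swap happened).
 */
static int
model_ucas_32(volatile uint32_t *uaddr, uint32_t old, uint32_t newval,
    uint32_t *ret)
{
	uint32_t expected = old;

	/* On failure the builtin writes the current value to 'expected'. */
	(void)__atomic_compare_exchange_n(uaddr, &expected, newval, 0,
	    __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
	*ret = expected;
	return 0;
}

#if UINTPTR_MAX > UINT32_MAX
/* Hypothetical stand-in for _ucas_64, present only with 64-bit pointers. */
static int
model_ucas_64(volatile uint64_t *uaddr, uint64_t old, uint64_t newval,
    uint64_t *ret)
{
	uint64_t expected = old;

	(void)__atomic_compare_exchange_n(uaddr, &expected, newval, 0,
	    __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
	*ret = expected;
	return 0;
}
#endif

/* Pointer variant: forwards to the primitive matching the pointer width. */
static int
model_ucas_ptr(void *volatile *uaddr, void *old, void *newval, void **ret)
{
	int error;
#if UINTPTR_MAX > UINT32_MAX
	uint64_t prev;

	error = model_ucas_64((volatile uint64_t *)uaddr,
	    (uint64_t)(uintptr_t)old, (uint64_t)(uintptr_t)newval, &prev);
#else
	uint32_t prev;

	error = model_ucas_32((volatile uint32_t *)uaddr,
	    (uint32_t)(uintptr_t)old, (uint32_t)(uintptr_t)newval, &prev);
#endif
	*ret = (void *)(uintptr_t)prev;
	return error;
}
```

A caller compares the returned previous value with the expected one to
learn whether the swap took place.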

> 
> -- Uniprocessor-only platforms (SuperH, m68k, etc.) can fall back on a MI
> _ucas_32 / _ucas_64 implementation that does this:
> 
> 	1- uvm_vslock() the target address.
> 	2- disable preemption
> 	3- ufetch, compare, maybe-ustore
> 	4- reenable preemption
> 	5- uvm_vsunlock()
> 

That seems fine.
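
For reference, the five steps can be modelled in userspace roughly as
follows.  This is a sketch under loud assumptions: the model_* helpers
are hypothetical stubs that only count calls, standing in for
uvm_vslock()/uvm_vsunlock() and the preemption controls, and the plain
loads and stores stand in for ufetch/ustore (which in the kernel can
fail on a fault).

```c
#include <assert.h>
#include <stdint.h>

static int model_wired;		/* stub state: outstanding vslock calls */
static int model_nopreempt;	/* stub state: preemption-disable depth */

/* Hypothetical stubs; the real kernel calls take different arguments. */
static int
model_vslock(volatile uint32_t *uaddr)
{
	(void)uaddr;
	model_wired++;
	return 0;
}

static void
model_vsunlock(volatile uint32_t *uaddr)
{
	(void)uaddr;
	model_wired--;
}

static void
model_preempt_disable(void)
{
	model_nopreempt++;
}

static void
model_preempt_enable(void)
{
	model_nopreempt--;
}

/* Uniprocessor compare-and-swap following steps 1-5 of the proposal. */
static int
model_ucas_32_up(volatile uint32_t *uaddr, uint32_t old, uint32_t newval,
    uint32_t *ret)
{
	uint32_t cur;
	int error;

	error = model_vslock(uaddr);		/* 1: wire the target page */
	if (error != 0)
		return error;
	model_preempt_disable();		/* 2 */
	cur = *uaddr;				/* 3: ufetch */
	if (cur == old)				/*    compare */
		*uaddr = newval;		/*    maybe-ustore */
	model_preempt_enable();			/* 4 */
	model_vsunlock(uaddr);			/* 5 */

	*ret = cur;
	return 0;
}
```

With the page wired, the fetch and store should not sleep on a fault,
which is what makes it reasonable to keep preemption disabled around
them.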

> -- Platforms that want to choose at run-time between a specialized MP
> version of _ucas_32 / _ucas_64 and the generic UP version can define
> __HAVE_UCAS_MP, provide _ucas_32_mp and _ucas_64_mp functions, and the MI
> code will choose at run-time.

I think we should strive to avoid such MD defines as they just cause #ifdef
mess in the MI code.  Keep it simple and just provide the MP versions for
SMP-capable architectures.  If you really want to optimise a particular
architecture, then a better approach is to implement run-time patching (see
sys/arch/x86/x86/patch.c, sys/arch/powerpc/powerpc/fixup.c, etc).  At some
point I was contemplating a MI API to abstract the run-time patching which
would be very useful for various primitives, including ucas, but that is
a separate topic.

> 
> Then there's the problem of sparc and vax.
> 
> Here's what I'm thinking for those cases.  Basically, a version that does
> the same as the generic UP implementation, with a couple of extra steps:
> 
> 	1- uvm_vslock() the target address
> 	2- disable preemption
> 	3- xcall to other CPUs to get them to pause.
> 	4- ufetch, compare, maybe-ustore
> 	5- xcall to other CPUs to release them
> 	6- reenable preemption
> 	7- uvm_vsunlock()
> 

If you mean xcall(9), then two xcalls would not quite work.  You can do
one xc_broadcast()+xc_wait() with double-synchronisation around condvar(9)
though.  A faster implementation would be to broadcast IPIs (there is a MI
ipi_trigger_multi(9) function) and spin around a variable (think of a
pthread_barrier_wait()-like mechanism, just with two waiting paths).  In
the above logic, you would need to splhigh() instead of just disabling
preemption.  It would still be quite an expensive operation, but I think
it's fair for the problematic architectures to take an extra hit.  Note
that on UP platforms the IPI broadcast will essentially result in a
no-operation, so the performance difference between your UP version and
the described MP version would be insignificant there (and hence my above
point).
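
A userspace model of the barrier-like mechanism, to make the two waiting
paths concrete.  Everything here is an assumption for illustration:
pthreads play the role of the remote CPUs, thread creation stands in for
splhigh() plus the IPI broadcast, and the model_* names are hypothetical.
Build with -pthread.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdint.h>

#define MODEL_NREMOTE	3u	/* number of simulated remote CPUs */

static atomic_uint model_parked;	/* remote CPUs that have checked in */
static atomic_uint model_release;	/* set once the initiator is done */

/* What each remote "CPU" would run from its IPI handler. */
static void *
model_ipi_handler(void *arg)
{
	(void)arg;
	atomic_fetch_add(&model_parked, 1);	/* tell the initiator we stopped */
	while (atomic_load(&model_release) == 0)
		sched_yield();			/* path 2: wait for release */
	return NULL;
}

static int
model_ucas_32_rendezvous(uint32_t *uaddr, uint32_t old, uint32_t newval,
    uint32_t *ret)
{
	pthread_t cpu[MODEL_NREMOTE];
	unsigned int started, i;
	uint32_t cur = 0;
	int error = 0;

	atomic_store(&model_parked, 0);
	atomic_store(&model_release, 0);

	/* Stand-in for the IPI broadcast. */
	for (started = 0; started < MODEL_NREMOTE; started++) {
		error = pthread_create(&cpu[started], NULL,
		    model_ipi_handler, NULL);
		if (error != 0)
			break;
	}

	if (error == 0) {
		/* Path 1: the initiator waits until every remote CPU is parked. */
		while (atomic_load(&model_parked) != MODEL_NREMOTE)
			sched_yield();
		cur = *uaddr;			/* ufetch */
		if (cur == old)			/* compare */
			*uaddr = newval;	/* maybe-ustore */
	}

	/* Release the parked CPUs (also on the error path) and collect them. */
	atomic_store(&model_release, 1);
	for (i = 0; i < started; i++)
		pthread_join(cpu[i], NULL);

	if (error == 0)
		*ret = cur;
	return error;
}
```

With no remote CPUs the first wait completes immediately and the release
is a plain store, which matches the point about UP platforms above.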

-- 
Mindaugas

