tech-kern archive
Re: Ubiquitous ucas(9)
Jason Thorpe <thorpej%me.com@localhost> wrote:
> Folks...
>
> I'm trying to wrap up my ufetch / ustore changes
> (http://mail-index.netbsd.org/tech-kern/2019/02/23/msg024690.html), and I
> realized there's another requirement that the project I'm working on
> has... specifically, ucas_int() ... for those unfamiliar, this is like
> atomic_cas_int(), but for kernel accessing user space.
>
> There are several classes of ucas implementations:
>
> -- Platforms that have a CAS primitive. There are already some
> implementations like this in the tree (e.g. x86).
I think I might have introduced the ucas_*() API with x86 support back
in 2009 (and portmasters added it to other architectures, but not all).
The ustore/ufetch API was supposed to be a follow-up cleanup. Thanks
for finally finishing the cleanup of all these APIs. :)
>
> I'm thinking along these lines:
>
> -- The primitives that have to be implemented are _ucas_32 (and _ucas_64,
> only on _LP64 platforms). All of the other type variants are strong
> aliases of these two symbols (this is what I did with ufetch / ustore).
I think you should also add ucas_ptr() which would alias to 32 or 64,
depending on the architecture.
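
As a rough illustration of the aliasing idea, here is a userspace sketch that
picks the pointer-sized primitive at compile time. Everything in it is an
assumption made for illustration: the model_* names are hypothetical, C11
atomics stand in for the MD primitives, and a wrapper is used where the
kernel would use strong symbol aliases. It is not the code in the tree.

```c
#include <stdatomic.h>
#include <stdint.h>

/*
 * Hypothetical userspace stand-ins for the _ucas_32/_ucas_64 primitives.
 * Each returns 0 and reports the value it observed through *ret; a real
 * kernel version would also return an error if the access faulted.
 */
static int
model_ucas_32(_Atomic uint32_t *uaddr, uint32_t old, uint32_t new,
    uint32_t *ret)
{
	uint32_t seen = old;

	/* On failure, "seen" is overwritten with the current value. */
	atomic_compare_exchange_strong(uaddr, &seen, new);
	*ret = seen;
	return 0;
}

static int
model_ucas_64(_Atomic uint64_t *uaddr, uint64_t old, uint64_t new,
    uint64_t *ret)
{
	uint64_t seen = old;

	atomic_compare_exchange_strong(uaddr, &seen, new);
	*ret = seen;
	return 0;
}

/* Select the pointer-sized primitive once, at compile time. */
#if UINTPTR_MAX > UINT32_MAX
typedef uint64_t model_ptrword_t;
#define	model_ucas_ptrword	model_ucas_64
#else
typedef uint32_t model_ptrword_t;
#define	model_ucas_ptrword	model_ucas_32
#endif

/* Pointer variant: a thin wrapper over whichever primitive was selected. */
static int
model_ucas_ptr(_Atomic model_ptrword_t *uaddr, void *old, void *new,
    void **ret)
{
	model_ptrword_t seen;
	int error;

	error = model_ucas_ptrword(uaddr, (model_ptrword_t)(uintptr_t)old,
	    (model_ptrword_t)(uintptr_t)new, &seen);
	if (error == 0)
		*ret = (void *)(uintptr_t)seen;
	return error;
}
```

The caller still compares the returned value against "old" to learn whether
the swap took place, the same way it would with the integer variants.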
>
> -- Uniprocessor-only platforms (SuperH, m68k, etc.) can fall back on a MI
> _ucas_32 / _ucas_64 implementation that does this:
>
> 1- uvm_vslock() the target address.
> 2- disable preemption
> 3- ufetch, compare, maybe-ustore
> 4- reenable preemption
> 5- uvm_vsunlock()
>
That seems fine.
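
For reference, the five steps quoted above can be modelled in userspace
roughly as follows. This is only a sketch under stated assumptions: the up_*
helpers are hypothetical stand-ins for uvm_vslock()/uvm_vsunlock() and the
preemption controls, and plain loads and stores stand in for ufetch and
ustore. It shows the ordering of the steps, not real kernel code.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

static int up_wired;		/* models pages wired by uvm_vslock() */
static int up_preempt_depth;	/* models the preemption-disable count */

/* Hypothetical stand-in for uvm_vslock(): fails on a bad address. */
static int
up_vslock(volatile uint32_t *uaddr)
{
	if (uaddr == NULL)
		return EFAULT;
	up_wired++;
	return 0;
}

static void up_vsunlock(void) { up_wired--; }
static void up_preempt_disable(void) { up_preempt_depth++; }
static void up_preempt_enable(void) { up_preempt_depth--; }

static int
up_ucas_32(volatile uint32_t *uaddr, uint32_t old, uint32_t new,
    uint32_t *ret)
{
	uint32_t cur;
	int error;

	error = up_vslock(uaddr);	/* 1: wire the target address */
	if (error != 0)
		return error;
	up_preempt_disable();		/* 2: nothing else can run on this CPU */
	cur = *uaddr;			/* 3: ufetch ... */
	if (cur == old)
		*uaddr = new;		/*    ... compare, maybe-ustore */
	up_preempt_enable();		/* 4 */
	up_vsunlock();			/* 5 */

	*ret = cur;			/* report the value that was observed */
	return 0;
}
```

Wiring the page first means the fetch and store in step 3 should not fault or
sleep while preemption is disabled, which is what makes the sequence atomic
on a uniprocessor.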
> -- Platforms that want to choose at run-time between a specialized MP
> version of _ucas_32 / _ucas_64 and the generic UP version can define
> __HAVE_UCAS_MP, provide _ucas_32_mp and _ucas_64_mp functions, and the MI
> code will choose at run-time.
I think we should strive to avoid such MD defines, as they just cause an #ifdef
mess in the MI code. Keep it simple and just provide the MP versions for
architectures that support SMP. If you really want to optimise a particular
architecture, then a better approach is to implement run-time patching (see
sys/arch/x86/x86/patch.c, sys/arch/powerpc/powerpc/fixup.c, etc). At some
point I was contemplating an MI API to abstract the run-time patching, which
would be very useful for various primitives, including ucas, but that is
a separate topic.
>
> Then there's the problem of sparc and vax.
>
> Here's what I'm thinking for those cases. Basically, a version that does
> the same as the generic UP implementation, with a couple of extra steps:
>
> 1- uvm_vslock() the target address
> 2- disable preemption
> 3- xcall to other CPUs to get them to pause.
> 4- ufetch, compare, maybe-ustore
> 5- xcall to other CPUs to release them
> 6- reenable preemption
> 7- uvm_vsunlock()
>
If you mean xcall(9), then two xcalls would not quite work. You can do
one xc_broadcast()+xc_wait() with double synchronisation around condvar(9)
though. A faster implementation would be to broadcast IPIs (there is an MI
ipi_trigger_multi(9) function) and spin on a variable (think of a
pthread_barrier_wait()-like mechanism, just with two waiting paths). In
the above logic, you would need to splhigh() instead of just disabling
preemption. It would still be quite an expensive operation, but I think it's
fair for the problematic architectures to take an extra hit. Note that on
UP platforms the IPI broadcast will essentially be a no-op,
so the performance difference between your UP version and the described MP
version would be insignificant there (hence my point above).
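
One possible reading of the spin-on-a-variable scheme, modelled in userspace
with C11 atomics, is sketched below. All names are hypothetical, the "IPI"
is just a counter bump, and the splhigh()/ipi_trigger_multi(9) calls appear
only as comments. The sketch also assumes a single initiator at a time;
serialising concurrent initiators is left out.

```c
#include <stdatomic.h>
#include <stdint.h>

static atomic_uint pause_gen;	/* bumped once per pause request */
static atomic_uint release_gen;	/* last request that has been released */
static atomic_uint parked;	/* remote CPUs currently spinning */

/* What each remote CPU would run from its IPI handler. */
static void
model_pause_handler(void)
{
	unsigned gen = atomic_load(&pause_gen);

	atomic_fetch_add(&parked, 1);	/* tell the initiator we stopped */
	while (atomic_load(&release_gen) != gen)
		continue;		/* spin until the initiator is done */
	atomic_fetch_sub(&parked, 1);	/* tell the initiator we left */
}

/* Initiator side; ncpu is the number of CPUs, including this one. */
static int
model_ucas_32_pause(volatile uint32_t *uaddr, uint32_t old, uint32_t new,
    uint32_t *ret, unsigned ncpu)
{
	unsigned gen;
	uint32_t cur;

	/* Kernel: splhigh(), then ipi_trigger_multi() to the other CPUs. */
	gen = atomic_fetch_add(&pause_gen, 1) + 1;
	while (atomic_load(&parked) != ncpu - 1)
		continue;		/* wait for every other CPU to park */

	cur = *uaddr;			/* ufetch, compare, maybe-ustore */
	if (cur == old)
		*uaddr = new;

	atomic_store(&release_gen, gen);	/* let the other CPUs go */
	while (atomic_load(&parked) != 0)
		continue;		/* wait until they have all left */
	/* Kernel: splx() here. */

	*ret = cur;
	return 0;
}
```

With ncpu == 1 both wait loops fall through immediately, which matches the
remark above that the broadcast costs next to nothing on UP platforms.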
--
Mindaugas