tech-kern archive


Ubiquitous ucas(9)



Folks...

I'm trying to wrap up my ufetch / ustore changes (http://mail-index.netbsd.org/tech-kern/2019/02/23/msg024690.html), and I realized there's another requirement that the project I'm working on has... specifically, ucas_int() ... for those unfamiliar, this is like atomic_cas_int(), but for kernel accessing user space.

There are several classes of ucas implementations:

-- Platforms that have a CAS primitive.  There are already some implementations like this in the tree (e.g. x86).

-- Platforms that have a load-locked / store-conditional primitive.  There are already some implementations like this in the tree (e.g. alpha, powerpc).

-- Machine-dependent uniprocessor implementations.  M68k falls into this category.  It uses a restartable atomic sequence because the m68k CAS instruction can't reach into the other address space, but it's not clear that it really needs to do this (does m68k support kernel preemption?).  Interrupts aren't allowed to reach into user space like that, so a kernel RAS is only needed if we have to protect against being preempted while running the sequence.

Then there are platforms where it's just missing (sparc, vax).  These platforms have neither a LL/SC nor a CAS, yet they also have multiprocessor models (e.g. sun4m, VAX 11/782, 11/787, etc.)

...and then there's weirdos (MIPS, which has LL/SC on many-but-not-all implementations).

I'd like to have a grand unified theory for all of this, similar to how I did ufetch / ustore.

I'm thinking along these lines:

-- The primitives that have to be implemented are _ucas_32 (and _ucas_64, only on _LP64 platforms).  All of the other type variants are strong aliases of these two symbols (this is what I did with ufetch / ustore).

-- Platforms that can do it all (x86, alpha, sparc64, aarch64, etc.) -- they define __HAVE_UCAS_FULL, and provide _ucas_32 / _ucas_64 that work in all circumstances.

-- Uniprocessor-only platforms (SuperH, m68k, etc.) can fall back on an MI _ucas_32 / _ucas_64 implementation that does this:

	1- uvm_vslock() the target address.
	2- disable preemption
	3- ufetch, compare, maybe-ustore
	4- reenable preemption
	5- uvm_vsunlock()
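
To make the five steps concrete, here is a minimal sketch written as a user-space model, not kernel code.  Every model_* helper is a hypothetical stand-in (for uvm_vslock(), preemption control, ufetch and ustore respectively), and the function name and argument order are assumptions for illustration only.

```c
#include <stddef.h>
#include <stdint.h>

/* Counters that let the model check that every step is undone. */
static int wired_pages;		/* models pages held by uvm_vslock() */
static int preempt_depth;	/* models nested preemption-disable */

/* Hypothetical stand-ins so the sketch compiles outside a kernel. */
static int
model_vslock(volatile void *uaddr, size_t len)
{
	(void)uaddr; (void)len;
	wired_pages++;
	return 0;
}

static void
model_vsunlock(volatile void *uaddr, size_t len)
{
	(void)uaddr; (void)len;
	wired_pages--;
}

static void model_preempt_disable(void) { preempt_depth++; }
static void model_preempt_enable(void)  { preempt_depth--; }

static int
model_ufetch_32(const volatile uint32_t *uaddr, uint32_t *valp)
{
	*valp = *uaddr;
	return 0;
}

static int
model_ustore_32(volatile uint32_t *uaddr, uint32_t val)
{
	*uaddr = val;
	return 0;
}

/* Returns 0 or an error; on success *ret holds the value that was observed. */
int
generic_up_ucas_32(volatile uint32_t *uaddr, uint32_t old, uint32_t new_val,
    uint32_t *ret)
{
	uint32_t cur = 0;
	int error;

	error = model_vslock(uaddr, sizeof(*uaddr));	/* step 1 */
	if (error)
		return error;
	model_preempt_disable();			/* step 2 */

	error = model_ufetch_32(uaddr, &cur);		/* step 3 */
	if (error == 0 && cur == old)
		error = model_ustore_32(uaddr, new_val);

	model_preempt_enable();				/* step 4 */
	model_vsunlock(uaddr, sizeof(*uaddr));		/* step 5 */

	if (error == 0)
		*ret = cur;
	return error;
}
```

The caller decides whether the swap happened by comparing *ret with the old value it passed in, the same way it would with atomic_cas_int().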

-- Platforms that want to choose at run-time between a specialized MP version of _ucas_32 / _ucas_64 and the generic UP version can define __HAVE_UCAS_MP, provide _ucas_32_mp and _ucas_64_mp functions, and the MI code will choose at run-time.
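
A rough sketch of what the run-time choice could look like, again as a user-space model.  The selection criterion (a CPU count held in model_ncpu) and all model_* names are assumptions made for illustration; the MP stand-in uses the GCC/Clang __atomic_compare_exchange_n builtin where a real port would supply its own _ucas_32_mp.

```c
#include <stdbool.h>
#include <stdint.h>

static int model_ncpu = 1;	/* hypothetical: number of running CPUs */
static int calls_up, calls_mp;	/* which path was taken, for inspection */

/* Stand-in for the generic UP path: plain fetch, compare, maybe-store. */
static int
model_ucas_32_up(volatile uint32_t *uaddr, uint32_t old, uint32_t new_val,
    uint32_t *ret)
{
	uint32_t cur = *uaddr;

	calls_up++;
	if (cur == old)
		*uaddr = new_val;
	*ret = cur;
	return 0;
}

/* Stand-in for a platform-provided MP-safe path. */
static int
model_ucas_32_mp(volatile uint32_t *uaddr, uint32_t old, uint32_t new_val,
    uint32_t *ret)
{
	uint32_t expected = old;

	calls_mp++;
	/* On failure the builtin writes the observed value into expected. */
	(void)__atomic_compare_exchange_n(uaddr, &expected, new_val, false,
	    __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
	*ret = expected;
	return 0;
}

/* MI entry point: pick the MP version only when more than one CPU is up. */
int
model_ucas_32(volatile uint32_t *uaddr, uint32_t old, uint32_t new_val,
    uint32_t *ret)
{
	if (model_ncpu > 1)
		return model_ucas_32_mp(uaddr, old, new_val, ret);
	return model_ucas_32_up(uaddr, old, new_val, ret);
}
```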

Then there's the problem of sparc and vax.

Here's what I'm thinking for those cases.  Basically, a version that does the same as the generic UP implementation, with a couple of extra steps:

	1- uvm_vslock() the target address
	2- disable preemption
	3- xcall to other CPUs to get them to pause.
	4- ufetch, compare, maybe-ustore
	5- xcall to other CPUs to release them
	6- reenable preemption
	7- uvm_vsunlock()
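
The seven steps can be modelled the same way.  This is only a sketch under stated assumptions: the model_* helpers are hypothetical stand-ins that merely record the order of operations in a trace string, and no particular cross-call interface is implied.  What it is meant to show is the bracketing: everything acquired before the fetch/compare/store is released in reverse order afterwards.

```c
#include <stddef.h>
#include <stdint.h>

/* One letter per step taken, so the ordering can be inspected. */
static char trace[16];
static size_t trace_len;

static void
step(char c)
{
	if (trace_len < sizeof(trace) - 1)
		trace[trace_len++] = c;
	trace[trace_len] = '\0';
}

/* Hypothetical stand-ins; a real kernel would do actual work here. */
static int
model_wire(volatile void *p, size_t n)
{
	(void)p; (void)n;
	step('W');
	return 0;
}

static void
model_unwire(volatile void *p, size_t n)
{
	(void)p; (void)n;
	step('w');
}

static void model_preempt_off(void)    { step('P'); }
static void model_preempt_on(void)     { step('p'); }
static void model_pause_others(void)   { step('X'); }	/* xcall: pause */
static void model_release_others(void) { step('x'); }	/* xcall: release */

int
model_ucas_32_nocas(volatile uint32_t *uaddr, uint32_t old, uint32_t new_val,
    uint32_t *ret)
{
	uint32_t cur;
	int error;

	error = model_wire(uaddr, sizeof(*uaddr));	/* step 1 */
	if (error)
		return error;
	model_preempt_off();				/* step 2 */
	model_pause_others();				/* step 3 */

	step('F');					/* step 4: fetch */
	cur = *uaddr;
	if (cur == old) {				/* compare */
		step('S');				/* maybe-store */
		*uaddr = new_val;
	}

	model_release_others();				/* step 5 */
	model_preempt_on();				/* step 6 */
	model_unwire(uaddr, sizeof(*uaddr));		/* step 7 */

	*ret = cur;
	return 0;
}
```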

It's essentially doing in software what the memory bus would be doing in a coherent CAS operation, if only the hardware supported it.  The use of ucas() on these systems would be in the slow path of a contended synchronization operation, so it doesn't seem like a horrible price to pay.

Would love to hear people's thoughts on this.


-- thorpej


