tech-kern archive


Re: Ubiquitous ucas(9)




> On Mar 29, 2019, at 6:16 PM, Jason Thorpe <thorpej%me.com@localhost> wrote:
> 
> Thanks for the pointer -- I'll take a look.  The splhigh() should not be a problem because the page will be wired.
> 
> So, something like:
> 
> 	-- Broadcast IPI
> 	-- Wait for semaphore variable to reach (ncpuonline - 1)
> 	-- do software CAS
> 	-- set semaphore variable to 0 to release the spinners
> 
> Seem reasonable?

OK, to follow up on this: I've implemented my ucas proposal on the GitHub branch I'm using for all of the ufetch / ustore work, and augmented my ATF unit tests to exercise ucas, at least at a basic level.

Here is my final proposed solution:

-- For platforms that can do so, __HAVE_UCAS_FULL is the preferred way -- it's the most efficient.  x86, alpha, sparc64, powerpc, and a few others fall into this category.  Maybe aarch64, too, but it didn't have a ucas implementation before, so it gets to use my generic version for now.

-- For platforms that want a generic UP implementation but can do an optimized MP implementation, __HAVE_UCAS_MP is available.  I initially had MIPS in mind for this, but MIPS already had a very complex solution for this problem that necessitated the use of __HAVE_UCAS_FULL, and I wanted to reduce code churn.  I ended up using it on 32-bit ARM: UP systems get the generic implementation, but ARMv6 and later have an LL/SC implementation available, and that is what is used on multiprocessors.

-- For everything else (uniprocessor systems, as well as test-and-set multiprocessors like sparc and vax), a generic implementation handles all cases.  The multiprocessor critical section works like this:

a) acquire the ucas_critical_mutex.
b) disable preemption.
c) go to splhigh().
d) set ucas_critical_owning_cpu to curcpu().
e) membar_enter()
f) Trigger an IPI on all-but-owning-cpu to enter the ucas_critical_gate.
g) wait for all-but-owning cpu to enter the gate.

Then we perform the ufetch-compare-maybe-ustore.  Then:

h) membar_exit()
i) set ucas_critical_owning_cpu to NULL to release the other CPUs from the gate.
j) splx(s)
k) re-enable preemption
l) release ucas_critical_mutex.

The target user address is wired before we enter the critical section, so that the time spent inside the critical section is bounded.
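
For anyone who wants to poke at the gate protocol outside a kernel, steps a) through l) can be modelled in userland roughly as follows.  This is only a sketch under loud assumptions: pthreads stand in for CPUs, a per-CPU flag polled between simulated instructions stands in for the IPI, sequentially consistent C11 atomics stand in for membar_enter() / membar_exit(), and preemption, splhigh() and page wiring are not modelled at all.  The names sim_ucas_32, poll_ipi, gate_count, sim_run, NCPU and ROUNDS are made up for the illustration and are not taken from the branch.  The model also has the owner wait for the gate to drain before dropping the mutex; that is an extra step to keep the toy race-free, not one of the steps listed above.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdint.h>

#define NCPU   4    /* simulated CPUs (assumption) */
#define ROUNDS 200  /* successful CAS operations per CPU (assumption) */
#define NOBODY (-1) /* stands in for a NULL owning CPU */

static pthread_mutex_t ucas_critical_mutex = PTHREAD_MUTEX_INITIALIZER;
static atomic_int ucas_critical_owning_cpu = NOBODY;
static atomic_int gate_count;         /* CPUs currently held in the gate */
static atomic_int ipi_pending[NCPU];  /* one simulated IPI line per CPU */
static atomic_int cpus_done;
static _Atomic uint32_t user_word;    /* the "user space" target word */

/* IPI handler: check in, then spin until the owner opens the gate. */
static void ucas_critical_gate(void)
{
	atomic_fetch_add(&gate_count, 1);
	while (atomic_load(&ucas_critical_owning_cpu) != NOBODY)
		sched_yield();
	atomic_fetch_sub(&gate_count, 1);
}

/* Simulated interrupt delivery; only happens between "instructions". */
static void poll_ipi(int cpu)
{
	if (atomic_exchange(&ipi_pending[cpu], 0))
		ucas_critical_gate();
}

/* Software CAS; *ret receives the value found at *uptr. */
static int sim_ucas_32(int cpu, _Atomic uint32_t *uptr, uint32_t old,
    uint32_t new, uint32_t *ret)
{
	/* (a) take the mutex, still servicing IPIs for the current owner */
	while (pthread_mutex_trylock(&ucas_critical_mutex) != 0) {
		poll_ipi(cpu);
		sched_yield();
	}
	/* (b), (c) preemption and splhigh() are not modelled */
	/* (d), (e) publish the owner; seq_cst store doubles as the barrier */
	atomic_store(&ucas_critical_owning_cpu, cpu);
	/* (f) "IPI" every CPU except the owner */
	for (int i = 0; i < NCPU; i++)
		if (i != cpu)
			atomic_store(&ipi_pending[i], 1);
	/* (g) wait until all other CPUs are behind the gate */
	while (atomic_load(&gate_count) != NCPU - 1)
		sched_yield();

	/* fetch, compare, maybe store: three separate steps on purpose */
	uint32_t cur = atomic_load(uptr);
	if (cur == old)
		atomic_store(uptr, new);
	*ret = cur;

	/* (h), (i) clear the owner to release the spinners */
	atomic_store(&ucas_critical_owning_cpu, NOBODY);
	/* model-only step: let the gate drain before the next owner starts */
	while (atomic_load(&gate_count) != 0)
		sched_yield();
	/* (j), (k) not modelled; (l) release the mutex */
	pthread_mutex_unlock(&ucas_critical_mutex);
	return 0;
}

static void *cpu_main(void *arg)
{
	int cpu = (int)(intptr_t)arg;

	for (int done = 0; done < ROUNDS; ) {
		poll_ipi(cpu);
		/* "user mode" stand-in: non-atomic read, then write back */
		uint32_t seen = atomic_load(&user_word);
		atomic_store(&user_word, seen);
		uint32_t prev;
		sim_ucas_32(cpu, &user_word, seen, seen + 1, &prev);
		if (prev == seen)
			done++;
	}
	/* Keep answering IPIs until every CPU has finished. */
	atomic_fetch_add(&cpus_done, 1);
	while (atomic_load(&cpus_done) != NCPU) {
		poll_ipi(cpu);
		sched_yield();
	}
	return NULL;
}

/* Run the whole simulation and return the final value of user_word. */
uint32_t sim_run(void)
{
	pthread_t t[NCPU];

	atomic_store(&user_word, 0);
	atomic_store(&cpus_done, 0);
	for (int i = 0; i < NCPU; i++) {
		int rc = pthread_create(&t[i], NULL, cpu_main, (void *)(intptr_t)i);
		assert(rc == 0);
		(void)rc;
	}
	for (int i = 0; i < NCPU; i++)
		pthread_join(t[i], NULL);
	return atomic_load(&user_word);
}
```

If the gate does its job, sim_run() returns NCPU * ROUNDS: the racy "user mode" write-back can only lose an update if it interleaves with the fetch-compare-store, and the owner does not start that sequence until every other simulated CPU has checked in at the gate.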

I built an x86_64 kernel with the generic implementation to test it, and it works great.  (I included some extra instrumentation so I could verify that it was actually behaving correctly and I wasn't just getting lucky).

For all of the uniprocessor systems that provided a RAS implementation of ucas, I garbage-collected those because I think it's better to pay a slightly higher cost when ucas is actually used, rather than the cost of a specific check for an uncommon RAS on every trap.

I'm going to write up some implementation notes in the ucas(9) man page so that port maintainers have some guidance about what to do when adding a new platform (since we seem to have a few of those in-flight at the moment, yay!)

-- thorpej


