tech-kern archive


Re: 4.x -> 5.x locking?



On Wed, Nov 09, 2011 at 11:21:43PM -0500, Mouse wrote:
> > However, since we aren't talking about non-cache-coherent
> > architectures (which require even more manual manipulation) it's only
> > about access reordering in the memory hierarchy.
> 
> I'm not totally clear on what cache coherency is.  Based on these
> remarks, I'm going to guess that a cache-coherent architecture is one
> on which, as far as the model visible to the programmer (including
> kernel programmer) goes, it is not possible to have conflicting data in
> two CPUs' caches: either different CPUs don't have distinct caches, or
> there is automatic cache update and/or invalidation in hardware (at
> least optionally, and if it's optional then NetBSD runs the hardware in
> that mode).
> 
> Correct?  If so, that completely annuls the hairiest of my worries.

Basically yes. Once memory writes have reached the L1 cache, the
cache coherency logic ensures that any other master (that performs
cache snooping) will see the data.

It is unusual to try to do SMP programming where the CPUs don't
do cache snooping/coherency, but other hardware (e.g. ethernet cards)
may not do cache snooping on some architectures, and so requires the
driver to do explicit flushes/invalidates of the data cache.

The only complication is that within the CPU itself there is likely
to be a 'store buffer' which holds data written by the instruction
unit, but not yet written even to the L1 cache. This is used to allow
multiple writes to execute in a single cycle (each).
To increase performance, memory reads will typically take precedence
over queued writes. Immediate read-backs might either have to wait
for the store buffer to drain, or might be serviced from data in the
buffer (for cached accesses only).
One effect of the store buffer is that things like Dekker's algorithm
just don't work: each CPU can read the other's flag from the cache
while its own flag write is still sitting in its store buffer.
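
To make the Dekker failure concrete, here is a toy single-threaded
model of two CPUs, each with a private store buffer in front of shared
coherent memory. It is only a sketch: the names (struct cpu, cpu_store,
cpu_load, cpu_drain, dekker_entry_race) are invented for illustration,
the buffer holds at most one pending write per variable, and the
"fence" is modelled simply as draining the buffer. It replays one
worst-case interleaving rather than running real threads.

```c
#include <stdbool.h>

enum { FLAG0, FLAG1, NVARS };

/* Coherent cache/memory: what every CPU agrees on. */
static int memory[NVARS];

/* Hypothetical per-CPU store buffer: one pending write per variable. */
struct cpu {
    bool pending[NVARS];
    int  value[NVARS];
};

/* A store only reaches the CPU's own buffer, not shared memory. */
static void cpu_store(struct cpu *c, int var, int val)
{
    c->pending[var] = true;
    c->value[var] = val;
}

/* A load is serviced from the CPU's own buffer if possible,
 * otherwise from shared memory; it never sees another CPU's buffer. */
static int cpu_load(const struct cpu *c, int var)
{
    return c->pending[var] ? c->value[var] : memory[var];
}

/* Drain the buffer to shared memory (our stand-in for a barrier). */
static void cpu_drain(struct cpu *c)
{
    for (int v = 0; v < NVARS; v++) {
        if (c->pending[v]) {
            memory[v] = c->value[v];
            c->pending[v] = false;
        }
    }
}

/* Replay the Dekker entry step "set my flag, then check yours" on both
 * CPUs, with both stores issued before either load. Returns how many
 * CPUs saw the other's flag clear and so believe they may enter. */
static int dekker_entry_race(bool use_fence)
{
    struct cpu cpu0 = {0}, cpu1 = {0};

    memory[FLAG0] = 0;
    memory[FLAG1] = 0;

    cpu_store(&cpu0, FLAG0, 1);
    cpu_store(&cpu1, FLAG1, 1);

    if (use_fence) {
        cpu_drain(&cpu0);
        cpu_drain(&cpu1);
    }

    int cpu0_sees = cpu_load(&cpu0, FLAG1);
    int cpu1_sees = cpu_load(&cpu1, FLAG0);

    return (cpu0_sees == 0) + (cpu1_sees == 0);
}
```

In this model dekker_entry_race(false) reports that both CPUs saw the
other's flag clear, so both would enter the critical section, while
dekker_entry_race(true) reports that neither did and the algorithm's
normal tie-break would then apply. On real hardware the equivalent of
the drain is a full store-load barrier between the flag write and the
flag read.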

Whether reads happen in instruction order, and whether writes happen
in instruction order, is also architecture dependent.  IIRC x86 and
sparc maintain the order of writes, but ppc may not.  Of course, a
write that is dependent on the value of a read can't happen before
the read!
For cached memory, reads may be speculative, i.e. done even if the
value isn't actually needed because of a mispredicted branch.
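
Where write ordering matters, the usual case is "fill in some data,
then set a flag saying it is ready". A minimal sketch of that pattern
follows, using C11 <stdatomic.h> release/acquire ordering rather than
any kernel primitive; in the NetBSD kernel I believe the
membar_producer()/membar_consumer() pair from membar_ops(3) plays this
role. The names payload, ready, publish and try_consume are made up for
the example.

```c
#include <stdatomic.h>
#include <stdbool.h>

static int payload;          /* ordinary data, written before the flag */
static atomic_bool ready;    /* static storage, so initially false */

/* Writer side: the release store keeps the payload write from being
 * reordered after the flag write, even on weakly ordered CPUs. */
static void publish(int value)
{
    payload = value;
    atomic_store_explicit(&ready, true, memory_order_release);
}

/* Reader side: the acquire load keeps the payload read (including a
 * speculative one) from being satisfied before the flag read.
 * Returns false if nothing has been published yet. */
static bool try_consume(int *out)
{
    if (!atomic_load_explicit(&ready, memory_order_acquire))
        return false;
    *out = payload;
    return true;
}
```

One thread would call publish() once and another would poll
try_consume(); a reader that sees the flag set is then guaranteed to
see the payload written before it. On x86 or sparc (TSO) the
release/acquire pair should cost little or nothing beyond stopping the
compiler from reordering, whereas on ppc it has to emit real barrier
instructions.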

        David

-- 
David Laight: david%l8s.co.uk@localhost

