tech-kern archive


Re: What's an "MPSAFE" driver need to do?



On 28 Feb, 2013, at 17:55 , Mouse <mouse%Rodents-Montreal.ORG@localhost> wrote:
>>> - membar_consumer is described with "All loads preceding the memory
>>>   barrier will complete before any loads after the memory barrier
>>>   complete".  That last "complete" needs to be "start" for this to
>>>   be a useful guarantee.
>> I don't know about this,
> 
> Well, consider:
> 
> /* must get datum_1 value before reading datum_2 */
> i = datum_1;
> membar_consumer();
> j = datum_2;
> 
> With the wording as it stands, this could turn into
> 
> On CPU 1:
>       ask memory subsystem to load datum_1 into i
>       ask memory subsystem to load datum_2 into j
>       memory subsystem reads datum_2
> On CPU 2:
>       write to datum_1 and datum_2
> On CPU 1:
>       memory subsystem reads datum_1
>       barrier causes: wait for memory subsystem to write i
>       wait for memory subsystem to write j

Sure, this would be screwed up, though whatever the barrier does should be
done between "ask memory..." and "ask memory..." instead of where you
have it.  The problem is that the wording you want would seem to prohibit:

On CPU 1:
        ask memory subsystem to load datum_1 into i
                datum_1 is not in cache; read to main memory takes a long
                time to complete
        barrier; ask memory subsystem to make sure, if some CPU changed the
                value above recently, that subsequent things you read reflect
                any older changes by the same CPU
        ask memory subsystem to load datum_2 into j
                datum_2 is in cache (wouldn't be if it had been written
                recently); can be fetched right away
        read of datum_2 completes; processor free to execute instructions
                dependent on j
        read of datum_1 completes; processor free to execute instructions
                dependent on i

On CPU 2:
        nothing writes datum_1 or datum_2

The CPU hardware is perfectly free to complete the load that follows the
barrier while the load that precedes it has barely started, as long as its
cache coherency protocol can prove that this doesn't matter (i.e. that
nothing has been going on recently on any other CPU that would make it
matter).  The read barrier doesn't wait for anything to happen;
it just tells the memory subsystem that if something is going on with these
variables you need to see values consistent with the order of the writes by
any single CPU.  I think the issue of when something might actually "start"
or "complete" isn't very strongly related to the thing the barrier ensures,
which is why I'm not sure the wording matters very much.
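
To make the guarantee concrete, here is a minimal sketch in C11 rather than
with the kernel primitives.  The release and acquire fences stand in for
membar_producer() and membar_consumer(); that mapping is an approximation,
not something the man page states.  The names shared_i, shared_j, writer(),
reader() and forbidden() are hypothetical, chosen so that the one outcome
the barrier pair rules out is the (my_i == 0 && my_j == 1) case quoted
below.

```c
#include <stdatomic.h>

/* Hypothetical shared variables; both start at 0. */
static atomic_int shared_i;
static atomic_int shared_j;

/*
 * Writer side: store shared_i, then shared_j.  The release fence is an
 * assumed stand-in for membar_producer(): the first store must be
 * visible before the second one is.
 */
static void
writer(void)
{
	atomic_store_explicit(&shared_i, 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&shared_j, 1, memory_order_relaxed);
}

/*
 * Reader side: load shared_j, then shared_i.  The acquire fence is an
 * assumed stand-in for membar_consumer().  Nothing here waits for a
 * load to "complete"; the fence only constrains which combinations of
 * values the two loads may return.
 */
static void
reader(int *my_i, int *my_j)
{
	*my_j = atomic_load_explicit(&shared_j, memory_order_relaxed);
	atomic_thread_fence(memory_order_acquire);
	*my_i = atomic_load_explicit(&shared_i, memory_order_relaxed);
}

/* The single outcome the fence pair excludes: new j seen with old i. */
static int
forbidden(int my_i, int my_j)
{
	return my_i == 0 && my_j == 1;
}
```

If reader() runs concurrently with writer(), the results (0,0), (1,0) and
(1,1) are all possible; only (0,1) is excluded.  How the hardware gets
there, including finishing the second load first when nobody has touched
either variable, is left to the implementation.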


>>    (my_i == 0 && my_j == 1)
> 
> Yes.  That's what I believe it's intended to do, and that's what I
> think the current wordings don't promise.

Maybe.  I think your wording suggests the CPU will do something
that isn't necessary for it to do to guarantee that result.

>> How would this be better described?
> 
> Apply all applicable fixes from the foregoing:
> 
> membar_sync
>       All loads and stores preceding the memory barrier will complete
>       and reach global visibility, respectively, before any loads and
>       stores after the memory barrier start and reach global
>       visibility, respectively.

That works for me as long as you don't actually expect the CPU to
complete the pre-barrier loads before the post-barrier loads start.
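
For what a full barrier buys beyond the load-only and store-only ones, a
small store-then-load sketch in C11 may help.  The seq_cst fence is used
here as an assumed rough equivalent of membar_sync(); flag_a, flag_b,
cpu_a() and cpu_b() are hypothetical names.  Each side stores its own flag
and then loads the other's, and the fence orders that store against the
later load.

```c
#include <stdatomic.h>

/* Hypothetical flags, one per CPU; both start at 0. */
static atomic_int flag_a;
static atomic_int flag_b;

/*
 * CPU A: publish its own flag, full barrier, then look at B's flag.
 * The seq_cst fence is an assumed stand-in for membar_sync().
 */
static int
cpu_a(void)
{
	atomic_store_explicit(&flag_a, 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);
	return atomic_load_explicit(&flag_b, memory_order_relaxed);
}

/* CPU B: the mirror image of cpu_a(). */
static int
cpu_b(void)
{
	atomic_store_explicit(&flag_b, 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);
	return atomic_load_explicit(&flag_a, memory_order_relaxed);
}
```

With both fences in place, cpu_a() and cpu_b() running concurrently cannot
both return 0.  Again this is a statement about which results are
observable, not about when any individual load starts or finishes.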

Dennis Ferguson

