[concurrency-interest] Enforcing total sync order on modern hardware

Marko Topolnik marko at hazelcast.com
Tue Mar 17 09:39:32 EDT 2015


On Tue, Mar 17, 2015 at 11:46 AM, Aleksey Shipilev <
aleksey.shipilev at oracle.com> wrote:

> On 17.03.2015 9:31, Marko Topolnik wrote:
> > There is another concern that may be interesting to reconsider. Given
> > the lack of total sync order when just using memory barriers, is the
> > JSR-133 Cookbook wrong/outdated in this respect? It doesn't at all deal
> > with the issue of the sync order, just with the visibility of
> > inter-thread actions.
>
> The mental model I am having in my head is as follows:
>
>   a) Cache-coherent systems maintain the consistent (coherent) view of
> each memory location at any given moment. In fact, most coherency
> protocols provide the total order for the operations on a *single*
> location. Regardless how the actual interconnect is operating, the cache
> coherency protocols are to maintain that illusion. MESI-like protocols
> are by nature message-based, and so they do not require shared bus to
> begin with, so no problems with QPI.
>

So let's fix the following total order on currentTime:

T3 -> Rwt3 -> T6 -> Rwt6 -> Rrt6 -> T9 -> Rrt9


> If "sharedVar" is also volatile (sequentially consistent), then Wv1
> would complete before reading Rwt6.


OK, but this wouldn't necessarily happen on a unique global timescale: the
"writing" thread would have the ordering Wv1 -> Rwt6; there would be an
_independent_ total order of actions on currentTime, and a third, again
independent, order of actions by the "reading" thread. Due to the
distributed nature of coherence, the fact that Wv1 precedes Rwt6 on one
core does not enforce Rrt6 -> Rv1 on another core. It is not obvious that
these individual orders compose transitively into a single order.

Particularly note this statement in
http://www.cl.cam.ac.uk/~pes20/weakmemory/cacm.pdf:

"[the CPU vendor specifications] admit the IRIW behaviour above but, under
reasonable assumptions on the strongest x86 memory barrier, MFENCE, adding
MFENCEs would not suffice to recover sequential consistency (instead, one
would have to make liberal use of x86 LOCK’d instructions). Here the
specifications seem to be much looser than the behaviour of implemented
processors: to the best of our knowledge, and following some testing, IRIW
is not observable in practice, even without MFENCEs. It appears that some
JVM implementations depend on this fact, and would not be correct if one
assumed only the IWP/AMD3.14/x86-CC architecture."
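
For readers who don't have the IRIW (independent reads of independent
writes) shape in mind, here is a minimal Java sketch of it. The class and
method names are made up for illustration and the harness is deliberately
crude: threads are started one after another, so the interesting
interleavings are rare. With both fields volatile, the Java memory model
forbids the outcome where the two readers disagree on the order of the two
stores; the open question in this thread is what the hardware
specification needs to guarantee for a JVM to deliver that.

```java
public class IriwLitmus {
    // Both locations are volatile, so every access is a synchronization action.
    static volatile int x, y;

    /** Runs the IRIW shape `rounds` times and counts disagreeing readers. */
    public static int countForbidden(int rounds) {
        int forbidden = 0;
        for (int i = 0; i < rounds; i++) {
            x = 0;
            y = 0;
            final int[] r = new int[4];
            Thread[] ts = {
                new Thread(() -> x = 1),                      // independent writer 1
                new Thread(() -> y = 1),                      // independent writer 2
                new Thread(() -> { r[0] = x; r[1] = y; }),    // reads x, then y
                new Thread(() -> { r[2] = y; r[3] = x; })     // reads y, then x
            };
            for (Thread t : ts) t.start();
            for (Thread t : ts) join(t);
            // Reader 1 saw x=1 before y=1, while reader 2 saw y=1 before x=1.
            if (r[0] == 1 && r[1] == 0 && r[2] == 1 && r[3] == 0) forbidden++;
        }
        return forbidden;
    }

    private static void join(Thread t) {
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("forbidden outcomes: " + countForbidden(1000));
    }
}
```

A serious test would use a proper litmus harness rather than raw threads;
this only pins down which outcome is being discussed.
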

Also, in the newer revision of Intel's specification, “P6. In a
multiprocessor system, stores to the same location have a total order” has
been replaced by: “Any two stores are seen in a consistent order by
processors other than those performing the stores.”

So here's a consistent order seen by all the processors except those
running the two writing threads:

Wv0 -> T3 -> T6 -> T9 -> Wv1

This also respects the total ordering for each individual site, and a total
ordering of each individual processor's stores. The "reading" thread
inserts its Rv0 between T9 and Wv1.
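
To make the action names concrete, here is a rough Java reconstruction of
the scenario. It is a sketch, not the original code from earlier in the
thread: only currentTime and sharedVar are names used in this discussion,
while the class, the method, and the "seen" array are assumptions of mine.
With both fields volatile, the Java memory model gives the chain
Wv1 -> Rwt(<9) -> T9 -> Rrt9 -> Rv in the synchronization order, so the
reader must not see sharedVar == 0 after seeing t=9. The order above is
exactly the outcome this check counts as a violation.

```java
public class ClockOrder {
    static volatile int currentTime;  // written only by the timer thread
    static volatile int sharedVar;    // written by the "writing" thread

    /** Counts rounds where the reader saw t=9 but still read sharedVar == 0. */
    public static int countViolations(int rounds) {
        int violations = 0;
        for (int i = 0; i < rounds; i++) {
            currentTime = 0;
            sharedVar = 0;                       // Wv0
            final int[] seen = new int[3];       // Rwt, Rrt, Rv
            Thread timer = new Thread(() -> {
                currentTime = 3;                 // T3
                currentTime = 6;                 // T6
                currentTime = 9;                 // T9
            });
            Thread writer = new Thread(() -> {
                sharedVar = 1;                   // Wv1
                seen[0] = currentTime;           // Rwt
            });
            Thread reader = new Thread(() -> {
                seen[1] = currentTime;           // Rrt
                seen[2] = sharedVar;             // Rv
            });
            Thread[] ts = { timer, writer, reader };
            for (Thread t : ts) t.start();
            for (Thread t : ts) join(t);
            // Writer read a tick before 9 (e.g. Rwt6), reader read 9, yet Rv == 0.
            if (seen[0] < 9 && seen[1] == 9 && seen[2] == 0) violations++;
        }
        return violations;
    }

    private static void join(Thread t) {
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("violations: " + countViolations(1000));
    }
}
```
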



> Reading Rwt6 after the write means
> the write is observable near tick 6: it is plausible the clock ticked 6
> before we were writing; it is plausible the clock ticked 6 right after
> we did the write. Which *really* means the write is guaranteed to be
> observable at the *next* tick, T9, since "currentTime" reads/writes are
> totally ordered. Therefore, once the reader thread observed t=9, it
> should also observe the Wv1, rendering Rv0 reading "0" incorrect.
>
>                                 Rrt9 ---> Rv0
>   Wv0 --> Wv1 --> Rwt6           ^
>          .---------^         .---/
>        T6 ---------------> T9
>
>  "global time" -------------------------------->
>
>
> Notice how this relies on the writer thread to observe Rwt6! That's a
> reference frame for you. If writer was to observe Rwt9, you might have
> plausibly inferred the Wv1 may be not visible at Rv0:
>

Thanks, that was precisely my motivation to add Rwt6 :)

---
Marko