[concurrency-interest] Enforcing total sync order on modern hardware

Stephan Diestelhorst stephan.diestelhorst at gmail.com
Tue Mar 17 17:42:17 EDT 2015

On Tuesday, 17 March 2015, 14:39:32, Marko Topolnik wrote:
> On Tue, Mar 17, 2015 at 11:46 AM, Aleksey Shipilev <
> aleksey.shipilev at oracle.com> wrote:
> > If "sharedVar" is also volatile (sequentially consistent), then Wv1
> > would complete before reading Rwt6.
> OK, but this wouldn't necessarily happen on a unique global timescale: the
> "writing" thread would have the ordering Wv1 -> Rwt6; there would be an
> _independent_ total order of actions on currentTime, and a third, again
> independent order of actions by the "reading" thread. Due to the
> distributed nature of coherence the fact that, on one core, Wv1 precedes
> Rwt6 does not enforce Rrt6 -> Rv1 on another core. It is not obvious that
> there is transitivity between these individual orders.
> Particularly note this statement in
> http://www.cl.cam.ac.uk/~pes20/weakmemory/cacm.pdf:
> "[the CPU vendor specifications] admit the IRIW behaviour above but, under
> reasonable assumptions on the strongest x86 memory barrier, MFENCE, adding
> MFENCEs would not suffice to recover sequential consistency (instead, one
> would have to make liberal use of x86 LOCK’d instructions). Here the
> specifications seem to be much looser than the behaviour of implemented
> processors: to the best of our knowledge, and following some testing, IRIW
> is not observable in practice, even without MFENCEs. It appears that some
> JVM implementations depend on this fact, and would not be correct if one
> assumed only the IWP/AMD3.14/x86-CC architecture."
> Also, for the newer revision of Intel's specification, “P6. In a
> multiprocessor system, stores to the same location have a total order” has
> been replaced by: “Any two stores are seen in a consistent order by
> processors other than those performing the stores.”

IRIW is ruled out by both AMD and Intel.  Not a problem.
Weak architectures, such as ARM, talk about multi-copy atomicity, which
is what you are after.  On these architectures, fences (DMBs in the ARM
case) do restore global order through an elaborate construction of who
saw what etc.
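
For anyone who wants to poke at this from Java, here is a rough sketch of the
IRIW litmus test with both locations declared volatile. The class and method
names (IriwLitmus, countForbidden, awaitStart) are made up for illustration.
The JMM's total synchronization order forbids the outcome where the two
readers disagree on the order of the two writes, so the count should stay at
zero. A plain stress loop like this cannot prove absence, of course; a proper
harness such as jcstress is the better tool for real testing.

```java
import java.util.concurrent.CountDownLatch;

public class IriwLitmus {
    // Assumption: both locations are volatile, so every access below is a
    // synchronization action and takes part in the total sync order.
    static volatile int x, y;

    // Runs the four-thread IRIW shape `rounds` times and counts how often
    // the two readers observe the two independent writes in opposite orders.
    static int countForbidden(int rounds) {
        int forbidden = 0;
        for (int i = 0; i < rounds; i++) {
            x = 0;
            y = 0;
            final int[] r = new int[4];
            final CountDownLatch start = new CountDownLatch(1);
            Thread[] ts = {
                new Thread(() -> { awaitStart(start); x = 1; }),                // writer 1
                new Thread(() -> { awaitStart(start); y = 1; }),                // writer 2
                new Thread(() -> { awaitStart(start); r[0] = x; r[1] = y; }),   // reader 1: x then y
                new Thread(() -> { awaitStart(start); r[2] = y; r[3] = x; }),   // reader 2: y then x
            };
            for (Thread t : ts) t.start();
            start.countDown();
            for (Thread t : ts) {
                try {
                    t.join(); // join gives happens-before, so r[] is safe to read below
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException(e);
                }
            }
            // Reader 1 saw x written but not y; reader 2 saw y written but not x.
            if (r[0] == 1 && r[1] == 0 && r[2] == 1 && r[3] == 0) forbidden++;
        }
        return forbidden;
    }

    static void awaitStart(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("forbidden IRIW outcomes: " + countForbidden(2000));
    }
}
```

Dropping the volatile modifiers turns this into a data race, at which point
the JMM no longer rules the outcome out, whatever a given x86 part does in
practice.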

Nothing to see here, I presume (unless you were talking about a real
clock, such as the TSC on x86.  But that has interesting semantics and I
will not feed the trolls.  Some naive notes:
http://rp-www.cs.usyd.edu.au/~gramoli/events/wttm4/papers/diestelhorst.pdf )

