[concurrency-interest] Enforcing total sync order on modern hardware
marko at hazelcast.com
Thu Mar 19 09:21:02 EDT 2015
On Tue, Mar 17, 2015 at 10:42 PM, Stephan Diestelhorst <
stephan.diestelhorst at gmail.com> wrote:
> IRIW is ruled out by both AMD and Intel. Not a problem.
My scenario is not quite IRIW. There's one pure writer, one pure reader,
and one reader-writer. No two reading processors conflict in their
observation of the store order. For the reader-writer, its stores and
loads are independent so, without a fence, the visibility of the store
may be postponed, and the processor is allowed to observe its own stores
sooner than any other processor does.
It seems that the JVM will emit such a fence after a volatile store, which
will ensure that the store has become visible to everyone before any
further loads are attempted. That would cause the Rwt6 load to impose an
indirect happens-before on the Rrt9 load by the reading processor,
transitively forcing Wv1 to happen before Rv0 and preventing that load
from observing the old value of 0.
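To make the role of the fence concrete, here is a minimal sketch of the
classic store-buffering litmus test. It is not the writer/reader/reader-writer
scenario discussed above, only the simplest shape in which a volatile store is
followed by an independent volatile load. The class and method names
(StoreLoadSketch, bothSawZero, countBothZero) are made up for illustration.
The Java memory model forbids both loads from returning 0, and on x86 that
guarantee depends on the store being drained before the following load.

```java
// Store-buffering litmus test. NOT the scenario from this thread, just the
// simplest case of a volatile store followed by an independent volatile load.
class StoreLoadSketch {
    // Both fields are volatile, so every access is a synchronization action.
    static volatile int x, y;
    // Plain fields: written by the workers, read only after join().
    static int r1, r2;

    /** Runs the two racing threads once; returns true if both loads saw 0. */
    static boolean bothSawZero() {
        x = 0;
        y = 0;
        Thread a = new Thread(() -> { x = 1; r1 = y; }); // store x, then load y
        Thread b = new Thread(() -> { y = 1; r2 = x; }); // store y, then load x
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        // Without a StoreLoad fence after each store, plain TSO hardware
        // could let both loads bypass the buffered stores and read 0.
        return r1 == 0 && r2 == 0;
    }

    /** Repeats the experiment and counts the outcomes the JMM forbids. */
    static int countBothZero(int runs) {
        int count = 0;
        for (int i = 0; i < runs; i++) {
            if (bothSawZero()) count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println("both-zero outcomes: " + countBothZero(1000));
    }
}
```

If the fields were not volatile, the both-zero outcome would be legal, and it
is exactly the kind of reordering that TSO's store buffer permits.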
On the other hand, the compiler might be tempted to optimize by eliminating
some of the fences for the case of independent stores and loads. In our
example, just relying on TSO would be enough to satisfy happens-before
consistency for actions done by the reading-writing processor. Studying
Aleksey's "Nanotrusting the Nanotime", we find that doing some navel-gazing
CPU work after a volatile write eliminates the volatile overheads,
indicating that computation was able to continue without waiting for the
store to complete. It's just that the total sync order would be very
difficult to achieve if that computation involved volatile reads (even if
they were independent of the store).
> Weak architectures, such as ARM, talk about multi-copy atomicity, which
> is what you are after. On these architectures, fences (DMBs in the ARM
> case) do restore global order through an elaborate construction of who
> saw what etc.
A naive idea would be to CAS a single global memory location on each sync
operation, allowing everything to piggyback on the total ordering of CAS
operations. I guess in reality there are smarter tricks :)