[concurrency-interest] Enforcing total sync order on modern hardware

Marko Topolnik marko at hazelcast.com
Tue Mar 17 11:57:39 EDT 2015


On Tue, Mar 17, 2015 at 3:56 PM, Aleksey Shipilev <aleksey.shipilev at oracle.com> wrote:
>
>
> I offered the simplistic mental model where fences expose the values to
> global coherency in instruction order, and thus regain the total order.
> When you say "wouldn't necessarily happen on a unique global timescale"
> or "due to distributed nature of coherence", you seem to be implying
> some other model, but that definition is too vague to be useful. Indeed,
> I can come up with an arbitrarily weak model that allows pretty much
> anything.


I am trying to work my way up from the specified guarantees, taking care
not to introduce tacit assumptions such as the transitivity of separately
guaranteed total orders. But yes, at some point empirical evidence takes
over from the documented guarantees.

> These intricate details of hardware concurrency are one of the reasons we
> have the JMM. The JMM explicitly requires a total order on synchronization
> actions, and JVM implementors are responsible for figuring out how to map
> it back onto hardware, be it with fences, locked instructions, or something
> else.

Sure, that's exactly what I was asking about: how the JMM efficiently
translates to hardware, given the ever more distributed nature of the
hardware architecture.
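
To make that concrete, here is a minimal store-buffering (Dekker-style)
litmus sketch in plain Java; the class and field names are mine, purely
for illustration. With x and y volatile, the total order over
synchronization actions forbids the r1 == 0 && r2 == 0 outcome, and on
x86 the JIT typically pays for that by emitting a full fence (MFENCE or
a lock-prefixed instruction) after each volatile store, so the following
load cannot be satisfied before the store has drained from the store
buffer:

public class StoreBuffering {
    static volatile int x, y;
    static int r1, r2;

    public static void main(String[] args) throws InterruptedException {
        Thread t1 = new Thread(() -> { x = 1; r1 = y; });
        Thread t2 = new Thread(() -> { y = 1; r2 = x; });
        t1.start(); t2.start();
        t1.join();  t2.join();
        // With volatile x and y the JMM forbids r1 == 0 && r2 == 0;
        // with plain fields the store buffer makes that outcome observable.
        System.out.println("r1=" + r1 + ", r2=" + r2);
    }
}

A single run proves nothing, of course; a harness like jcstress is what
actually exercises these interleavings, but the sketch shows which outcome
the mapping to hardware has to rule out.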

> > Also, for the newer revision of Intel's specification, “P6. In a
> > multiprocessor system, stores to the same location have a total order”
> > has been replaced by: “Any two stores are seen in a consistent order by
> > processors other than those performing the stores.”
>
> I think that provision is there to allow store forwarding, as I described
> in my original reply with the Dekker idiom. MFENCE should bring the total
> store order guarantees back even for the "writer" processors.

Note that the older wording talked only about stores to the _same_ location,
and therefore did not apply to IRIW (independent reads of independent
writes).
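
For reference, the IRIW shape under discussion looks like the sketch below
(again hand-rolled, with names of my choosing). The writes go to two
different locations, so the older "stores to the same location have a total
order" wording says nothing about it, whereas the JMM's total order over
volatile accesses forbids the two readers from disagreeing about which
write came first:

public class Iriw {
    static volatile int a, b;
    static int r1, r2, r3, r4;

    public static void main(String[] args) throws InterruptedException {
        Thread w1 = new Thread(() -> a = 1);
        Thread w2 = new Thread(() -> b = 1);
        Thread rd1 = new Thread(() -> { r1 = a; r2 = b; });
        Thread rd2 = new Thread(() -> { r3 = b; r4 = a; });
        for (Thread t : new Thread[] { w1, w2, rd1, rd2 }) t.start();
        for (Thread t : new Thread[] { w1, w2, rd1, rd2 }) t.join();
        // Forbidden under the JMM: r1 == 1, r2 == 0, r3 == 1, r4 == 0,
        // i.e. the readers seeing the two independent writes in opposite orders.
        System.out.printf("r1=%d r2=%d r3=%d r4=%d%n", r1, r2, r3, r4);
    }
}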

> > So here's a consistent order seen by all the processors except those
> > running the two writing threads:
> >
> > Wv0 -> T3 -> T6 -> T9 -> Wv1
> >
> > This also respects the total ordering for each individual site, and a
> > total ordering of each individual processor's stores. The "reading"
> > thread inserts its Rv0 between T9 and Wv1.
>
> I don't understand this example. If Wv1 is serialized with a fence, and
> there is a read from "t" following it, that read cannot be Rwt6 then, as
> it should see t=9. That pretty much deconstructs the original "execution".

I didn't see the need to imply global serialization: I maintained the order
of writes by each thread and the total order of actions on a single
location. Only a global ordering between reads of one location and writes
to another location forces Wv1 to be sandwiched between T3 and T6.
Apparently, though, MFENCE does enforce such a global ordering.
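
Restating my example as a runnable sketch (the class and field names are
just for illustration; Wv0/Wv1 are the writes of 0 and 1 to v, T3/T6/T9 the
writes of 3, 6, 9 to t, and Rwt6 the v-writer's own read of t returning 6):

public class SandwichedWrite {
    static volatile int v = 0;   // Wv0: v starts at 0
    static volatile int t = 0;
    static int rwt, rt, rv;

    public static void main(String[] args) throws InterruptedException {
        Thread vWriter = new Thread(() -> { v = 1; rwt = t; });      // Wv1, then the read of t
        Thread tWriter = new Thread(() -> { t = 3; t = 6; t = 9; }); // T3 -> T6 -> T9
        Thread reader  = new Thread(() -> { rt = t; rv = v; });      // reads t, then v
        vWriter.start(); tWriter.start(); reader.start();
        vWriter.join();  tWriter.join();  reader.join();
        // Per-thread program order plus per-location total orders alone would
        // still allow rwt == 6 together with rt == 9 && rv == 0. A single
        // total order over all the volatile accesses rules that out:
        // rwt == 6 places Wv1 before T9, so a reader that has already seen
        // t == 9 must also see v == 1.
        System.out.printf("rwt=%d rt=%d rv=%d%n", rwt, rt, rv);
    }
}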

-Marko