[concurrency-interest] Enforcing total sync order on modern hardware

Alexander Terekhov TEREKHOV at de.ibm.com
Tue Mar 24 08:30:35 EDT 2015


IRIW is about TSO, not SC (which is TSO + store-load barrier). And TSO is
release-acquire consistency + write atomicity.

> the cornerstone of sequential consistency

is an awfully expensive store-load barrier.

PMFJI


Oleksandr Otenko <oleksandr.otenko at oracle.com> wrote on 24.03.2015
12:42:10 (to Marko Topolnik <marko at hazelcast.com> and
concurrency-interest <Concurrency-interest at cs.oswego.edu>):


JLS 17.4.3:
      A set of actions is sequentially consistent if all actions occur in a
      total order ...(plus details of how the total order relates to
      program order etc)

It looks like your objection goes against this. IRIW works because it must
have the writes observed by all threads in the same order - due to the
writes and reads forming a total order, even if they are independent -
which is the cornerstone of sequential consistency.
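
To make the total-order argument concrete, here is a minimal sketch (not
part of the original thread) that enumerates every sequentially consistent
interleaving of the four-thread IRIW test and collects the read results.
The class name IriwScEnumeration, the labels r1..r4 and the string encoding
of outcomes are assumptions made for illustration. It models only an
idealized SC machine with x and y initially 0 and says nothing about what
any real processor does.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model of IRIW under sequential consistency:
//   thread 0: x = 1
//   thread 1: y = 1
//   thread 2: r1 = x; r2 = y
//   thread 3: r3 = y; r4 = x
class IriwScEnumeration {
    // Number of actions in each thread's program order.
    static final int[] LENGTHS = {1, 1, 2, 2};

    // Returns every reachable outcome, encoded as the digits r1 r2 r3 r4.
    static Set<String> outcomes() {
        Set<String> result = new TreeSet<>();
        explore(new int[4], 0, 0, new int[4], result);
        return result;
    }

    // Depth-first walk over all total orders consistent with program order.
    static void explore(int[] pc, int x, int y, int[] r, Set<String> out) {
        boolean done = true;
        for (int t = 0; t < 4; t++) {
            if (pc[t] == LENGTHS[t]) continue; // thread t has finished
            done = false;
            int step = pc[t];
            int nx = x, ny = y;
            int[] nr = r.clone();
            switch (t) {
                case 0: nx = 1; break;
                case 1: ny = 1; break;
                case 2: if (step == 0) nr[0] = x; else nr[1] = y; break;
                default: if (step == 0) nr[2] = y; else nr[3] = x; break;
            }
            int[] npc = pc.clone();
            npc[t]++;
            explore(npc, nx, ny, nr, out);
        }
        if (done) out.add("" + r[0] + r[1] + r[2] + r[3]);
    }

    public static void main(String[] args) {
        Set<String> seen = outcomes();
        System.out.println("SC outcomes (r1 r2 r3 r4): " + seen);
        System.out.println("1010 reachable? " + seen.contains("1010"));
    }
}
```

Under these assumptions the outcome 1010 (one reader sees x=1 before y=1,
the other sees y=1 before x=1) is never produced, while the remaining
fifteen combinations are. That is the sense in which a single total order
forces all threads to agree on the order of the independent writes.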

I don't know how you measure that the example is minimal, because in some
sense it is also maximal - the minimal case being Dekker's invariant with
two threads writing one and reading the other variable.

T1: x=1;
r0=x; // may fuse with x=1 - then you get the canonical form of the example
r1=y;

T2: y=1;
r2=y; // may fuse with y=1
r3=x;
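
For readers who want to try the two-thread case on the JVM, the following
is a rough sketch of the canonical form (with the r0 and r2 reads fused
away), assuming x and y are declared volatile so that they take part in
the synchronization order. The class name DekkerVolatile and the method
countBothZero are made up for illustration. A run that finds no violation
proves nothing about the hardware; the point is only to show which outcome
the JMM rules out.

```java
// Hypothetical harness for the two-thread (Dekker-style) test.
class DekkerVolatile {
    static volatile int x, y; // volatile: accesses are synchronization actions
    static int r1, r3;        // plain fields, read only after join()

    // Runs the test repeatedly and counts how often both reads saw 0.
    static int countBothZero(int iterations) {
        int bothZero = 0;
        for (int i = 0; i < iterations; i++) {
            x = 0;
            y = 0;
            Thread t1 = new Thread(() -> { x = 1; r1 = y; });
            Thread t2 = new Thread(() -> { y = 1; r3 = x; });
            t1.start();
            t2.start();
            try {
                // join() makes the threads' writes to r1 and r3 visible here
                t1.join();
                t2.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
            if (r1 == 0 && r3 == 0) bothZero++;
        }
        return bothZero;
    }

    public static void main(String[] args) {
        System.out.println("r1 == 0 && r3 == 0 seen: " + countBothZero(10_000));
    }
}
```

With volatile fields the JMM forbids r1 == 0 && r3 == 0, so the count
should stay at zero. Dropping volatile removes that guarantee; whether the
outcome then actually shows up depends on the JIT and the machine.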

Alex

On 23/03/2015 17:42, Marko Topolnik wrote:



      On Mar 23, 2015 6:01 PM, "Oleksandr Otenko" <
      oleksandr.otenko at oracle.com> wrote:
      >
      > IRIW results apply to any thread doing the reading. The existence
      of the fourth thread only generalizes the result.


      This is incorrect: the essential property of IRIW is that it
      constructs a result for which no sequentially consistent explanation
      exists. It is the minimal example to reproduce the issue of interest,
      therefore none of its parts is optional.


      >
      > It seems this branch of the conversation is pointless.
      >
      > Alex
      >
      >
      > On 23/03/2015 15:56, Marko Topolnik wrote:
      >>
      >> So your analogy to IRIW is established by introducing a whole new
      reading thread. Such an analogy fails to capture the essence of my
      scenario: I am interested precisely in the case where the
      "reading-writing" thread observes its own writes in addition to the
      "timer" thread's writes. The goal is to analyze the tension between
      the desire to win performance through store forwarding and the need
      to stay sequentially consistent. It was my impression that the
      distributed nature of QPI messaging would result in the processors
      grabbing more of the liberties allowed by Intel's specification,
      which specifically excludes my scenario from the ordering guarantee.
      As Aleksey pointed out, this is not the case because an MFENCE
      instruction provides a stronger guarantee than that: the coherence
      layer will have resolved the value at the stored location before the
      load instruction asks for its value.
      >>
      >> ---
      >> Marko
      >>
      >> On Mon, Mar 23, 2015 at 1:53 PM, Oleksandr Otenko <
      oleksandr.otenko at oracle.com> wrote:
      >>>
      >>> Out of all outcomes IRIW permits, choose those that have the
      fourth thread observe 1 then 0 - i.e. there exists a thread which
      observed Wv1 occur before T9. Now you are looking at your case with
      three threads. Your case does not add more outcomes, only chooses a
      subset of those in IRIW.
      >>>
      >>>
      >>> Alex
      >>>
      >>>
      >>> On 20/03/2015 22:05, Marko Topolnik wrote:
      >>>>
      >>>> On Fri, Mar 20, 2015 at 7:52 PM, Oleksandr Otenko <
      oleksandr.otenko at oracle.com> wrote:
      >>>>>
      >>>>> On 20/03/2015 18:12, Marko Topolnik wrote:
      >>>>>>
      >>>>>> On Fri, Mar 20, 2015 at 5:45 PM, Oleksandr Otenko <
      oleksandr.otenko at oracle.com> wrote:
      >>>>>>>
      >>>>>>> No, that doesn't answer the question. You need to modify how
      happens-before is built - because happens-before in JMM and in some
      other model are two different happens-befores. If you get rid of
      synchronization order, then you need to explain which reads the write
      will or will not synchronize-with.
      >>>>>>
      >>>>>>
      >>>>>> I think it's quite simple: the read may synchronize-with any
      write as long as that doesn't break happens-before consistency.
      >>>>>
      >>>>>
      >>>>> It seems quite naive, too. The problem is that currently the
      read synchronizes-with all writes preceding it, but observes the
      value set by the last write. Here you need to define somehow which
      write the read observes - you need to somehow define which of the
      writes is "last" and what the other readers are allowed to think
      about it.
      >>>>>
      >>>>> It doesn't seem to be explainable in one sentence.
      >>>>
      >>>>
      >>>> It is quite a lightweight exercise to rigorously specify this in
      terms of Lamport's clocks; but I concede that, lacking a shared
      intuition, it will take more than a sentence to communicate. I
      hesitate to turn this into a treatise on the application of Lamport's
      clocks to the JMM, so I'm letting it rest.
      >>>>>>>
      >>>>>>> I am only involved in this discussion because you said it
      isn't IRIW, but I see all signs that it is. I remember the discussion
      here doubting that IRIW should be supported, and I appreciate the
      arguments, but without the specification it is difficult to continue
      a meaningful discussion.
      >>>>>>
      >>>>>>
      >>>>>> That's strange to hear since I have pointed out exactly why
      it's not IRIW: if we broaden the definition such that it covers my
      case, then we must accept that Intel allows IRIW to happen because it
      explicitly excludes the writing thread from the guarantee which is
      supposed to disallow it.
      >>>>>
      >>>>>
      >>>>> Rwt6 and Rrt6 are redundant. If you remove them, it becomes
      IRIW. Rwt6 only witnesses the particular ordering of some operations
      in IRIW - it forbids some of the outcomes from IRIW, but doesn't add
      new ones. Rrt6 is meaningless.
      >>>>
      >>>>
      >>>> IRIW involves four independent threads and six events. My
      example involves only three threads, so there must be something wrong
      in calling it "exactly IRIW". Apparently you have in mind some quite
      flexible definition of IRIW, but I cannot second-guess what it might
      be.
      >>>>
      >>>> ---
      >>>> Marko
      >>>
      >>>
      >>
      >


_______________________________________________
Concurrency-interest mailing list
Concurrency-interest at cs.oswego.edu
http://cs.oswego.edu/mailman/listinfo/concurrency-interest




