[concurrency-interest] Enforcing total sync order on modern hardware

Oleksandr Otenko oleksandr.otenko at oracle.com
Tue Mar 24 08:47:14 EDT 2015


I don't know why you call IRIW a test of TSO when a total order of the
reads matters just as much. Don't forget the load-load barrier, which is
implicit on x86, and the small theorem about what happens when concurrent
reads are not totally ordered.
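
For reference, IRIW can be written down as a Java litmus test. This is only
a minimal sketch and not code from this thread: the class name IriwLitmus,
the helper names and the trial count are assumptions made for illustration.
With x and y volatile, the JMM puts all of the accesses into one
synchronization order, so the outcome in which the two readers disagree
about the order of the writes (r1=1, r2=0, r3=1, r4=0) should never be
counted. A stress loop like this can only fail to find a violation; it does
not prove anything.

```java
import java.util.concurrent.CountDownLatch;

class IriwLitmus {
    // Volatile accesses are synchronization actions in the JMM.
    static volatile int x, y;

    // Blocks until the latch opens, so the four threads start close together.
    private static void awaitStart(CountDownLatch start) {
        try {
            start.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    // Runs one IRIW trial and returns {r1, r2, r3, r4}.
    static int[] trial() {
        x = 0;
        y = 0;
        int[] r = new int[4];
        CountDownLatch start = new CountDownLatch(1);
        Thread[] threads = {
            new Thread(() -> { awaitStart(start); x = 1; }),               // writer 1
            new Thread(() -> { awaitStart(start); y = 1; }),               // writer 2
            new Thread(() -> { awaitStart(start); r[0] = x; r[1] = y; }),  // reader: x then y
            new Thread(() -> { awaitStart(start); r[2] = y; r[3] = x; }),  // reader: y then x
        };
        for (Thread t : threads) t.start();
        start.countDown();
        try {
            // join() makes the readers' stores into r visible to the caller.
            for (Thread t : threads) t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return r;
    }

    // True when the readers saw the two independent writes in opposite orders.
    static boolean forbidden(int[] r) {
        return r[0] == 1 && r[1] == 0 && r[2] == 1 && r[3] == 0;
    }

    // Counts how many of the given number of trials ended in the forbidden outcome.
    static int countForbidden(int trials) {
        int count = 0;
        for (int i = 0; i < trials; i++) {
            if (forbidden(trial())) count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Expected to stay at 0 under the JMM because x and y are volatile.
        System.out.println("forbidden outcomes: " + countForbidden(1000));
    }
}
```

Dropping volatile from x and y removes the guarantee: the accesses are then
plain, and the JMM no longer requires the readers to agree.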

Alex

On 24/03/2015 12:30, Alexander Terekhov wrote:
> IRIW is about TSO, not SC (which is TSO + store-load barrier). And TSO is
> release-acquire consistency + write atomicity.
>
>> the cornerstone of sequential consistency
> is an awfully expensive store-load barrier.
>
> PMFJI
>
>
> Oleksandr Otenko <oleksandr.otenko at oracle.com>@cs.oswego.edu on 24.03.2015
> 12:42:10
>
> Sent by:	concurrency-interest-bounces at cs.oswego.edu
>
>
> To:	Marko Topolnik <marko at hazelcast.com>, concurrency-interest
>         <Concurrency-interest at cs.oswego.edu>
> cc:
> Subject:	Re: [concurrency-interest] Enforcing total sync order on modern
>         hardware
>
>
> 17.4.3
>        A set of actions is sequentially consistent if all actions occur in a
>        total order ...(plus details of how the total order relates to
>        program order etc)
>
> It looks like your objection goes against this. IRIW works because it must
> have the writes observed by all threads in the same order - due to the
> writes and reads forming a total order, even if they are independent -
> which is the cornerstone of sequential consistency.
>
> I don't know how you measure that the example is minimal, because in some
> sense it is also maximal - the minimal case being Dekker's invariant with
> two threads writing one and reading the other variable.
>
> T1: x=1;
> r0=x; // may fuse with x=1 - then you get the canonical form of the example
> r1=y;
>
> T2: y=1;
> r2=y; // may fuse with y=1
> r3=x;
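
The two-thread example above can be sketched as a Java litmus test. This is
a minimal sketch under stated assumptions: the class name DekkerLitmus, the
helper names and the trial count are hypothetical, and x and y are taken to
be volatile. The registers r0..r3 follow the names in the example. Under
those assumptions the JMM should never allow r1=0 together with r3=0, and
each thread's read of its own variable (r0, r2) should return 1.

```java
import java.util.concurrent.CountDownLatch;

class DekkerLitmus {
    // Volatile, so every access below is a synchronization action.
    static volatile int x, y;

    // Blocks until the latch opens, so both threads start close together.
    private static void awaitStart(CountDownLatch start) {
        try {
            start.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    // Runs one trial and returns {r0, r1, r2, r3}.
    static int[] trial() {
        x = 0;
        y = 0;
        int[] r = new int[4];
        CountDownLatch start = new CountDownLatch(1);
        // T1: write x, read own write, then read y.
        Thread t1 = new Thread(() -> { awaitStart(start); x = 1; r[0] = x; r[1] = y; });
        // T2: write y, read own write, then read x.
        Thread t2 = new Thread(() -> { awaitStart(start); y = 1; r[2] = y; r[3] = x; });
        t1.start();
        t2.start();
        start.countDown();
        try {
            // join() makes the stores into r visible to the caller.
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return r;
    }

    // True when neither thread saw the other's write (Dekker's invariant broken).
    static boolean forbidden(int[] r) {
        return r[1] == 0 && r[3] == 0;
    }

    // Counts how many of the given number of trials ended in the forbidden outcome.
    static int countForbidden(int trials) {
        int count = 0;
        for (int i = 0; i < trials; i++) {
            if (forbidden(trial())) count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Expected to stay at 0 under the JMM because x and y are volatile.
        System.out.println("forbidden outcomes: " + countForbidden(1000));
    }
}
```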
>
> Alex
>
> On 23/03/2015 17:42, Marko Topolnik wrote:
>
>
>
>        On Mar 23, 2015 6:01 PM, "Oleksandr Otenko" <
>        oleksandr.otenko at oracle.com> wrote:
>        >
>        > IRIW results apply to any thread doing the reading. The existence
>        > of the fourth thread only generalizes the result.
>
>
>        This is incorrect: the essential property of IRIW is that it
>        constructs a result for which no sequentially consistent explanation
>        exists. It is the minimal example to reproduce the issue of interest,
>        therefore none of its parts is optional.
>
>
>        >
>        > It seems this branch of the conversation is pointless.
>        >
>        > Alex
>        >
>        >
>        > On 23/03/2015 15:56, Marko Topolnik wrote:
>        >>
>        >> So your analogy to IRIW is established by introducing a whole new
>        >> reading thread. Such an analogy fails to capture the essence of my
>        >> scenario: I am interested precisely in the case where the
>        >> "reading-writing" thread observes its own writes in addition to the
>        >> "timer" thread's writes. The goal is to analyze the tension between
>        >> the desire to win performance through store forwarding and the need
>        >> to stay sequentially consistent. It was my impression that the
>        >> distributed nature of QPI messaging would result in the processors
>        >> grabbing more of the liberties allowed by Intel's specification,
>        >> which specifically excludes my scenario from the ordering guarantee.
>        >> As Aleksey pointed out, this is not the case because an MFENCE
>        >> instruction provides a stronger guarantee than that: the coherence
>        >> layer will have resolved the value at the stored location before the
>        >> load instruction asks for its value.
>        >>
>        >> ---
>        >> Marko
>        >>
>        >> On Mon, Mar 23, 2015 at 1:53 PM, Oleksandr Otenko
>        >> <oleksandr.otenko at oracle.com> wrote:
>        >>>
>        >>> Out of all outcomes IRIW permits, choose those that have the
>        >>> fourth thread observe 1 then 0 - ie there exists a thread which
>        >>> observed Wv1 occur before T9. Now you are looking at your case with
>        >>> three threads. Your case does not add more outcomes, only chooses a
>        >>> subset of those in IRIW.
>        >>>
>        >>>
>        >>> Alex
>        >>>
>        >>>
>        >>> On 20/03/2015 22:05, Marko Topolnik wrote:
>        >>>>
>        >>>> On Fri, Mar 20, 2015 at 7:52 PM, Oleksandr Otenko
>        >>>> <oleksandr.otenko at oracle.com> wrote:
>        >>>>>
>        >>>>> On 20/03/2015 18:12, Marko Topolnik wrote:
>        >>>>>>
>        >>>>>> On Fri, Mar 20, 2015 at 5:45 PM, Oleksandr Otenko
>        >>>>>> <oleksandr.otenko at oracle.com> wrote:
>        >>>>>>>
>        >>>>>>> No, that doesn't answer the question. You need to modify how
>        >>>>>>> happens-before is built - because happens-before in JMM and in
>        >>>>>>> some other model are two different happens-befores. If you get
>        >>>>>>> rid of synchronization order, then you need to explain which
>        >>>>>>> reads the write will or will not synchronize-with.
>        >>>>>>
>        >>>>>>
>        >>>>>> I think it's quite simple: the read may synchronize-with any
>        >>>>>> write as long as that doesn't break happens-before consistency.
>        >>>>>
>        >>>>>
>        >>>>> It seems quite naive, too. The problem is that currently the
>        >>>>> read synchronizes-with all writes preceding it, but observes the
>        >>>>> value set by the last write. Here you need to define somehow which
>        >>>>> write the read observes - you need to somehow define which of the
>        >>>>> writes is "last" and what the other readers are allowed to think
>        >>>>> about it.
>        >>>>>
>        >>>>> It doesn't seem to be explained in one sentence.
>        >>>>
>        >>>>
>        >>>> It is a quite lightweight exercise to rigorously specify this in
>        >>>> terms of Lamport's clocks; but I concede that, lacking a shared
>        >>>> intuition, it will take more than a sentence to communicate. I
>        >>>> hesitate to turn this into a treatise on the application of
>        >>>> Lamport's clocks to the JMM, so I'm letting it rest.
>        >>>>>>>
>        >>>>>>> I am only involved in this discussion because you said it
>        >>>>>>> isn't IRIW, but I see all signs that it is. I remember the
>        >>>>>>> discussion here doubting that IRIW should be supported, and I
>        >>>>>>> appreciate the arguments, but without the specification it is
>        >>>>>>> difficult to continue a meaningful discussion.
>        >>>>>>
>        >>>>>>
>        >>>>>> That's strange to hear since I have pointed out exactly why
>        >>>>>> it's not IRIW: if we broaden the definition such that it covers
>        >>>>>> my case, then we must accept that Intel allows IRIW to happen
>        >>>>>> because it explicitly excludes the writing thread from the
>        >>>>>> guarantee which is supposed to disallow it.
>        >>>>>
>        >>>>>
>        >>>>> Rwt6 and Rrt6 are redundant. If you remove them, it becomes
>        >>>>> IRIW. Rwt6 only witnesses the particular ordering of some
>        >>>>> operations in IRIW - it forbids some of the outcomes from IRIW,
>        >>>>> but doesn't add new ones. Rrt6 is meaningless.
>        >>>>
>        >>>>
>        >>>> IRIW involves four independent threads and six events. My
>        >>>> example involves only three threads, so there must be something
>        >>>> wrong in calling it "exactly IRIW". Apparently you have in mind
>        >>>> some quite flexible definition of IRIW, but I cannot second-guess
>        >>>> what it might be.
>        >>>>
>        >>>> ---
>        >>>> Marko
>        >>>
>        >>>
>        >>
>        >
>
>
> _______________________________________________
> Concurrency-interest mailing list
> Concurrency-interest at cs.oswego.edu
> http://cs.oswego.edu/mailman/listinfo/concurrency-interest
>
>


