[concurrency-interest] RFR: 8065804:JEP171:Clarifications/corrections for fence intrinsics

Oleksandr Otenko oleksandr.otenko at oracle.com
Wed Dec 10 17:58:33 EST 2014


This surely does not mean that on non-TSO architectures it is not 
possible to implement linearizable algorithms.

Nor does it mean that the algorithm will contain /independent/ writes.

Count the proofs that rely on synchronization order. That's the measure 
of how common the IRIW pattern is. And yet, even then there is hope.

Suppose a<b means a happens before b. There are only a few axioms 
coming from the memory model:

order consistency: a<b -> b<a -> _|_    -- if you can prove both that a 
happens before b and that b happens before a, you have derived absurdity

(semi-formally)
read consistency: (r v x) -> (exists (w v x) such that (w v x) < (r v x) 
&& ((w v y) -> (w v y) < (w v x) || (r v x) < (w v y)))  -- for every 
read r of variable v with value x there is a write w of value x such 
that the write happens before the read and all other writes either 
happen before this write, or happen after the read.

That's about it.

Then the totality-of-order axiom is:
total order : a -> b -> ((a < b || b < a) && (a < b -> _|_) -> b < a && 
(b < a -> _|_) -> a < b)   -- for any two volatile operations, they are 
ordered in some way, and if you can prove that a happens before b is 
absurd, then b happens before a, and conversely. (The latter two 
implications follow from order consistency, but are stated here 
explicitly for comparison with the following.)

A DMB barrier on non-TSO is a different axiom:
dmb : a -> b -> c -> (a `po` b) -> ((a < c || c < b) && (c < b -> _|_) 
-> a < c && (a < c -> _|_) -> c < b)  -- for any two operations a and b 
with program order and a dmb barrier between them, and for any other 
operation c, either a happens before c or c happens before b; if you can 
prove that c happens before b is absurd, then a happens before c, and 
conversely. The slight difference in reasoning from total order is that 
proving a<c does not exclude the possibility that c<b also holds in 
some other way.
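The two axiom shapes can be put side by side. Here is a hedged Lean sketch of them (the formalization and all names are mine, not part of the original post), with `lt` standing for happens-before and `po` for program order:

```lean
-- Hedged sketch: the axiom shapes above, over an abstract event type.
variable {Event : Type} (lt po : Event → Event → Prop)

-- order consistency: a < b and b < a together are absurd
def OrderConsistent : Prop :=
  ∀ a b : Event, lt a b → lt b a → False

-- total order: any two volatile operations are ordered one way or the
-- other, so refuting one disjunct proves the other
def TotalOrderAx : Prop :=
  ∀ a b : Event, lt a b ∨ lt b a

-- dmb: for program-ordered a and b, and any other operation c, either
-- a happens before c or c happens before b; unlike total order, the
-- disjunction is not exclusive -- proving a < c does not rule out c < b
def DmbAx : Prop :=
  ∀ a b c : Event, po a b → (lt a c ∨ lt c b)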

You can see many similarities between total order and DMB. The 
difficulty with the IRIW case is only that there is just one write in 
each thread, so DMB has to enforce ordering between reads instead. And 
the difficulty with DMB is that, unlike volatile stores, there is no 
language-level idiom for it - you need to consider more than one entity 
to understand the effect, and more than one place in the code. Total 
order, of course, is far easier to reason about - any two operations 
are ordered, and their ordering is exclusive, so you only need to look 
in one place.
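For concreteness, the IRIW shape under discussion looks like this in Java with volatile variables (an illustrative sketch, not from the original mail; the class and field names are mine). Because volatile operations take part in the total synchronization order, the outcome r1 == 1, r2 == 0, r3 == 1, r4 == 0 is forbidden:

```java
// IRIW: two independent writers, two independent readers.
class Iriw {
    static volatile int foo = 0, bar = 0;
    static int r1, r2, r3, r4;

    public static void main(String[] args) throws InterruptedException {
        Thread t1 = new Thread(() -> { foo = 1; });
        Thread t2 = new Thread(() -> { bar = 1; });
        Thread t3 = new Thread(() -> { r1 = bar; r2 = foo; });
        Thread t4 = new Thread(() -> { r3 = foo; r4 = bar; });
        for (Thread t : new Thread[]{t1, t2, t3, t4}) t.start();
        for (Thread t : new Thread[]{t1, t2, t3, t4}) t.join();
        // The JMM forbids r1==1 && r2==0 && r3==1 && r4==0: that outcome
        // would require t3 and t4 to observe the two writes in opposite
        // orders, contradicting the total synchronization order.
        // Any other combination of 0s and 1s may legitimately appear.
        System.out.println(r1 + " " + r2 + " " + r3 + " " + r4);
    }
}
```

If the fields were plain (non-volatile), the forbidden outcome becomes possible on non-multi-copy-atomic hardware, which is exactly the case the DMB discussion above is about.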


Alex


On 26/11/2014 19:00, Roman Elizarov wrote:
>
> I'd suggest starting with the original paper by Herlihy, who came 
> up with the concept of linearizability in 1990:
>
> Linearizability: A Correctness Condition for Concurrent Objects
>
> http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.142.5315
>
>
> There has been a lot of research on linearizability since then (there 
> are almost a thousand citations for this article), expanding and 
> improving proof techniques and applying them. There have been no 
> breakthroughs of comparable magnitude since. All 
> "thread-safe" objects that you encounter in the modern world are 
> linearizable. It is the de facto "gold standard" correctness condition 
> for concurrent objects.
>
>
> This position is well deserved, because having linearizable objects 
> as your building blocks makes it super-easy to formally reason about 
> the correctness of your code. You will rarely encounter concurrent 
> algorithms that provide weaker guarantees (like quiescent 
> consistency), because they are all too hard to reason about -- they are 
> either not composable or not local. But when all your 
> concurrent objects are linearizable, you can ditch 
> happens-before, forget that everything is actually parallel, and 
> simply reason about your code in terms of interleavings of "atomic" 
> operations that happen in some global order. That is the beauty of 
> linearizability.
>
>
> But linearizability is indeed a pretty strong requirement. 
> Linearizability of your shared memory requires that Independent Reads 
> of Independent Writes (IRIW) are consistent. Can you get away with 
> some weaker requirement and still get all the same goodies that 
> linearizability gets you? I have not seen anything promising in this 
> direction. Whoever makes this breakthrough will surely reap the 
> world's recognition and respect.
>
>
> /Roman
>
>
> ------------------------------------------------------------------------
> *From:* DT <dt at flyingtroika.com>
> *Sent:* 26 November 2014 20:24
> *To:* Roman Elizarov; dholmes at ieee.org; Hans Boehm
> *Cc:* core-libs-dev; concurrency-interest at cs.oswego.edu
> *Subject:* Re: [concurrency-interest] RFR: 
> 8065804:JEP171:Clarifications/corrections for fence intrinsics
> Roman,
> Can you point to any specific article providing a concurrency 
> problem statement together with a proof that uses linearizability to 
> reason about the solution?
>
> Thanks,
> DT
>
> On 11/26/2014 2:59 AM, Roman Elizarov wrote:
>>
>> Whether IRIW has any _/practical/_ uses is definitely subject to 
>> debate. However, there is no tractable way to reason formally about 
>> the properties of large concurrent systems except via linearizability. 
>> Linearizability is the only property that is both local and 
>> hierarchical. It lets you build more complex linearizable algorithms 
>> from simpler ones, with quite succinct and compelling proofs at 
>> each step.
>>
>> In other words, if you want to be able to construct a formal proof 
>> that your [large] concurrent system is correct, then you must have 
>> IRIW consistency. Do you need a formal proof of correctness? Maybe 
>> not. In many applications hand-waving is enough, but there are many 
>> other applications where hand-waving does not count as a proof. It 
>> may be possible to construct formal correctness proofs for some very 
>> simple algorithms even on a system that does not provide IRIW, but 
>> this is beyond the state of the art of formal verification for 
>> anything sufficiently complex.
>>
>> /Roman
>>
>> *From:*David Holmes [mailto:davidcholmes at aapt.net.au]
>> *Sent:* Wednesday, November 26, 2014 11:54 AM
>> *To:* Roman Elizarov; Hans Boehm
>> *Cc:* concurrency-interest at cs.oswego.edu; core-libs-dev
>> *Subject:* RE: [concurrency-interest] RFR: 
>> 8065804:JEP171:Clarifications/corrections for fence intrinsics
>>
>> Can you expand on that, please? All previous discussion of IRIW I have 
>> seen indicated that the property, while a consequence of existing JMM 
>> rules, had no practical use.
>>
>> Thanks,
>>
>> David
>>
>>     -----Original Message-----
>>     *From:* Roman Elizarov [mailto:elizarov at devexperts.com]
>>     *Sent:* Wednesday, 26 November 2014 6:49 PM
>>     *To:* dholmes at ieee.org <mailto:dholmes at ieee.org>; Hans Boehm
>>     *Cc:* concurrency-interest at cs.oswego.edu
>>     <mailto:concurrency-interest at cs.oswego.edu>; core-libs-dev
>>     *Subject:* RE: [concurrency-interest] RFR:
>>     8065804:JEP171:Clarifications/corrections for fence intrinsics
>>
>>     There is no conceivable way to kill the IRIW consistency
>>     requirement while retaining the ability to prove correctness of
>>     large software systems. If IRIWs of volatile variables are not
>>     consistent, then volatile reads and writes are not linearizable,
>>     which breaks the linearizability of all higher-level primitives
>>     built on top of them and makes formal reasoning about the
>>     behavior of concurrent systems practically impossible. There are
>>     many fields where this is not acceptable.
>>
>>     /Roman
>>
>>     *From:*concurrency-interest-bounces at cs.oswego.edu
>>     <mailto:concurrency-interest-bounces at cs.oswego.edu>
>>     [mailto:concurrency-interest-bounces at cs.oswego.edu] *On Behalf Of
>>     *David Holmes
>>     *Sent:* Wednesday, November 26, 2014 5:11 AM
>>     *To:* Hans Boehm
>>     *Cc:* concurrency-interest at cs.oswego.edu
>>     <mailto:concurrency-interest at cs.oswego.edu>; core-libs-dev
>>     *Subject:* Re: [concurrency-interest] RFR: 8065804:
>>     JEP171:Clarifications/corrections for fence intrinsics
>>
>>     Hi Hans,
>>
>>     Given that IRIW is a thorn in everyone's side, has no known
>>     useful benefit, and can hopefully be killed off in the future,
>>     let's not get bogged down in IRIW. But none of what you say below
>>     relates to multi-copy-atomicity.
>>
>>     Cheers,
>>
>>     David
>>
>>         -----Original Message-----
>>         *From:* hjkhboehm at gmail.com
>>         <mailto:hjkhboehm at gmail.com>[mailto:hjkhboehm at gmail.com]*On
>>         Behalf Of *Hans Boehm
>>         *Sent:* Wednesday, 26 November 2014 12:04 PM
>>         *To:* dholmes at ieee.org <mailto:dholmes at ieee.org>
>>         *Cc:* Stephan Diestelhorst;
>>         concurrency-interest at cs.oswego.edu
>>         <mailto:concurrency-interest at cs.oswego.edu>; core-libs-dev
>>         *Subject:* Re: [concurrency-interest] RFR: 8065804:
>>         JEP171:Clarifications/corrections for fence intrinsics
>>
>>         To be concrete here, on Power, loads can normally be ordered
>>         by an address dependency or a light-weight fence (lwsync).
>>         However, neither is enough to prevent the questionable
>>         outcome for IRIW, since neither ensures that the stores in
>>         T1 and T2 will be made visible to other threads in a
>>         consistent order. That outcome can be prevented by using
>>         heavyweight fence (sync) instructions between the loads
>>         instead.  Peter Sewell's group concluded that to enforce
>>         correct volatile behavior on Power, you essentially need a
>>         heavyweight fence between every pair of volatile operations.
>>         That cannot be understood based on simple ordering
>>         constraints.
>>
>>         As Stephan pointed out, there are similar issues on ARM, but
>>         they're less commonly encountered in a Java implementation.
>>         If you're lucky, you can get to the right implementation
>>         recipe by looking at only reordering, I think.
>>
>>         On Tue, Nov 25, 2014 at 4:36 PM, David Holmes
>>         <davidcholmes at aapt.net.au <mailto:davidcholmes at aapt.net.au>>
>>         wrote:
>>
>>             Stephan Diestelhorst writes:
>>             >
>>             > David Holmes wrote:
>>             > > Stephan Diestelhorst writes:
>>             > > > Am Dienstag, 25. November 2014, 11:15:36 schrieb
>>             Hans Boehm:
>>             > > > > I'm no hardware architect, but fundamentally it
>>             seems to me that
>>             > > > >
>>             > > > > load x
>>             > > > > acquire_fence
>>             > > > >
>>             > > > > imposes a much more stringent constraint than
>>             > > > >
>>             > > > > load_acquire x
>>             > > > >
>>             > > > > Consider the case in which the load from x is an
>>             L1 hit, but a
>>             > > > > preceding load (from say y) is a long-latency
>>             miss.  If we enforce
>>             > > > > ordering by just waiting for completion of prior
>>             operation, the
>>             > > > > former has to wait for the load from y to
>>             complete; while the
>>             > > > > latter doesn't.  I find it hard to believe that
>>             this doesn't leave
>>             > > > > an appreciable amount of performance on the
>>             table, at least for
>>             > > > > some interesting microarchitectures.
>>             > > >
>>             > > > I agree, Hans, that this is a reasonable
>>             assumption.  Load_acquire x
>>             > > > does allow roach motel, whereas the acquire fence
>>             does not.
>>             > > >
>>             > > > >  In addition, for better or worse, fencing
>>             requirements on at least
>>             > > > >  Power are actually driven as much by store
>>             atomicity issues, as by
>>             > > > >  the ordering issues discussed in the cookbook. 
>>             This was not
>>             > > > >  understood in 2005, and unfortunately doesn't
>>             seem to be
>>             > amenable to
>>             > > > >  the kind of straightforward explanation as in
>>             Doug's cookbook.
>>             > > >
>>             > > > Coming from a strongly ordered architecture to a
>>             weakly ordered one
>>             > > > myself, I also needed some mental adjustment about
>>             store (multi-copy)
>>             > > > atomicity.  I can imagine others will be unaware of
>>             this difference,
>>             > > > too, even in 2014.
>>             > >
>>             > > Sorry I'm missing the connection between fences and
>>             multi-copy
>>             > atomicity.
>>             >
>>             > One example is the classic IRIW. With non-multi copy
>>             atomic stores, but
>>             > ordered (say through a dependency) loads in the
>>             following example:
>>             >
>>             > Memory: foo = bar = 0
>>             > _T1_         _T2_         _T3_                             _T4_
>>             > st (foo),1   st (bar),1   ld r1, (bar)                     ld r3, (foo)
>>             >                           <addr dep / local "fence" here>  <addr dep>
>>             >                           ld r2, (foo)                     ld r4, (bar)
>>             >
>>             > You may observe r1 = 1, r2 = 0, r3 = 1, r4 = 0 on
>>             non-multi-copy atomic
>>             > machines.  On TSO boxes, this is not possible.  That
>>             means that the
>>             > memory fence that will prevent such a behaviour (DMB on
>>             ARM) needs to
>>             > carry some additional oomph in ensuring multi-copy
>>             atomicity, or rather
>>             > prevent you from seeing it (which is the same thing).
>>
>>             I take it as given that any code for which you may have
>>             ordering constraints must first have basic atomicity
>>             properties for loads and stores. I would not expect any
>>             kind of fence to add multi-copy-atomicity where there was
>>             none.
>>
>>             David
>>
>>
>>             > Stephan
>>             >
>>             > _______________________________________________
>>             > Concurrency-interest mailing list
>>             > Concurrency-interest at cs.oswego.edu
>>             <mailto:Concurrency-interest at cs.oswego.edu>
>>             > http://cs.oswego.edu/mailman/listinfo/concurrency-interest
>>
>>
>>
>>
>
>
>


