[concurrency-interest] AtomicReference.updateAndGet() mandatory updating

Alex Otenko oleksandr.otenko at gmail.com
Tue May 30 12:52:34 EDT 2017


> On 30 May 2017, at 17:34, Gil Tene <gil at azul.com> wrote:
> 
> 
>> On May 30, 2017, at 9:26 AM, Alex Otenko <oleksandr.otenko at gmail.com> wrote:
>> 
>> 
>>> A failing atomic CAS (of any kind) has nothing to be atomic about. So "detecting non-atomic" simply means "it failed".
>> 
>> 
>> Sure. I only took it a bit further, and explored a hypothetical implementation that doesn’t fail the sc spuriously (suppose the variables are cache-aligned, no other variables reside on the same cache line, and no other events cause cache invalidation). The weak CAS would then be detectable as non-atomic - you can detect that the load is not strictly after the store that failed the CAS. This is essentially what the litmus test for atomicity establishes (see one of the previous messages proposing such a test, when we switched to discussing atomicity).
> 
> *How* would you determine that it is not strictly after the store?
> 
> You seem to be discounting the various other things that are allowed to fail a weak CAS. Like interrupts, faults, and "I don't feel like it". You can't tell why the failure happened (in a weak CAS), and therefore cannot make any statements about the ordering of the operation with relation to other operations.
> 
> Not that it's relevant, since a failing CAS makes no atomicity claims. So nothing to prove/disprove.

The weak one is in the discussion only for the sake of contrast.

The strong one doesn’t explicitly state that the atomicity claim applies to a successful CAS only. This branch of the discussion started with the claim that it is atomic only if successful. But if the strong CAS always places a load after the store that failed it, then it is atomic even when it fails.


Here’s how you can determine whether a failing CAS places a load strictly after the store that failed it:

int x=0;
volatile int z=0;

Thread 1:
if ( ! CAS(z, 0, 1) ) {
  return x;
}
return 1;

Thread 2:
x=1;
z=1;


If failing CAS always places a load after the store that failed it, Thread 1 always returns 1.

If you have a weaker notion of CAS, then even if by divine intervention the CAS fails in a particular way (the value did match at first, but got modified just before the sc started), Thread 1 is still not guaranteed to return 1: the weaker CAS does not place a load after the store, so there is no synchronizes-with between the failing CAS and the store that failed it, and no happens-before between the write and the read of x.
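Here is the same litmus test as a runnable sketch against Java 8's AtomicInteger, assuming compareAndSet keeps full volatile read/write semantics even on failure. A run that observes no violation is only evidence on one platform, not a proof of the guarantee:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class CasLitmus {
    static int x;                                 // plain (non-volatile) field
    static final AtomicInteger z = new AtomicInteger();

    // Runs the litmus test 'iters' times; returns true if Thread 1
    // returned 1 on every iteration, i.e. no ordering violation observed.
    static boolean run(int iters) throws InterruptedException {
        for (int i = 0; i < iters; i++) {
            x = 0;
            z.set(0);
            int[] r = new int[1];
            Thread t1 = new Thread(() -> {
                // Thread 1: on CAS failure, read x
                r[0] = z.compareAndSet(0, 1) ? 1 : x;
            });
            Thread t2 = new Thread(() -> {
                x = 1;     // plain store
                z.set(1);  // volatile store that may fail the CAS
            });
            t1.start(); t2.start();
            t1.join();  t2.join();
            if (r[0] != 1) return false;          // saw x == 0 after a failed CAS
        }
        return true;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run(5_000));
    }
}
```

Under the Java 8 reading (a failing compareAndSet still synchronizes-with the store that failed it), the check can never fire; under the weaker semantics being debated, a violation would be permitted.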


>> 
>>> A failing strong CAS instruction on field z is strictly after the store that failed it, just like any load that would observe that store.
>> 
>> 
>> Excellent. Then it synchronizes-with it. Then it is observed as atomic - not distinguishable from atomic.
> 
> Not distinguishable from a load either. A single-operation instruction is atomic, period. It is indivisible from itself.

That’s good!

If that statement can be recorded like that in the spec, then I am satisfied.


Alex


> 
>> 
>> 
>>> But it is not necessarily before any other stores.
>> 
>> 
>> Yes, I realize this detail. But such ordering is not detectable - not distinguishable from those other stores being strictly after or strictly before the failing strong CAS.
>> 
>> 
>> Alex
>> 
>>> On 30 May 2017, at 16:27, Gil Tene <gil at azul.com> wrote:
>>> 
>>> 
>>> 
>>> Sent from my iPad
>>> 
>>> On May 30, 2017, at 2:12 AM, Alex Otenko <oleksandr.otenko at gmail.com> wrote:
>>> 
>>>> 
>>>>> On 29 May 2017, at 21:20, Gil Tene <gil at azul.com> wrote:
>>>>> 
>>>>> 
>>>>>> On May 29, 2017, at 11:43 AM, Alex Otenko <oleksandr.otenko at gmail.com> wrote:
>>>>>> 
>>>>>> 
>>>>>>> On 29 May 2017, at 15:42, Gil Tene <gil at azul.com> wrote:
>>>>>>> 
>>>>>>> When a set of accesses is atomic, they appear to happen instantaneously, with no opportunity for any external observer or mutator to interleave within them
>>>>>> 
>>>>>> When you said “no opportunity … to interleave”, you defined a total order: the operations are not allowed to start before something atomic has finished, and something atomic is not allowed to start, if any other operation hasn’t finished - this is equivalent to defining a total order of operations.
>>>>> 
>>>>> Nothing about "any other operation" is implied. Only about operations on the field(s) involved in the atomic operation. That's a key difference, and the reason there is no total order implied or involved here. There is no implied order or relationship whatsoever with operations involving any other fields. So no order of any sort.
>>>>> 
>>>>> If you are referring to some "total order" that only applies to the field involved in the CAS (and not in relation to any other operations or fields), I'd buy that as a possible statement. But it is not a very meaningful one, because it would be orthogonal to any ordering statements in the rest of the program.
>>>>> 
>>>>>> ll/sc sequence does not create atomicity, that’s a very important point. ll/sc does not *stop* interleavings from happening. ll/sc only *witnesses* whether any interleavings happened. So it is meaningless to say “a successful ll/sc is atomic”.
>>>>> 
>>>>> ll/sc (on a successful sc) is no less atomic with regards to the field involved than a successful CAS is. Successes are always atomic. And failures in both cases are not. A failed CAS performed only one operation (the read), so it has nothing to be atomic about.
>>>> 
>>>> It has, potentially.
>>>> 
>>>> Let me recap the diagram I’ve posted before.
>>>> 
>>>> A weak CAS:
>>>> 
>>>>    store z 0
>>>> load z
>>>>    store z 2
>>>> // skip store z 1
>>>> 
>>>> Here the weak CAS managed to execute a load, then observed something it believes to be an interleaving store, and skipped store of 1. Weak CAS does not synchronize-with the store that failed it, so it can be detected to be non-atomic. (Weak CAS is allowed to also fail for other reasons, but that’s not part of the problem)
>>> 
>>> A failing atomic CAS (of any kind) has nothing to be atomic about. So "detecting non-atomic" simply means "it failed".
>>> 
>>> But you can't *detect* an interleaving store here. There is nothing that lets you know that there was actually an interleaving store. A failure certainly doesn't tell you that's what happened. The failure could have been caused by something else (spurious) with no store at all. And it could also be caused by a preceding store. No way to tell.
>>> 
>>>> A version of a strong CAS:
>>>> 
>>>>    store z 0
>>>> load z
>>>>    store z 2
>>>> // interleaving store triggers retry - ll/sc-based primitive has to ascertain it is not a “spurious” failure
>>>> load z
>>>> // skip store z 1
>>>> 
>>>> Here the strong CAS managed to execute a load, then observed something happened, then loaded again to be sure it was not due to false sharing but a true interleaving store to z. This second load synchronizes-with the store that failed it. This version of strong CAS is no less atomic than a successful CAS - as in “a single entity in the total order of all stores to z”, because it behaves like a CAS that was scheduled strictly after the store that failed it and before any other stores.
>>> 
>>> You are mixing atomicity of the CAS sub-operations (the indivisibility of the read and the write if the write occurs) with ordering against other operations. Nothing about atomicity implies ordering. And failing CAS is no more or less atomic than a single load operation.
>>> 
>>> A failing strong CAS instruction on field z is strictly after the store that failed it, just like any load that would observe that store. But it is not necessarily before any other stores. That second part depends on ordering promises and on program order. Nothing to do with atomicity.
>>> 
>>> A failing strong compareAndSet (e.g. in Java 8) is strictly before any loads and stores that follow it in program order. That's true because of its memory ordering semantics promises. Not because of atomicity.
>>> 
>>>> So my big question is: can a strong CAS detect the presence of an interleaving store without synchronizing-with it? Can it tell it is an interleaving store and not something else, without issuing a load or having effect of such a load?
>>> 
>>> A successful atomic CAS instruction (weak or strong) guarantees that no interleaving store occurred. That is not the same as being able to detect that one occurred.
>>> 
>>> A CAS can't detect an interleaving store in either case (strong or weak). The reason for failure in weak CAS is "I felt like it", and the reason for failure in strong CAS is "the value in the field did not match the expected value". Both reasons can occur with no interleaving store (e.g. the store could have occurred in the past). Since there is no "interleaving store detection" mechanism to begin with, the rest of the question is therefore not relevant.
>>> 
>>>> 
>>>> 
>>>> Alex
>>>> 
>>>>> A successful sc ensures no "witnessing" of interleaving occurred, just like a successful CAS does. The (additional potential) causes of failure [where no atomicity is provided] may vary between ll/sc, strong CAS, and weak CAS, but the knowledge upon success is the same. Success is atomic.
>>>>> 
>>>>> I have previously worked on a weakly ordered architecture that implemented an atomic but completely unordered CAS instruction. That implementation, being weakly ordered, simply froze the L1 cache coherence protocol for the duration of the atomic operation after establishing the cache line exclusive in L1. By ensuring the protocol could not proceed with respect to the field involved, it trivially ensured atomicity without making any ordering requirements about any other operations. Ordering was controlled separately, with explicit fences for combinations of ldld, ldst, stst, and stld. Any amount of reordering was allowed if fences were not there to prevent it, so a CAS was not guaranteed to be ordered against any other operations unless one or more of those fences actually existed in the instruction flow between them.
>>>>> 
>>>>>> 
>>>>>> Alex
>>>>>> 
>>>>>> 
>>>>>>> On 29 May 2017, at 15:42, Gil Tene <gil at azul.com> wrote:
>>>>>>> 
>>>>>>> Atomicity has nothing to do with order. Atomicity only applies to the locations it refers to (e.g. a field, a cache line, a set of fields or cache lines [with HTM for example]), and it either is or isn't. When a set of accesses is atomic, they appear to happen instantaneously, with no opportunity for any external observer or mutator to interleave within them. This has absolutely nothing to do with the order in which those accesses appear in relation to accesses (by this thread or others) to values that are not included in the atomic operation.
>>>>>>> 
>>>>>>> With no relation to atomicity (there are plenty of ways to perform non-atomic CAS), a "weak" CAS means that the store to the field will occur only if the value observed in the field is equal to the expected value. A "strong" CAS means that the store to the field will occur if and only if the value observed in the field is equal to the expected value. Neither of those descriptions have anything to do with atomicity. The difference between them is that a strong CAS must write to the field if it observes the right value in the field, while a weak CAS may spuriously decide not to write (no IFF requirement).
>>>>>>> 
>>>>>>> An atomic CAS (weak or strong) simply makes the observation and write (if the write happens) occur atomically. The atomicity property prevents external interference in the sequence. Nothing else. Atomicity does not affect the weak/strong part (a weak CAS may still fail spuriously, even without external interference with the observed value). Atomicity has no implications on the memory ordering semantics of the operations in the sequence with respect to location not covered by the atomicity property, and other than the atomicity property, it makes no claims about memory ordering with respect to other threads.
>>>>>>> 
>>>>>>> There are plenty of atomic but non-ordered CAS implementations out there. Including some hardware CAS instruction implementations, as well as the hardware instruction combinations commonly used to construct a CAS operation on some architectures.
>>>>>>> 
>>>>>>> E.g. a ll/sc sequence where the sc was successful guarantees atomicity for the combination of the ll and sc operations performed on the same memory location. This primitive can be used to construct a weak CAS directly, and typically requires a loop to construct a strong CAS (since many things, including interrupts, can cause spurious failures). A ll/sc does not in itself imply ordering against accesses to other memory locations (unless the specific architecture defines it that way).
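(A sketch of the weak-to-strong construction described above, using Java's AtomicInteger.weakCompareAndSet as the spuriously-failing primitive; the helper name strongCas is illustrative, and on real ll/sc hardware the retry loop sits in the JIT-emitted code rather than in Java source:)

```java
import java.util.concurrent.atomic.AtomicInteger;

public class StrongFromWeak {
    // Builds a strong (non-spuriously-failing) CAS from a weak one,
    // mirroring the ll/sc retry loop: only report failure on a genuine
    // value mismatch, and retry on a possibly-spurious failure.
    static boolean strongCas(AtomicInteger a, int expected, int updated) {
        while (true) {
            if (a.weakCompareAndSet(expected, updated)) {
                return true;               // the "sc" took effect
            }
            if (a.get() != expected) {
                return false;              // genuine mismatch: strong failure
            }
            // the value still matched, so the failure may have been spurious: retry
        }
    }

    public static void main(String[] args) {
        AtomicInteger z = new AtomicInteger(0);
        System.out.println(strongCas(z, 0, 1));  // true: 0 matched
        System.out.println(strongCas(z, 0, 2));  // false: value is now 1
        System.out.println(z.get());             // 1
    }
}
```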
>>>>>>> 
>>>>>>> In Java, compareAndSet has been previously defined to mean a strong (non-spuriously-failing) CAS with the unconditional memory ordering semantics of a volatile read and a volatile write. weakCompareAndSet was previously defined as a weak (may spuriously fail) CAS with no implied memory ordering semantics. Both are atomic.
>>>>>>> 
>>>>>>> The recent discussion here is focused on whether a relaxing of the memory ordering semantics of compareAndSet, from the volatile write semantics being unconditional to being conditional (on the write actually occurring) is advisable. The claim is that there is existing Java software out there that may rely on the existing unconditional definition for correctness, and that relaxing the definition will break such software. Examples of how the conditional/unconditional behavior difference can be observed by a concurrent algorithm were given (I believe) as proof that such software can (and likely does) exist.
>>>>>>> 
>>>>>>> Sent from my iPad
>>>>>>> 
>>>>>>> On May 29, 2017, at 4:39 AM, Alex Otenko <oleksandr.otenko at gmail.com> wrote:
>>>>>>> 
>>>>>>>> I also have this intuition, but this looks like a proof by example, not a specification.
>>>>>>>> 
>>>>>>>> A specification would look something like:
>>>>>>>> 1. CAS executes a volatile load unconditionally.
>>>>>>>> 2. CAS executes a volatile store conditionally. The store is added to the total order of all operations if and only if the value loaded is equal to the expected value, and no other store appears in the total order of accesses to the same volatile variable after the load and before the volatile store.
>>>>>>>> 
>>>>>>>> In this way CAS store is not atomic - the correct description is closer to “*exclusive* with other stores”.
>>>>>>>> 
>>>>>>>> That is a weak CAS. It does not say anything about when it fails, so is allowed to fail at will (spuriously fail).
>>>>>>>> 
>>>>>>>> A strong CAS also has:
>>>>>>>> 
>>>>>>>> 3. If there are no volatile stores to the variable after the load in step 1, the volatile store is always executed.
>>>>>>>> 
>>>>>>>> This makes the CAS store “*mutually* exclusive with other strong CASes”. (Still, “atomic” is a wrong term.)
>>>>>>>> 
>>>>>>>> The question then is - in order to fail, it has to observe a volatile store; is it able to ascertain the presence of volatile stores to the variable without establishing a synchronizes-with relationship to such a store? It does not seem possible, and atomicity of the failing strong CAS follows. (That is, non-atomicity of implementation is not observable.)
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Alex
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> On 29 May 2017, at 12:10, David Holmes <davidcholmes at aapt.net.au> wrote:
>>>>>>>>> 
>>>>>>>>> The atomicity property of CAS ensures that the value being CAS’d updates in the manner prescribed by the application logic. If I want all threads to get a unique Id they can CAS a global “int id” and always increment the value. The atomicity of CAS ensures no two threads get the same Id and that there are no gaps in the assigned id values. The CAS may be the only means by which the variable is accessed, so no other stores even enter into the picture.
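(A minimal sketch of the unique-id idiom David describes, written as an explicit CAS loop; the class and method names are illustrative:)

```java
import java.util.concurrent.atomic.AtomicInteger;

public class UniqueIds {
    static final AtomicInteger id = new AtomicInteger();

    // CAS-loop form of getAndIncrement: the atomicity of compareAndSet
    // guarantees each caller claims a distinct value, with no gaps.
    static int nextId() {
        while (true) {
            int cur = id.get();
            if (id.compareAndSet(cur, cur + 1)) {
                return cur;  // no other store interleaved between the read and the write
            }
        }
    }

    // Hammer nextId() from several threads; returns the final counter, which
    // equals threads * perThread iff every increment took effect exactly once.
    static int stress(int threads, int perThread) throws InterruptedException {
        Thread[] ts = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            ts[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) nextId();
            });
            ts[i].start();
        }
        for (Thread t : ts) t.join();
        return id.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(stress(4, 10_000));  // prints 40000
    }
}
```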
>>>>>>>>>  
>>>>>>>>> David
>>>>>>>>>  
>>>>>>>>>  
>>>>>>>>> From: Alex Otenko [mailto:oleksandr.otenko at gmail.com]
>>>>>>>>> Sent: Monday, May 29, 2017 7:43 PM
>>>>>>>>> To: dholmes at ieee.org
>>>>>>>>> Cc: concurrency-interest at cs.oswego.edu
>>>>>>>>> Subject: Re: [concurrency-interest] AtomicReference.updateAndGet() mandatory updating
>>>>>>>>>  
>>>>>>>>> Sorry, but I don’t see how you separate synchronization properties of CAS and atomicity :-)
>>>>>>>>>  
>>>>>>>>> I don’t see how you could describe atomicity without specifying the place of CAS with respect to the other stores. Once you placed it somewhere among the other stores, it synchronizes-with those preceding it.
>>>>>>>>>  
>>>>>>>>> Now, atomicity of a succeeding CAS is not falsifiable. It can just as well be non-atomic, and succeed, if the other stores were ordered in the same way. There is no meaning whatsoever in declaring a succeeding CAS atomic.
>>>>>>>>>  
>>>>>>>>> Successful CAS atomic        Successful CAS not atomic
>>>>>>>>> store z 0                    store z 0
>>>>>>>>> CAS z 0 1                    load z
>>>>>>>>>                              store z 1
>>>>>>>>> store z 2                    store z 2
>>>>>>>>>  
>>>>>>>>> Can you detect the effects of a successful CAS being not atomic? What does atomicity of a successful CAS promise? I see nothing.
>>>>>>>>>  
>>>>>>>>>  
>>>>>>>>> There is a difference between atomic and non-atomic failing CAS - that’s where it makes sense to specify whether it is atomic or not.
>>>>>>>>>  
>>>>>>>>> Failing CAS atomic intrinsic          Failing CAS not atomic
>>>>>>>>>                              Not detectable    Detectable             Not detectable
>>>>>>>>> store z 0                    store z 0         store z 0              store z 0
>>>>>>>>> store z 2                    store z 2         load z                 load z
>>>>>>>>> CAS z 0 1                    load z              store z 2              store z 2
>>>>>>>>>                                                // store z 1 skipped   // store z 2 triggers retry
>>>>>>>>>                                                                       load z
>>>>>>>>>                                                                       // store z 1 skipped
>>>>>>>>>  
>>>>>>>>> If non-atomicity of a failing CAS can be detected, it becomes even closer to weakCompareAndSet, which fails spuriously, and is a concern. On the other hand, it may just as well promise atomicity even of a failing CAS, because it needs to distinguish a spurious failure of the underlying ll/sc primitive, and the procedure for distinguishing that possibly necessarily establishes the synchronizes-with edge with the store that failed it.
>>>>>>>>>  
>>>>>>>>> I don’t see all ends, so maybe someone wants to not promise atomicity of the failing strong CAS. But in that case there is no need to promise atomicity at all, because the promise of atomicity of a succeeding CAS gives you nothing. Unless you can show how a non-atomic successful CAS could be detected?
>>>>>>>>>  
>>>>>>>>>  
>>>>>>>>> Alex
>>>>>>>>>  
>>>>>>>>>> On 29 May 2017, at 09:31, David Holmes <davidcholmes at aapt.net.au> wrote:
>>>>>>>>>>  
>>>>>>>>>> Sorry but I don’t see what you describe as atomicity. The atomicity of a successful CAS is the only atomicity the API is concerned about. The memory synchronization properties of CAS are distinct from its atomicity property.
>>>>>>>>>>  
>>>>>>>>>> David
>>>>>>>>>>  
>>>>>>>>>> From: Concurrency-interest [mailto:concurrency-interest-bounces at cs.oswego.edu] On Behalf Of Alex Otenko
>>>>>>>>>> Sent: Monday, May 29, 2017 6:15 PM
>>>>>>>>>> To: dholmes at ieee.org
>>>>>>>>>> Cc: concurrency-interest at cs.oswego.edu
>>>>>>>>>> Subject: Re: [concurrency-interest] AtomicReference.updateAndGet() mandatory updating
>>>>>>>>>>  
>>>>>>>>>> Thanks.
>>>>>>>>>>  
>>>>>>>>>> No, I am not concerned about the atomicity of hardware instructions. I am concerned about atomicity as the property of the memory model.
>>>>>>>>>>  
>>>>>>>>>> Claiming atomicity of a successful CAS is pointless. If CAS is not atomic on failure, then there is no need to claim it is atomic at all.
>>>>>>>>>>  
>>>>>>>>>> Example where you can claim atomicity of a failing CAS:
>>>>>>>>>>  
>>>>>>>>>> do{
>>>>>>>>>>   tmp = load_linked(z);
>>>>>>>>>> } while(tmp == expected && !store_conditional(z, updated));
>>>>>>>>>>  
>>>>>>>>>> Here if store_conditional fails, it is followed by another volatile load, so the construct will synchronize-with the write that failed it, and it will appear atomic to the observer.
>>>>>>>>>>  
>>>>>>>>>>  
>>>>>>>>>> Alex
>>>>>>>>>>  
>>>>>>>>>>  
>>>>>>>>>>> On 29 May 2017, at 09:03, David Holmes <davidcholmes at aapt.net.au> wrote:
>>>>>>>>>>>  
>>>>>>>>>>> Sorry Alex but you are using “atomicity” in a way that doesn’t make sense to me. The only thing that is atomic is the successful CAS. I see what you are trying to say about a failing ll/sc CAS and the write that caused it to fail, but that is not “atomicity” to me – at least from the API perspective. You seem to be concerned about the atomicity of a sequence of hardware instructions. The API doesn’t tell you anything about how the implementation is done, only that the result of a successful operation is atomic with respect to any other update of the variable.
>>>>>>>>>>>  
>>>>>>>>>>> David
>>>>>>>>>>>  
>>>>>>>>>>> From: Alex Otenko [mailto:oleksandr.otenko at gmail.com]
>>>>>>>>>>> Sent: Monday, May 29, 2017 5:55 PM
>>>>>>>>>>> To: dholmes at ieee.org
>>>>>>>>>>> Cc: Hans Boehm <boehm at acm.org>; concurrency-interest at cs.oswego.edu
>>>>>>>>>>> Subject: Re: [concurrency-interest] AtomicReference.updateAndGet() mandatory updating
>>>>>>>>>>>  
>>>>>>>>>>> This came out a bit garbled. So, here it goes a bit clearer why the spec and the “ubiquitous terminology” are not enough, perhaps.
>>>>>>>>>>>  
>>>>>>>>>>> The claim of “atomicity” for a succeeding CAS is not interesting, because it is not falsifiable: if the CAS succeeded, that is in itself evidence that no volatile write appeared between the read and write parts of the CAS - not evidence of atomicity as a property of the construct. We cannot explain atomicity of CAS by giving the specification of the effects of a successful CAS. But the Javadoc does just that, and *only* that.
>>>>>>>>>>>  
>>>>>>>>>>> ll/sc as a construct does not synchronize-with the write failing the sc instruction. So if CAS that uses ll/sc does not make efforts to synchronize-with that write, we can detect it is not atomic - we can detect that it cannot be seen as an operation that appeared entirely before or after all stores to the same variable.
>>>>>>>>>>>  
>>>>>>>>>>> So I am asking whether the *failing* CAS promises atomicity.
>>>>>>>>>>>  
>>>>>>>>>>>  
>>>>>>>>>>> Alex
>>>>>>>>>>>  
>>>>>>>>>>>  
>>>>>>>>>>>> On 29 May 2017, at 00:26, Alex Otenko <oleksandr.otenko at gmail.com> wrote:
>>>>>>>>>>>>  
>>>>>>>>>>>> Yeah, I know what atomicity means in x86. But since the “write semantics” of the CAS are questioned, I have to also ask whether the other formulations are precise enough.
>>>>>>>>>>>>  
>>>>>>>>>>>> Atomicity means “indivisible”. It means that it either appears before a store, or after a store. If it appears after the store, then it synchronizes-with that store, and I am bound to observe stores preceding it. But not so in the weaker semantics Hans talks about! If the failure occurs during the sc part, you have to assume the load is before that store (but why does it fail then), or you have to assume it overlaps with a concurrent store. Either way, the core function is *not* atomic.
>>>>>>>>>>>>  
>>>>>>>>>>>> Unless there are extra volatile loads upon failure of (strong) compareAndSet.
>>>>>>>>>>>>  
>>>>>>>>>>>> It’s not just the “no intervening store”, meaning “if it’s stored, the condition expected=actual was not violated by any other store”.
>>>>>>>>>>>>  
>>>>>>>>>>>> The gist of atomicity:
>>>>>>>>>>>>  
>>>>>>>>>>>> int x=0;
>>>>>>>>>>>> volatile int z=0;
>>>>>>>>>>>>  
>>>>>>>>>>>> Thread 1:
>>>>>>>>>>>> if (! CAS(z, 0, 1)) {
>>>>>>>>>>>>   return x;
>>>>>>>>>>>> }
>>>>>>>>>>>> return 1;
>>>>>>>>>>>>  
>>>>>>>>>>>> Thread 2:
>>>>>>>>>>>> x=1;
>>>>>>>>>>>> z=1;
>>>>>>>>>>>>  
>>>>>>>>>>>> If CAS is atomic, failing CAS synchronizes-with the volatile write that fails it, and Thread 1 will always return 1. 
>>>>>>>>>>>>  
>>>>>>>>>>>> Alex
>>>>>>>>>>>>  
>>>>>>>>>>>>  
>>>>>>>>>>>>  
>>>>>>>>>>>>> On 28 May 2017, at 23:52, David Holmes <davidcholmes at aapt.net.au> wrote:
>>>>>>>>>>>>>  
>>>>>>>>>>>>> Alex,
>>>>>>>>>>>>>  
>>>>>>>>>>>>> I don’t recall anyone ever questioning what the atomic means in these atomic operations – it is ubiquitous terminology. If the store happens it is because the current value was the expected value. That is indivisible ie atomic. There can be no intervening store. This is either the semantics of the hardware instruction (e.g. cmpxchg) or else must be emulated using whatever is available e.g. ll/sc instructions (where an intervening store, in the strong CAS, must cause a retry).
>>>>>>>>>>>>>  
>>>>>>>>>>>>> David
>>>>>>>>>>>>>  
>>>>>>>>>>>>> From: Concurrency-interest [mailto:concurrency-interest-bounces at cs.oswego.edu] On Behalf Of Alex Otenko
>>>>>>>>>>>>> Sent: Monday, May 29, 2017 7:40 AM
>>>>>>>>>>>>> To: Hans Boehm <boehm at acm.org>
>>>>>>>>>>>>> Cc: concurrency-interest at cs.oswego.edu
>>>>>>>>>>>>> Subject: Re: [concurrency-interest] AtomicReference.updateAndGet() mandatory updating
>>>>>>>>>>>>>  
>>>>>>>>>>>>> Yes, you could read it both ways. You see, lock-based implementations and x86 LOCK:CMPXCHG semantics inspire one to interpret the statement such that there is at least some write-like semantics (hence “memory *effects*”) - not necessarily a write to z, but fences or whatever else imitates a volatile write to z from the JMM's point of view.
>>>>>>>>>>>>>  
>>>>>>>>>>>>>  
>>>>>>>>>>>>> The other source of confusion is the claim of atomicity. Is it “atomically (sets the value) (to the given updated value if the current value = the expected value)” or “atomically (sets the value to the given updated value if the current value == the expected value)”? Does atomicity imply it is a single item in total order of all operations? Or all stores? Or just stores to that variable? If you know how it’s implemented, it turns out it is far from atomic.
>>>>>>>>>>>>>  
>>>>>>>>>>>>> Does it at least *implement* atomic behaviour, does it *appear* atomic to an observer? For example, if a concurrent store appears between the load and “the store”  (in quotes, because it may not be executed - so in that case it is no longer “between”), do we get synchronizes-with edge with the store that preceded the load, or also the store that intervened? If we don’t get synchronizes-with edge to the store that intervened (which I suspect it doesn’t), then it is not atomic in any of those senses (but x86 and lock-based implementations create false analogies, so we get “atomic” in the method description).
>>>>>>>>>>>>>  
>>>>>>>>>>>>>  
>>>>>>>>>>>>> It needs to be specced out, best of all formally in the JMM as the source of authority, rather than in higher-level API javadocs spread all over the place.
>>>>>>>>>>>>>  
>>>>>>>>>>>>> Alex
>>>>>>>>>>>>>  
>>>>>>>>>>>>>  
>>>>>>>>>>>>>> On 28 May 2017, at 18:30, Hans Boehm <boehm at acm.org> wrote:
>>>>>>>>>>>>>>  
>>>>>>>>>>>>>> Thanks. I think I understand now. If Thread 2 returns false, the Thread 2 CAS failed, and the initial CAS in Thread 1 succeeds. Either x immediately reads back as 1 in Thread 1, or we set b to true after Thread 2 returns b. Thus the second (successful) CAS in Thread 1 must follow the unsuccessful Thread 2 CAS in synchronization order. So any write to z by the failed CAS synchronizes with the second successful CAS in Thread 1, and we could thus conclude that x is 1 in the Thread 1 return.
>>>>>>>>>>>>>>  
>>>>>>>>>>>>>> This relies critically on the assumption that the Thread 2 failed CAS has the semantics of a volatile write to z.
>>>>>>>>>>>>>>  
>>>>>>>>>>>>>> I think the actual relevant spec text is:
>>>>>>>>>>>>>>  
>>>>>>>>>>>>>> 1) "compareAndSet and all other read-and-update operations such as getAndIncrement have the memory effects of both reading and writing volatile variables."
>>>>>>>>>>>>>>  
>>>>>>>>>>>>>> 2) "Atomically sets the value to the given updated value if the current value == the expected value."
>>>>>>>>>>>>>>  
>>>>>>>>>>>>>> I would not read this as guaranteeing that property. But I agree the spec doesn't make much sense; I read (2) as saying there is no write at all if the CAS fails, as I would expect. Thus it seems like a stretch to assume that the write from (1) is to z, though I have no idea what write it would refer to.
>>>>>>>>>>>>>>  
>>>>>>>>>>>>>> The prior implementation discussion now does make sense to me. I don't think this is an issue for lock-based implementations. But the only reasonable way to support it on ARMv8 seems to be with a conditionally executed fence in the failing case. That adds two instructions, as well as a large amount of time overhead for algorithms that don't retry on a strong CAS. My impression is that those algorithms are frequent enough to be a concern.
>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>  
>>>>>>>>>>>>>> On Sat, May 27, 2017 at 4:49 PM, Alex Otenko <oleksandr.otenko at gmail.com> wrote:
>>>>>>>>>>>>>>> That’s right.
>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>> Atomicity (for some definition of atomicity - ie atomic with respect to which operations) is not needed here. As long as the store in CAS occurs always, x=1 is not “reordered” (certainly, not entirely - can’t escape the “store” that is declared in the spec).
>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>> Alex
>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>>> On 28 May 2017, at 00:43, Hans Boehm <boehm at acm.org> wrote:
>>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>>> I gather the interesting scenario here is the one in which the Thread 2 CAS fails and Thread 2 returns false, while the initial Thread 1 CAS succeeds?
>>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>>> The correctness argument here relies on the fact that the load of x in Thread 1 must, in this scenario, see the store of x in Thread 2? This assumes the load of z in the failing CAS in Thread 2 can't be reordered with the ordinary (and racey!) store to x by the same thread. I agree that the j.u.c.atomic spec was not clear in this respect, but I don't think it was ever the intent to guarantee that. It's certainly false for either a lock-based or ARMv8 implementation of CAS. Requiring it would raise serious questions about practical implementability on several architectures.
>>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>>> The C++ standard is quite clear that this is not required; atomicity means only that the load of a RMW operation sees the immediately prior write in the coherence order for that location. It doesn't guarantee anything about other accesses somehow appearing to be performed in the middle of the operation. It's completely analogous to the kind of atomicity you get in a lock-based implementation.
>>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>>> On Sat, May 27, 2017 at 3:26 PM, Alex Otenko <oleksandr.otenko at gmail.com <mailto:oleksandr.otenko at gmail.com>> wrote:
>>>>>>>>>>>>>>>>> Not sure what you mean by “acting as a fence” being broken.
>>>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>>>> There’s probably even more code that relies on atomicity of CAS - that is, when the write happened on successful CAS, it happened atomically with the read; it constitutes a single operation in the total order of all volatile stores.
>>>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>>>> int x=0; // non-volatile
>>>>>>>>>>>>>>>>> volatile int z=0;
>>>>>>>>>>>>>>>>> volatile boolean b=false;
>>>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>>>> Thread1:
>>>>>>>>>>>>>>>>> if (CAS(z, 0, 1)) {
>>>>>>>>>>>>>>>>>   if (x == 0) {
>>>>>>>>>>>>>>>>>     b=true;
>>>>>>>>>>>>>>>>>     CAS(z, 1, 2);
>>>>>>>>>>>>>>>>>   }
>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>> return x;
>>>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>>>> Thread2:
>>>>>>>>>>>>>>>>> x=1;
>>>>>>>>>>>>>>>>> if (!CAS(z, 0, 2)) {
>>>>>>>>>>>>>>>>>   return b;
>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>> return true;
>>>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>>>> In essence, if CAS failure is caused by a real mismatch of z (not a spurious failure), then we can guarantee there is a return 1 or a further CAS in the future from the point of the first successful CAS (by program order), and we can get a witness b whether that CAS is in the future from the point of the failing CAS (by total order of operations).
>>>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>>>> If failing CAS in Thread2 does not have store semantics, then nothing in Thread1 synchronizes-with it, and Thread1 is not guaranteed to return 1 even if Thread2 returns false.
>>>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>>>> If failing CAS in Thread2 does have store semantics, then if Thread2 returns false, Thread1 returns 1.
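>>>>>>>>>>>>>>>>> A runnable sketch of the litmus test above (the class name is illustrative, and AtomicInteger.compareAndSet stands in for the pseudocode's CAS; the contended interleaving is of course timing-dependent):

```java
import java.util.concurrent.atomic.AtomicInteger;

public class CasLitmus {
    static int x = 0;                                    // plain field, racy by design
    static final AtomicInteger z = new AtomicInteger(0);
    static volatile boolean b = false;

    static int thread1() {
        if (z.compareAndSet(0, 1)) {                     // first CAS wins the race for z
            if (x == 0) {
                b = true;                                // witness for Thread2's failing CAS
                z.compareAndSet(1, 2);
            }
        }
        return x;
    }

    static boolean thread2() {
        x = 1;                                           // plain store before the CAS
        if (!z.compareAndSet(0, 2)) {
            return b;                                    // failing CAS, then read the witness
        }
        return true;
    }

    public static void main(String[] args) throws InterruptedException {
        final int[] r1 = new int[1];
        final boolean[] r2 = new boolean[1];
        Thread t1 = new Thread(() -> r1[0] = thread1());
        Thread t2 = new Thread(() -> r2[0] = thread2());
        t1.start(); t2.start();
        t1.join(); t2.join();
        // The disputed claim: if the failing CAS has volatile store semantics,
        // then thread2 returning false implies thread1 observes x == 1.
        System.out.println("thread1 -> " + r1[0] + ", thread2 -> " + r2[0]);
    }
}
```

>>>>>>>>>>>>>>>>> (A single execution rarely hits the contended interleaving; a real litmus harness such as jcstress would run this many times and tally the observed outcomes.)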
>>>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>>>> Not sure what you mean by “real programming concerns”. It sounds a bit like the “no true Scotsman” fallacy. The concern I am trying to convey is that Java 8 semantics offer a very strong CAS that can be used to enforce mutual exclusion using a single CAS call, and that this can be combined with inductive types to produce strong guarantees of correctness. Having set the field right, I can make sure most contenders execute less than a single CAS after mutation. Sounds like a real enough concern to me :)
>>>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>>>> Anyhow, I also appreciate that most designs do not look that deep into the spec, and won’t notice the meaning getting closer to the actual hardware trends. If Java 8 CAS semantics gets deprecated, the algorithm will become obsolete, and will need modification with extra fences in the proprietary code that needs it, or whatever is not broken in the new JMM that will lay the memory semantics of CAS to rest.
>>>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>>>> Alex
>>>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>>>>> On 27 May 2017, at 18:34, Hans Boehm <boehm at acm.org <mailto:boehm at acm.org>> wrote:
>>>>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>>>>> This still makes no sense to me. Nobody is suggesting that we remove the volatile read guarantee on failure (unlike the weak... version). If the CAS fails, you are guaranteed to see memory effects that happen before the successful change to z. We're talking about the "volatile write semantics" for the write that didn't happen.
>>>>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>>>>> This would all be much easier if we had a litmus test (including code snippets for all involved threads) that could distinguish between the two behaviors. I conjecture that all such tests involve potentially infinite loops, and that none of them reflect real programming concerns.
>>>>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>>>>> I also conjecture that there exists real code that relies on CAS acting as a fence. We should be crystal clear that such code is broken.
>>>>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>>>>> On Fri, May 26, 2017 at 11:42 PM, Alex Otenko <oleksandr.otenko at gmail.com <mailto:oleksandr.otenko at gmail.com>> wrote:
>>>>>>>>>>>>>>>>>>> Integers provide extra structure to plain boolean “failed/succeeded”. Linked data structures with extra dependencies of their contents can also offer extra structure.
>>>>>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>>>>>> if( ! z.CAS(i, j) ) {
>>>>>>>>>>>>>>>>>>>   k = z.get();
>>>>>>>>>>>>>>>>>>>   if(k < j) {
>>>>>>>>>>>>>>>>>>>     // i < k < j
>>>>>>>>>>>>>>>>>>>     // whoever mutated z from i to k, should also negotiate mutation of z from k to j
>>>>>>>>>>>>>>>>>>>     // with someone else, and they should observe whatever stores precede z.CAS
>>>>>>>>>>>>>>>>>>>     // because I won’t contend.
>>>>>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>>>>>>     // of course, I need to check they are still at it - but that, too, does not require
>>>>>>>>>>>>>>>>>>>     // stores or CASes
>>>>>>>>>>>>>>>>>>>     ...
>>>>>>>>>>>>>>>>>>>     return;
>>>>>>>>>>>>>>>>>>>   }
>>>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>>>>>> If whoever mutated z from i to k cannot observe stores that precede z.CAS, they won’t attempt to mutate z to j.
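>>>>>>>>>>>>>>>>>>> On JDK 9, the failed-CAS-then-get pair in the sketch above can be collapsed into a single atomic observation: compareAndExchange returns the witness value that was actually in z, so k is obtained as part of the same RMW attempt rather than by a separate load. A minimal sketch (class and method names are illustrative):

```java
import java.util.concurrent.atomic.AtomicInteger;

public class WitnessCas {
    // Try to move z from i to j; on failure, report the value that beat us.
    static int casOrWitness(AtomicInteger z, int i, int j) {
        int k = z.compareAndExchange(i, j);  // JDK 9+: returns the witness value
        if (k == i) {
            return j;                        // success: z is now j
        }
        // Failure: k is whatever some other thread installed,
        // observed atomically by the same RMW attempt.
        return k;
    }
}
```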
>>>>>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>>>>>> In return, can someone explain what the difference is between a weakCompareAndSet failing spuriously and compareAndSet not guaranteeing volatile store semantics on failure? Why should we weaken the promise, if there is already a weak promise not to guarantee visibility on failure?
>>>>>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>>>>>> Alex
>>>>>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>>>>>>> On 26 May 2017, at 22:35, Hans Boehm <boehm at acm.org <mailto:boehm at acm.org>> wrote:
>>>>>>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>>>>>>> Could we please get an example (i.e. litmus test) of how the "memory effect of at least one volatile ... write" is visible, and where it's useful? Since some people seem really attached to it, it shouldn't be that hard to generate a litmus test.
>>>>>>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>>>>>>> So far we have a claim that it could affect progress guarantees, i.e. whether prior writes eventually become visible without further synchronization. I kind of, sort of, half-way believe that.
>>>>>>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>>>>>>> I haven't been able to make sense out of the subsequent illustration attempts. I really don't think it makes sense to require such weird behavior unless we can at least clearly define exactly what the weird behavior buys us. We really need a concise, or at least precise and understandable, rationale.
>>>>>>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>>>>>>> As has been pointed out before, a volatile write W by T1 to x of the same value that was there before is not easily observable. If I read that value in another thread T2, I can't tell which write I'm seeing, and hence a failure to see prior T1 writes is OK; I might not have seen the final write to x. Thus I would need to communicate the fact that T1 completed W without actually looking at x. That seems to involve another synchronization of T1 with T2, which by itself would ensure the visibility of prior writes to T2.
>>>>>>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>>>>>>> Thus, aside from possible really obscure progress/liveness issues, I really don't see the difference. I think this requirement, if it is indeed not vacuous and completely ignorable, would lengthen the ARMv8 code sequence for a CAS by at least 2 instructions, and introduce a very obscure divergence from C and C++.
>>>>>>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>>>>>>> I'm worried that we're adding something to make RMW operations behave more like fences. They don't, they can't, and they shouldn't.
>>>>>>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>>>>>>> On Fri, May 26, 2017 at 1:08 PM, Nathan and Ila Reynolds <nathanila at gmail.com <mailto:nathanila at gmail.com>> wrote:
>>>>>>>>>>>>>>>>>>>>> > "The memory effects of a write occur regardless of outcome."
>>>>>>>>>>>>>>>>>>>>> > "This method has memory effects of at least one volatile read and write."
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> I am not sure what "memory effects" means.  If this is defined somewhere in the specs, then ignore this since I haven't read the JDK 9 specs.
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> Does memory effects mean the cache line will be switched into the modified state even if an actual write doesn't occur?  Or does memory effects have to do with ordering of memory operations with respect to the method's operation?
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> -Nathan
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> On 5/26/2017 1:59 PM, Doug Lea wrote:
>>>>>>>>>>>>>>>>>>>>>> On 05/26/2017 12:22 PM, Gil Tene wrote:
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> Actually this is another case where the Java 9 spec needs to be adjusted…
>>>>>>>>>>>>>>>>>>>>>> The pre-jdk9 method for weak CAS is now available in four
>>>>>>>>>>>>>>>>>>>>>> flavors: weakCompareAndSetPlain, weakCompareAndSet,
>>>>>>>>>>>>>>>>>>>>>> weakCompareAndSetAcquire, weakCompareAndSetRelease.
>>>>>>>>>>>>>>>>>>>>>> They have different read/write access modes. The specs reflect this.
>>>>>>>>>>>>>>>>>>>>>> The one keeping the name weakCompareAndSet is stronger, the others
>>>>>>>>>>>>>>>>>>>>>> weaker than before (this is the only naming scheme that works).
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> About those specs... see JBS JDK-8181104
>>>>>>>>>>>>>>>>>>>>>>    https://bugs.openjdk.java.net/browse/JDK-8181104 <https://bugs.openjdk.java.net/browse/JDK-8181104>
>>>>>>>>>>>>>>>>>>>>>> The plan is for all CAS VarHandle methods to include the sentence
>>>>>>>>>>>>>>>>>>>>>>    "The memory effects of a write occur regardless of outcome."
>>>>>>>>>>>>>>>>>>>>>> And for j.u.c.atomic methods getAndUpdate, updateAndGet,
>>>>>>>>>>>>>>>>>>>>>> getAndAccumulate, accumulateAndGet to include the sentence:
>>>>>>>>>>>>>>>>>>>>>>    "This method has memory effects of at least one volatile read and write."
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> Which should clear up confusion.
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> -Doug
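>>>>>>>>>>>>>>>>>>>>>> A minimal sketch of the four weak flavors via a VarHandle (class and field names are illustrative; any weak flavor may fail spuriously, hence the retry loop):

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

public class WeakFlavors {
    volatile int v = 0;
    static final VarHandle V;
    static {
        try {
            V = MethodHandles.lookup().findVarHandle(WeakFlavors.class, "v", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Retry until the weak CAS either succeeds or fails on a genuine mismatch.
    boolean casPlain(int expect, int update) {
        while (!V.weakCompareAndSetPlain(this, expect, update)) {   // pre-9 weakCompareAndSet
            if ((int) V.getVolatile(this) != expect) return false;  // real mismatch, not spurious
        }
        return true;
    }

    // The other flavors differ only in access mode:
    //   V.weakCompareAndSet(this, e, u)         -- volatile semantics (strongest)
    //   V.weakCompareAndSetAcquire(this, e, u)  -- acquire on the read
    //   V.weakCompareAndSetRelease(this, e, u)  -- release on the write
}
```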
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>>>>>>>>>> Concurrency-interest mailing list
>>>>>>>>>>>>>>>>>>>>>> Concurrency-interest at cs.oswego.edu <mailto:Concurrency-interest at cs.oswego.edu>
>>>>>>>>>>>>>>>>>>>>>> http://cs.oswego.edu/mailman/listinfo/concurrency-interest <http://cs.oswego.edu/mailman/listinfo/concurrency-interest> 
>>>>>>>>>>>>>>>>>>>>> -- 
>>>>>>>>>>>>>>>>>>>>> -Nathan
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>  