[concurrency-interest] AtomicReference.updateAndGet() mandatory updating

David Holmes davidcholmes at aapt.net.au
Sun May 28 18:52:57 EDT 2017



I don’t recall anyone ever questioning what “atomic” means in these atomic operations – it is ubiquitous terminology. If the store happens, it is because the current value was the expected value. That is indivisible, i.e. atomic: there can be no intervening store. This is either the semantics of the hardware instruction (e.g. cmpxchg) or else must be emulated using whatever is available, e.g. LL/SC instructions (where, for a strong CAS, an intervening store must cause a retry).
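The LL/SC emulation David describes can be sketched in Java terms, since weakCompareAndSet is specified to allow exactly the spurious failures a store-conditional can produce. This is an illustrative sketch, not the JDK's actual implementation (which compiles straight to the hardware instruction):

```java
import java.util.concurrent.atomic.AtomicReference;

public final class StrongCas {
    // A strong CAS built from a weak (spuriously-failing) one: retry until
    // the failure is a genuine value mismatch rather than an intervening
    // store / lost reservation.
    public static <V> boolean strongCompareAndSet(
            AtomicReference<V> ref, V expect, V update) {
        while (true) {
            if (ref.weakCompareAndSet(expect, update)) {
                return true;                 // store happened indivisibly
            }
            if (ref.get() != expect) {
                return false;                // real mismatch: give up
            }
            // spurious failure: loop and retry, as LL/SC requires
        }
    }

    public static void main(String[] args) {
        AtomicReference<String> r = new AtomicReference<>("a");
        System.out.println(strongCompareAndSet(r, "a", "b")); // true
        System.out.println(strongCompareAndSet(r, "a", "c")); // false; r keeps "b"
    }
}
```

Note the `!=` comparison is by reference identity, matching AtomicReference's CAS semantics.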




From: Concurrency-interest [mailto:concurrency-interest-bounces at cs.oswego.edu] On Behalf Of Alex Otenko
Sent: Monday, May 29, 2017 7:40 AM
To: Hans Boehm <boehm at acm.org>
Cc: concurrency-interest at cs.oswego.edu
Subject: Re: [concurrency-interest] AtomicReference.updateAndGet() mandatory updating


Yes, you could read it both ways. You see, lock-based implementations and x86 LOCK CMPXCHG semantics invite one to interpret the statement as promising at least some write-like semantics (hence “memory *effects*”) - not necessarily a write to z, but fences or whatever else imitates a volatile write to z in JMM terms.



The other source of confusion is the claim of atomicity. Is it “atomically (sets the value) (to the given updated value if the current value == the expected value)” or “atomically (sets the value to the given updated value if the current value == the expected value)”? Does atomicity imply it is a single item in the total order of all operations? Or of all stores? Or just of stores to that variable? If you know how it is implemented, it turns out to be far from atomic.


Does it at least *implement* atomic behaviour - does it *appear* atomic to an observer? For example, if a concurrent store appears between the load and “the store” (in quotes, because the store may not be executed - in which case it is no longer “between”), do we get a synchronizes-with edge with the store that preceded the load, or also with the store that intervened? If we don’t get a synchronizes-with edge to the store that intervened (which I suspect we don’t), then it is not atomic in any of those senses (but x86 and lock-based implementations create false analogies, so we get “atomic” in the method description).



It needs to be specced out, ideally formally in the JMM as the source of authority, rather than in higher-level API javadocs spread all over the place.





On 28 May 2017, at 18:30, Hans Boehm <boehm at acm.org> wrote:


Thanks. I think I understand now. If Thread 2 returns false, the Thread 2 CAS failed, and the initial CAS in Thread 1 succeeds. Either x immediately reads back as 1 in Thread 1, or we set b to true after Thread 2 returns b. Thus the second (successful) CAS in Thread 1 must follow the unsuccessful Thread 2 CAS in synchronization order. So any write to z by the failed CAS synchronizes with the second successful CAS in Thread 1, and we could thus conclude that x is 1 in the Thread 1 return.


This relies critically on the assumption that the Thread 2 failed CAS has the semantics of a volatile write to z.


I think the actual relevant spec text is:


1) "compareAndSet and all other read-and-update operations such as getAndIncrement have the memory effects of both reading and writing volatile variables."


2) "Atomically sets the value to the given updated value if the current value == the expected value."


I would not read this as guaranteeing that property. But I agree the spec doesn't make much sense; I read (2) as saying there is no write at all if the CAS fails, as I would expect. Thus it seems like a stretch to assume that the write from (1) is to z, though I have no idea what write it would refer to.


The prior implementation discussion now does make sense to me. I don't think this is an issue for lock-based implementations. But the only reasonable way to support it on ARMv8 seems to be with a conditionally executed fence in the failing case. That adds two instructions, as well as a large amount of time overhead for algorithms that don't retry on a strong CAS. My impression is that those algorithms are frequent enough to be a concern.



On Sat, May 27, 2017 at 4:49 PM, Alex Otenko <oleksandr.otenko at gmail.com> wrote:

That’s right.


Atomicity (for some definition of atomicity - i.e., atomic with respect to which operations) is not needed here. As long as the store in the CAS always occurs, x=1 is not “reordered” (certainly not entirely - it can’t escape the “store” that is declared in the spec).




On 28 May 2017, at 00:43, Hans Boehm <boehm at acm.org> wrote:


I gather the interesting scenario here is the one in which the Thread 2 CAS fails and Thread 2 returns false, while the initial Thread 1 CAS succeeds?


The correctness argument here relies on the fact that the load of x in Thread 1 must, in this scenario, see the store of x in Thread 2? This assumes the load of z in the failing CAS in Thread 2 can't be reordered with the ordinary (and racy!) store to x by the same thread. I agree that the j.u.c.atomic spec was not clear in this respect, but I don't think it was ever the intent to guarantee that. It's certainly false for either a lock-based or ARMv8 implementation of CAS. Requiring it would raise serious questions about practical implementability on several architectures.


The C++ standard is quite clear that this is not required; atomicity means only that the load of a RMW operation sees the immediately prior write in the coherence order for that location. It doesn't guarantee anything about other accesses somehow appearing to be performed in the middle of the operation. It's completely analogous to the kind of atomicity you get in a lock-based implementation.


On Sat, May 27, 2017 at 3:26 PM, Alex Otenko <oleksandr.otenko at gmail.com> wrote:

Not sure what you mean by “acting as a fence” being broken.


There’s probably even more code that relies on atomicity of CAS - that is, when the write happened on successful CAS, it happened atomically with the read; it constitutes a single operation in the total order of all volatile stores.



int x = 0; // non-volatile

volatile int z = 0;

volatile boolean b = false;


Thread 1:

if (CAS(z, 0, 1)) {
  if (x == 0) {
    b = true;
    CAS(z, 1, 2);
  }
}
return x;


Thread 2:

x = 1;
if (!CAS(z, 0, 2)) {
  return b;
}
return true;


In essence, if CAS failure is caused by a real mismatch of z (not a spurious failure), then we can guarantee there is a return 1 or a further CAS in the future from the point of the first successful CAS (by program order), and we can get a witness b whether that CAS is in the future from the point of the failing CAS (by total order of operations).


If failing CAS in Thread2 does not have store semantics, then nothing in Thread1 synchronizes-with it, and Thread1 is not guaranteed to return 1 even if Thread2 returns false.


If failing CAS in Thread2 does have store semantics, then if Thread2 returns false, Thread1 returns 1.
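For concreteness, the litmus test can be rendered as runnable Java, with CAS(z, a, b) as AtomicInteger.compareAndSet. The placement of the x = 1 and b = true stores is reconstructed from the analysis in this thread (the archived snippet lost some lines), and a harness like this can only exhibit outcomes - it cannot prove what the memory model guarantees:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class CasLitmus {
    int x = 0;                                  // plain (non-volatile)
    final AtomicInteger z = new AtomicInteger(0);
    volatile boolean b = false;

    int thread1() {
        if (z.compareAndSet(0, 1)) {
            if (x == 0) {
                b = true;                       // volatile witness store
                z.compareAndSet(1, 2);
            }
        }
        return x;
    }

    boolean thread2() {
        x = 1;                                  // the racy plain store
        if (!z.compareAndSet(0, 2)) {
            return b;                           // failed CAS: did we see the witness?
        }
        return true;
    }

    public static void main(String[] args) throws InterruptedException {
        for (int trial = 0; trial < 5_000; trial++) {
            CasLitmus m = new CasLitmus();
            int[] r1 = new int[1];
            boolean[] r2 = new boolean[1];
            Thread t1 = new Thread(() -> r1[0] = m.thread1());
            Thread t2 = new Thread(() -> r2[0] = m.thread2());
            t1.start(); t2.start();
            t1.join(); t2.join();
            // The disputed guarantee: Thread 2 returning false implies
            // Thread 1 returned 1.
            if (!r2[0] && r1[0] != 1) {
                System.out.println("counterexample at trial " + trial);
                return;
            }
        }
        System.out.println("no counterexample observed");
    }
}
```

Not observing a counterexample on one machine (e.g. x86, where a failed lock cmpxchg is still a full fence) says nothing about ARMv8 or about what the spec requires.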



Not sure what you mean by “real programming concerns”. It sounds a bit like a “no true Scotsman” argument. The concern I am trying to convey is that Java 8 semantics offer a very strong CAS that can be used to enforce mutual exclusion with a single CAS call, and that this can be combined with inductive types to produce strong guarantees of correctness. Having set the field up right, I can make sure most contenders execute less than a single CAS after a mutation. Sounds like a real enough concern to me :)
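As a generic illustration of this kind of construction (the sketch below is not Alex's proprietary algorithm, just the textbook shape): mutual exclusion where the uncontended acquire is a single strong CAS and the release is a single volatile store.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class SpinLockDemo {
    // Minimal CAS-based spinlock: uncontended acquire is one compareAndSet.
    static final class SpinLock {
        private final AtomicBoolean held = new AtomicBoolean(false);

        void lock() {
            while (!held.compareAndSet(false, true)) {
                Thread.yield();                 // back off while contended
            }
        }

        void unlock() {
            held.set(false);                    // volatile store releases
        }
    }

    static int counter = 0;                     // plain field, guarded by the lock

    public static void main(String[] args) throws InterruptedException {
        SpinLock lock = new SpinLock();
        Runnable task = () -> {
            for (int i = 0; i < 100_000; i++) {
                lock.lock();
                try {
                    counter++;                  // ordered by the CAS/volatile pair
                } finally {
                    lock.unlock();
                }
            }
        };
        Thread a = new Thread(task);
        Thread b = new Thread(task);
        a.start(); b.start();
        a.join(); b.join();
        System.out.println(counter);            // 200000
    }
}
```

The correctness of the guarded increment depends only on the *successful* CAS having volatile read-and-write semantics, which nobody in this thread disputes.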



Anyhow, I also appreciate that most designs do not look that deep into the spec, and won’t notice the meaning drifting closer to actual hardware trends. If the Java 8 CAS semantics get deprecated, the algorithm will become obsolete, and will need modification with extra fences in the proprietary code that needs it, or whatever is not broken in the new JMM that lays the memory semantics of CAS to rest.





On 27 May 2017, at 18:34, Hans Boehm <boehm at acm.org> wrote:


This still makes no sense to me. Nobody is suggesting that we remove the volatile read guarantee on failure (unlike the weak... version). If the CAS fails, you are guaranteed to see memory effects that happen before the successful change to z. We're talking about the "volatile write semantics" for the write that didn't happen.


This would all be much easier if we had a litmus test (including code snippets for all involved threads) that could distinguish between the two behaviors. I conjecture that all such tests involve potentially infinite loops, and that none of them reflect real programming concerns.


I also conjecture that there exists real code that relies on CAS acting as a fence. We should be crystal clear that such code is broken.


On Fri, May 26, 2017 at 11:42 PM, Alex Otenko <oleksandr.otenko at gmail.com> wrote:

Integers provide extra structure to plain boolean “failed/succeeded”. Linked data structures with extra dependencies of their contents can also offer extra structure.


if( ! z.CAS(i, j) ) {
  k = z.get();
  if(k < j) {
    // i < k < j
    // whoever mutated z from i to k, should also negotiate mutation of z from k to j
    // with someone else, and they should observe whatever stores precede z.CAS,
    // because I won’t contend.

    // of course, I need to check they are still at it - but that, too, does not
    // require stores or CASes
  }
}






If whoever mutated z from i to k cannot observe stores that precede z.CAS, they won’t attempt to mutate z to j.



In return, can someone explain what the difference is between a weakCompareAndSet failing spuriously and a compareAndSet not guaranteeing volatile store semantics on failure? Why should we weaken the promise, if there is already a weak promise not to guarantee visibility on failure?






On 26 May 2017, at 22:35, Hans Boehm <boehm at acm.org> wrote:


Could we please get an example (i.e. litmus test) of how the "memory effect of at least one volatile ... write" is visible, and where it's useful? Since some people seem really attached to it, it shouldn't be that hard to generate a litmus test.


So far we have a claim that it could affect progress guarantees, i.e. whether prior writes eventually become visible without further synchronization. I kind of, sort of, half-way believe that.


I haven't been able to make sense out of the subsequent illustration attempts. I really don't think it makes sense to require such weird behavior unless we can at least clearly define exactly what the weird behavior buys us. We really need a concise, or at least precise and understandable, rationale.


As has been pointed out before, a volatile write W by T1 to x of the same value that was there before is not easily observable. If I read that value in another thread T2, I can't tell which write I'm seeing, and hence a failure to see prior T1 writes is OK; I might not have seen the final write to x. Thus I would need to communicate the fact that T1 completed W without actually looking at x. That seems to involve another synchronization of T1 with T2, which by itself would ensure the visibility of prior writes to T2.
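The argument can be made concrete with a small sketch (field names and values are illustrative): the reader cannot distinguish the initial 42 from the writer's rewrite of 42, so it cannot conclude that the plain write to data has been published.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class SameValueWrite {
    static int data = 0;                        // plain write T1 wants observed
    static final AtomicInteger x = new AtomicInteger(42);

    static void writer() {                      // T1
        data = 1;
        x.set(42);                              // volatile write W of the same value
    }

    static int reader() {                       // T2
        if (x.get() == 42) {
            // This 42 may be the initial value or W; without some other
            // synchronization there is no way to tell, so when run
            // concurrently the read of data below is racy: 0 or 1.
            return data;
        }
        return -1;
    }

    public static void main(String[] args) {
        writer();                               // sequential run for illustration
        System.out.println(reader());           // 1 (no race in this ordering)
    }
}
```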


Thus, aside from possible really obscure progress/liveness issues, I really don't see the difference. I think this requirement, if it is indeed not vacuous and completely ignorable, would lengthen the ARMv8 code sequence for a CAS by at least 2 instructions, and introduce a very obscure divergence from C and C++.


I'm worried that we're adding something to make RMW operations behave more like fences. They don't, they can't, and they shouldn't.


On Fri, May 26, 2017 at 1:08 PM, Nathan and Ila Reynolds <nathanila at gmail.com> wrote:

> "The memory effects of a write occur regardless of outcome."
> "This method has memory effects of at least one volatile read and write."

I am not sure what memory effects means.  If this is defined somewhere in the specs, then ignore this since I haven't read JDK 9 specs.

Does memory effects mean the cache line will be switched into the modified state even if an actual write doesn't occur?  Or does memory effects have to do with ordering of memory operations with respect to the method's operation?


On 5/26/2017 1:59 PM, Doug Lea wrote:

On 05/26/2017 12:22 PM, Gil Tene wrote:

Actually this is another case where the Java 9 spec needs to be adjusted…

The pre-jdk9 method for weak CAS is now available in four
flavors: weakCompareAndSetPlain, weakCompareAndSet,
weakCompareAndSetAcquire, weakCompareAndSetRelease.
They have different read/write access modes. The specs reflect this.
The one keeping the name weakCompareAndSet is stronger, the others
weaker than before (this is the only naming scheme that works).

About those specs... see JBS JDK-8181104
The plan is for all CAS VarHandle methods to include the sentence
   "The memory effects of a write occur regardless of outcome."
And for j.u.c.atomic methods getAndUpdate, updateAndGet,
getAndAccumulate, accumulateAndGet to include the sentence:
   "This method has memory effects of at least one volatile read and write."

Which should clear up confusion.
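For reference, these are the VarHandle CAS calls (JDK 9) that the first quoted sentence would attach to. The return values below show only success or failure; the memory-effects clause concerns ordering, which a single-threaded example cannot exhibit:

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

public class VarHandleCas {
    volatile int z = 0;

    static final VarHandle Z;
    static {
        try {
            Z = MethodHandles.lookup()
                    .findVarHandle(VarHandleCas.class, "z", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    public static void main(String[] args) {
        VarHandleCas v = new VarHandleCas();
        System.out.println(Z.compareAndSet(v, 0, 1)); // true: z was 0, now 1
        System.out.println(Z.compareAndSet(v, 0, 2)); // false: z is 1, unchanged
        System.out.println(v.z);                      // 1
    }
}
```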


Concurrency-interest mailing list
Concurrency-interest at cs.oswego.edu












More information about the Concurrency-interest mailing list