[concurrency-interest] AtomicReference.updateAndGet() mandatory updating

Gil Tene gil at azul.com
Sat May 27 11:14:55 EDT 2017




On May 26, 2017, at 11:19 AM, Hans Boehm <boehm at acm.org> wrote:

Someone from ARM should chime in here, but my understanding is that ARMv8 acquire/release loads and stores are designed to exactly model C++ memory_order_seq_cst (NOT memory_order_acquire/memory_order_release) loads and stores. They do NOT imply fences. They are not intended to implement fences. They should not be used to implement fences. The architecture still has fences, so there is no need to.

I see my mistake. My transitive interpretation omitted the requirement for an observer of the store-release and load-acquire. Since the ordering guarantees of the load-acquire and store-release instructions in the ARM spec involve only the observers of the load-acquire or store-release in question, the processor is allowed to, e.g., optimize them away in cases where it can prove that no such observer exists.

E.g., as a practically possible case, take the following sequence:

  r1 = a
  b = r2
  store-release r3 to v
  r4 = load-acquire from v
  r5 = c
  d = r6
  v = r7

The processor, knowing that v will be stomped with the value of r7, and that it is possible that no observer would see the store-release of r3 to v before that stomp happens, could ensure that no observer would see the store-release by (somehow) folding the two stores. Similarly, since it is also possible that no observer would interfere with the value of v between the store-release and the load-acquire, it can fold that away, assign the value of r3 to r4, and eliminate the load-acquire operation. Altogether, it can validly transform the sequence into, and execute it as:

  r4 = r3
  d = r6
  r5 = c
  b = r2
  r1 = a
  v = r7

This eliminates all ordering implications of the load-acquire and store-release, since both can be prevented from ever having any observers.
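
To make the claimed equivalence concrete, here is a minimal single-threaded C++ sketch of the two sequences. The names State, original, transformed and same_final_state are hypothetical, the initial values are arbitrary, and memory_order_seq_cst is used as a stand-in for the ARMv8 store-release/load-acquire instructions (following the description quoted above). It only checks that, when no other thread ever observes v, both orderings leave the same registers and memory behind. It does not show that any real processor performs this transformation.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-ins for the memory locations (a, b, c, d, v)
// and registers (r1..r7) used in the example sequence.
struct State {
    int a = 1, b = 0, c = 3, d = 0;
    std::atomic<int> v{0};
    int r1 = 0, r2 = 20, r3 = 30, r4 = 0, r5 = 0, r6 = 60, r7 = 70;
};

// The sequence as written in program order.
void original(State& s) {
    s.r1 = s.a;
    s.b = s.r2;
    s.v.store(s.r3, std::memory_order_seq_cst);   // store-release r3 to v
    s.r4 = s.v.load(std::memory_order_seq_cst);   // r4 = load-acquire from v
    s.r5 = s.c;
    s.d = s.r6;
    s.v.store(s.r7, std::memory_order_relaxed);   // plain store: v = r7
}

// The folded and reordered sequence: the store-release and load-acquire
// are gone, and the remaining accesses run in a different order.
void transformed(State& s) {
    s.r4 = s.r3;
    s.d = s.r6;
    s.r5 = s.c;
    s.b = s.r2;
    s.r1 = s.a;
    s.v.store(s.r7, std::memory_order_relaxed);   // plain store: v = r7
}

// With no concurrent observer, both versions end in the same state.
bool same_final_state() {
    State x, y;
    original(x);
    transformed(y);
    return x.r1 == y.r1 && x.b == y.b && x.r4 == y.r4 &&
           x.r5 == y.r5 && x.d == y.d && x.v.load() == y.v.load();
}
```

Calling same_final_state() from a single thread compares the two end states. The interesting question in this thread is what a second thread could observe, which this sketch deliberately leaves out.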


For example,

r1 = x;
store release to v;
r2 = y;

Does not order the accesses to x and y any more than a C++ sequentially consistent store would order relaxed accesses to x and y.
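
In C++ terms, the comparison above can be sketched roughly as follows. The globals x, y, v and the function reader are hypothetical names chosen for illustration, and the stored value 1 is arbitrary.

```cpp
#include <atomic>
#include <cassert>
#include <utility>

std::atomic<int> x{1}, y{2}, v{0};

std::pair<int, int> reader() {
    // r1 = x: a relaxed load, which may not sink below the store to v.
    int r1 = x.load(std::memory_order_relaxed);

    // A sequentially consistent store. It is ordered with respect to other
    // seq_cst operations, but it is not a fence.
    v.store(1, std::memory_order_seq_cst);

    // r2 = y: a relaxed load, which is allowed to be performed before the
    // store to v, and therefore before or after the load of x.
    int r2 = y.load(std::memory_order_relaxed);

    return {r1, r2};
}
```

Run on a single thread this simply returns the current values of x and y. The point is that nothing in the code gives another thread grounds to assume the load of x happened before the load of y.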

Atomic RMW operations implemented with ARM acquire/release primitives have roughly the memory ordering semantics of a RMW operation implemented with a lock. They are NOT fences, should NOT be used to implement fences, etc. For example,

r1 = x;
x.lock();
...
x.unlock();
r2 = y;

does NOT order the accesses to x and y, since both can move into the critical section and pass each other. The same applies to ARMv8 RMW operations.
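
A rough C++ rendering of the lock example, as a sketch only. The mutex is named m here (a hypothetical name, to keep it distinct from the variable x), and guarded is a made-up variable standing in for the elided critical section.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <utility>

std::atomic<int> x{1}, y{2};
std::mutex m;
int guarded = 0;   // stand-in for whatever the critical section protects

std::pair<int, int> reader() {
    // r1 = x: may sink below m.lock() into the critical section.
    int r1 = x.load(std::memory_order_relaxed);

    m.lock();      // acquire: later accesses may not move above this point
    ++guarded;
    m.unlock();    // release: earlier accesses may not move below this point

    // r2 = y: may hoist above m.unlock() into the critical section.
    int r2 = y.load(std::memory_order_relaxed);

    return {r1, r2};
}
```

Once both loads are allowed inside the critical section, nothing orders them against each other, which is why the lock/unlock pair cannot be treated as a fence.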

My reading of the spec is that a sequentially consistent store followed by a sequentially consistent load is still not sufficient to generate the equivalent of a fence. (I would guess that on current hardware it probably is, but I don't know.) If there are no observers of the release store, it promises essentially no ordering. There is no good reason to do that anyway.
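
For contrast, here is a minimal C++ sketch of asking for StoreLoad ordering with an explicit fence instead of a store/load pair on some private location. It is the classic store-buffering litmus test; the function name both_read_zero is hypothetical. With a std::atomic_thread_fence(std::memory_order_seq_cst) between the store and the load in both threads, the C++ memory model forbids the outcome in which both threads read 0.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Runs one round of the store-buffering litmus test and reports whether
// both threads read 0, the outcome the two fences are meant to rule out.
bool both_read_zero() {
    std::atomic<int> x{0}, y{0};
    int r1 = -1, r2 = -1;

    std::thread t1([&] {
        x.store(1, std::memory_order_relaxed);
        std::atomic_thread_fence(std::memory_order_seq_cst);  // StoreLoad
        r1 = y.load(std::memory_order_relaxed);
    });
    std::thread t2([&] {
        y.store(1, std::memory_order_relaxed);
        std::atomic_thread_fence(std::memory_order_seq_cst);  // StoreLoad
        r2 = x.load(std::memory_order_relaxed);
    });

    t1.join();
    t2.join();
    return r1 == 0 && r2 == 0;
}
```

Repeating the round many times and checking that both_read_zero() never returns true is only a smoke test. The guarantee comes from the language rules for seq_cst fences, not from any number of passing runs.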

AFAICT, the discussion about atomic RMW as fence replacement is entirely x86-specific. I'm not sure, but it seems to be caused by the fact that an x86 MFENCE makes all sorts of other guarantees about write-coalescing memory, etc., that we don't really care about. The RMW operations do not, and are thus often faster. My guess is that the problem originates from the fact that x86 doesn't have a suitably plain vanilla fence instruction.

I'm not sure how this interacts with the original discussion. There's still the interesting question of whether a volatile write that doesn't change the value of an object is observable.

On Fri, May 26, 2017 at 9:43 AM, Andrew Haley <aph at redhat.com> wrote:
On 26/05/17 17:09, Gil Tene wrote:
> loads or stores that appear in program order before the store-release"
>
> So ***for ARMv8*** a store-release followed by a load-acquire (e.g. both to a thread local) will impose a StoreLoad order.
>
> [This is not a general property of store-release and load-acquire]

That's right.  By the way, the memory model for ARM has been rewritten,
and the engineer who wrote it promises me absolutely and truly that the
instructions are sequentially consistent, and were always intended to be.

https://developer.arm.com/docs/ddi0487/latest/arm-architecture-reference-manual-armv8-for-armv8-a-architecture-profile

--
Andrew Haley
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671
_______________________________________________
Concurrency-interest mailing list
Concurrency-interest at cs.oswego.edu
http://cs.oswego.edu/mailman/listinfo/concurrency-interest
