[concurrency-interest] Unsafe.getAndAddLong

Nathan Reynolds nathan.reynolds at oracle.com
Thu May 22 12:44:07 EDT 2014


A CAS retry can be very costly.  See this blog entry for the details. 
https://blogs.oracle.com/dave/entry/atomic_fetch_and_add_vs

Basically, as CAS retries happen more often, the branch predictor learns 
to predict that the loop will be re-executed.  So, when the CAS finally 
succeeds, the processor stalls because the branch was predicted the 
wrong way.  If the CAS was used to acquire a lock, the critical section 
of the lock is now effectively longer and throughput suffers.

Word tearing is probably not very common.  But, an already hot CAS loop 
wouldn't appreciate any more problems.
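For reference, here is a minimal sketch of the kind of CAS retry loop we 
are talking about, written against the public AtomicLong API rather than 
Unsafe (the class and method names are mine, for illustration):

```java
import java.util.concurrent.atomic.AtomicLong;

public class CasLoopDemo {
    // A hand-rolled fetch-and-add built on compareAndSet, the same
    // read/compute/CAS/retry structure as Unsafe.getAndAddLong.
    // Under contention the backward (retry) branch is taken often,
    // which trains the branch predictor the wrong way for the
    // eventual success case.
    static long getAndAdd(AtomicLong v, long delta) {
        long current;
        do {
            current = v.get();  // volatile read of the current value
        } while (!v.compareAndSet(current, current + delta)); // retry on CAS failure
        return current;         // value observed before the add
    }

    public static void main(String[] args) {
        AtomicLong counter = new AtomicLong(0);
        long before = getAndAdd(counter, 5);
        System.out.println(before + " " + counter.get()); // prints "0 5"
    }
}
```

Every failed compareAndSet sends the loop around again, so under heavy 
contention the retry path is the common one.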

-Nathan

On 5/22/2014 9:20 AM, Andrew Haley wrote:
> I don't think that word-tearing matters here: all it will do in the
> odd chance that it occurs is cause a retry.
>
> On 05/22/2014 03:29 PM, Arcadiy Ivanov wrote:
>> JLS 17.4.7
>>
>>   1.
>>
>>      Each read sees a write to the same variable in the execution.
>>
>>      All reads and writes of volatile variables are volatile actions.
>>      For all reads r in A, we have W(r) in A and W(r).v = r.v. The
>>      variable r.v is volatile if and only if r is a volatile read, and
>>      the variable w.v is volatile if and only if w is a volatile write.
>>
>> If you have a volatile write but no volatile read, there is no guarantee
>> that a reordering won't occur. As such, the compiler may optimize the
>> first read away and you'll be left with a value that was read quite some
>> time ago, potentially all the way back at the previous CAS.
>>
>> I would add another clause:
>> (3) Ensuring that the first read of v is not reordered.
>>
>> But the worst-case scenario is a failing CAS on the first attempt, so
>> both reordering and a stale cache are technically covered under my (1).
> This argument is a strange combination of pragmatic efficiency
> concerns and abstract language semantics.  No-one is disputing
> correctness, only efficiency.  And we can't talk about efficiency in
> terms of abstract languages.  So, let's get real.
>
> Here is a real machine with weakly-ordered memory, in this case ARMv8:
>
>    0x0000007fa11df1b4: ldr	x12, [x10,#16]
>    0x0000007fa11df1b8: dmb	ishld           ;*invokevirtual getLongVolatile
>                                                  ; - sun.misc.Unsafe::getAndAddLong at 3 (line 1050)
>                                                  ; - java.util.concurrent.atomic.AtomicLong::incrementAndGet at 8 (line 200)
>                                                  ; - Test1::run at 4 (line 6)
>
>    0x0000007fa11df1bc: dmb	ish             ;*invokevirtual compareAndSwapLong
>                                                  ; - sun.misc.Unsafe::getAndAddLong at 18 (line 1051)
>                                                  ; - java.util.concurrent.atomic.AtomicLong::incrementAndGet at 8 (line 200)
>                                                  ; - Test1::run at 4 (line 6)
>
>    0x0000007fa11df1c0: add	x13, x12, #0x1  ;*ladd
>                                                  ; - sun.misc.Unsafe::getAndAddLong at 17 (line 1051)
>                                                  ; - java.util.concurrent.atomic.AtomicLong::incrementAndGet at 8 (line 200)
>                                                  ; - Test1::run at 4 (line 6)
>
>    0x0000007fa11df1c4: ldar	xscratch1, [x11]
>    0x0000007fa11df1c8: cmp	xscratch1, x12
>    0x0000007fa11df1cc: b.ne	0x0000007fa11df1d8
>    0x0000007fa11df1d0: stlxr	wscratch1, x13, [x11]
>    ...
>
> Note that the LoadLoad|LoadStore barrier comes *after* the read of
> getLongVolatile, so it can have no effect whatsoever on when v is
> read.  It does not force a fresh copy of v to be read.  That value of
> v could be very stale indeed.  I will grant you that it will, at
> least, be read from memory, but that is no guarantee of any
> timeliness.
>
> I suppose that in theory you could be running on a machine where
> volatile loads are implemented by issuing a StoreLoad barrier before
> each volatile load instead of after each volatile store, but AFAIK
> no-one does this.
>
> I suggest that using getLongVolatile() does not help on any machine and
> makes things worse on weakly-ordered machines.
>
> Andrew.
> _______________________________________________
> Concurrency-interest mailing list
> Concurrency-interest at cs.oswego.edu
> http://cs.oswego.edu/mailman/listinfo/concurrency-interest
>
>

