[concurrency-interest] The very best CAS loop

Peter Levart peter.levart at gmail.com
Mon Sep 26 19:32:54 EDT 2016


Hi,

I don't know about you, but on my i7 PC, I get consistently better 
results using strong CAS as opposed to weak CAS for the following benchmark:

http://cr.openjdk.java.net/~plevart/misc/GetAndUpdate/GetAndUpdateBench.java

The results:

Benchmark                                      Mode  Cnt Score   Error  
Units
GetAndUpdateBench.dflt                         avgt   20 53.947 ± 0.770  
ns/op
GetAndUpdateBench.dflt:getAndUpdate1_dflt      avgt   20 54.103 ± 1.083  
ns/op
GetAndUpdateBench.dflt:getAndUpdate2_dflt      avgt   20 53.791 ± 1.014  
ns/op
GetAndUpdateBench.martin                       avgt   20 52.724 ± 0.218  
ns/op
GetAndUpdateBench.martin:getAndUpdate1_martin  avgt   20 52.379 ± 0.972  
ns/op
GetAndUpdateBench.martin:getAndUpdate2_martin  avgt   20 53.070 ± 1.005  
ns/op
GetAndUpdateBench.shade                        avgt   20 53.437 ± 0.624  
ns/op
GetAndUpdateBench.shade:getAndUpdate1_shade    avgt   20 53.401 ± 0.615  
ns/op
GetAndUpdateBench.shade:getAndUpdate2_shade    avgt   20 53.474 ± 0.652  
ns/op
GetAndUpdateBench.strong                       avgt   20 38.090 ± 0.727  
ns/op
GetAndUpdateBench.strong:getAndUpdate1_strong  avgt   20 37.784 ± 1.633  
ns/op
GetAndUpdateBench.strong:getAndUpdate2_strong  avgt   20 38.397 ± 1.691  
ns/op


The benchmark exhibits 2 threads that don't have data contention (each 
of them is modifying its own variable), but most probably those threads 
do modify the same cache line. The update function does have a non-zero 
cost.

To complement above results, I also ran the same code on Raspbery PI 2 
with Oracle build of JDK 9 (b137):

Benchmark                                      Mode  Cnt Score   Error  
Units
GetAndUpdateBench.dflt                         avgt   20 2347.811 ± 
2.492  ns/op
GetAndUpdateBench.dflt:getAndUpdate1_dflt      avgt   20 2347.328 ± 
6.697  ns/op
GetAndUpdateBench.dflt:getAndUpdate2_dflt      avgt   20 2348.294 ± 
7.052  ns/op
GetAndUpdateBench.martin                       avgt   20 2342.773 ± 
5.618  ns/op
GetAndUpdateBench.martin:getAndUpdate1_martin  avgt   20 2342.398 ± 
8.662  ns/op
GetAndUpdateBench.martin:getAndUpdate2_martin  avgt   20 2343.148 ± 
9.514  ns/op
GetAndUpdateBench.shade                        avgt   20 1844.426 ± 
4.304  ns/op
GetAndUpdateBench.shade:getAndUpdate1_shade    avgt   20 1841.747 ± 
5.287  ns/op
GetAndUpdateBench.shade:getAndUpdate2_shade    avgt   20 1847.105 ± 
6.416  ns/op
GetAndUpdateBench.strong                       avgt   20 1846.413 ± 
1.155  ns/op
GetAndUpdateBench.strong:getAndUpdate1_strong  avgt   20 1846.312 ± 
8.095  ns/op
GetAndUpdateBench.strong:getAndUpdate2_strong  avgt   20 1846.513 ± 
8.110  ns/op


Here we may be observing a strange phenomena. Reexecuting the 
non-zero-cost update function on spurious failure (shade) surprisingly 
gives better throughput than immediately retrying weak CAS (dflt, martin).

Also, wasn't strong CAS on ARM supposed to be mapped to a weak CAS loop? 
Why the difference between dflt/martin and strong then?

You've got something to study now... ;-)

Regards, Peter



On 09/26/2016 07:51 PM, Martin Buchholz wrote:
>
>
> On Mon, Sep 26, 2016 at 10:36 AM, Paul Sandoz <paul.sandoz at oracle.com 
> <mailto:paul.sandoz at oracle.com>> wrote:
>
>
>>     On 26 Sep 2016, at 10:01, Martin Buchholz <martinrb at google.com
>>     <mailto:martinrb at google.com>> wrote:
>>
>>     Hi Aleksey,
>>
>>     My goals were to
>>     - have the shortest code path if the CAS succeeds the first time,
>>     which is hopefully the common case
>>     - never call the update function multiple times in the case of
>>     spurious failure, only for real contention.  We don't know how
>>     expensive the update function is, and the number of times it is
>>     called is user-detectable.
>>     - yes, I have sort-of reconstructed the retry loop that strong
>>     CAS does with LL/SC, BUT we can save the value obtained from
>>     get(), which does not happen in the emulation below.  We don't
>>     have multiple value return in java!
>>
>>         public final V getAndUpdate(UnaryOperator<V> updateFunction) {
>>             for (;;) {
>>                 V prev = get();
>>                 V next = updateFunction.apply(prev);
>>                 if (strongCas(prev, next))
>>                     return prev;
>>             }
>>         }
>>         /** Implement strong CAS using weak CAS. */
>>         boolean strongCas(V expectedValue, V newValue) {
>>             while (get() == expectedValue) {
>>                 if (weakCompareAndSetVolatile(expectedValue, newValue))
>>                     return true;
>>             }
>>             return false;
>>         }
>>
>
>     If you are gonna do why not just call “compareAndSet" instead of
>     emulating with your own “strongCas” method? and then, i presume,
>     we are back to where we started.
>
>
> That code was just demonstrating how the return value from get() was 
> getting lost, whereas if you inline a weak cas loop as I have done, 
> you save the extra call to get().
>
>     I don’t think the “updateFunction” can tell a spurious failure
>     from another failure given the variable, under contention, can
>     change from and back to the previous between the “strongCas” call.
>
>
> I'm imagining user code that counts all the calls to getAndUpdate, and 
> all the calls to updateFunction.  If they differ, I would like that to 
> be evidence of true contention.   True, we tell users updateFunction 
> is supposed to be side-effect-free.  OTOH, we say that attempted 
> updates may fail due to "contention among threads", suggesting we 
> don't call updateFunction spuriously.
>
>
> _______________________________________________
> Concurrency-interest mailing list
> Concurrency-interest at cs.oswego.edu
> http://cs.oswego.edu/mailman/listinfo/concurrency-interest

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://cs.oswego.edu/pipermail/concurrency-interest/attachments/20160927/0a7aff50/attachment-0001.html>


More information about the Concurrency-interest mailing list