[concurrency-interest] The very best CAS loop

Andrew Haley aph at redhat.com
Wed Sep 28 12:58:45 EDT 2016


On 28/09/16 15:17, Peter Levart wrote:
> If you look at what the GetAndUpdateBench actually does, then you'll see 
> that *all* weak CAS failures (100 %) should be spurious and that there 
> should be *no* strong CAS failures. The benchmark runs 2 threads and 
> each of them accesses its own variable.

Yes.  Got that.

> The problem with this benchmark is, I think, that all platforms tried so 
> far either natively implement strong CAS or weak CAS, but not both (I 
> might be wrong, that's just what I have observed so far).

Mine does both.  I know because I implemented these functions!
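
For readers following along, here is a minimal sketch of the two loop
shapes under discussion. It is not the code of any of the benchmarked
variants (dflt, martin, shade, strong); the class and method names are
hypothetical, and it assumes Java 9 or later for
AtomicInteger.weakCompareAndSetVolatile. The strong loop retries only
when the value really changed. The weak loop can also fail spuriously,
so it re-applies the update function only when the value it re-reads
differs from the one it expected.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntUnaryOperator;

// Hypothetical illustration; names are not taken from the benchmark.
class CasLoops {
    // Strong CAS: compareAndSet fails only if the value really changed,
    // so every retry re-reads the value and recomputes the update.
    static int getAndUpdateStrong(AtomicInteger v, IntUnaryOperator fn) {
        for (;;) {
            int prev = v.get();
            int next = fn.applyAsInt(prev);
            if (v.compareAndSet(prev, next)) {
                return prev;
            }
        }
    }

    // Weak CAS: weakCompareAndSetVolatile may fail spuriously, so after a
    // failure we re-read and reuse 'next' if the value turned out unchanged.
    static int getAndUpdateWeak(AtomicInteger v, IntUnaryOperator fn) {
        int prev = v.get();
        int next = 0;
        for (boolean haveNext = false;;) {
            if (!haveNext) {
                next = fn.applyAsInt(prev);
            }
            if (v.weakCompareAndSetVolatile(prev, next)) {
                return prev;
            }
            // True only if the re-read value equals the old expectation,
            // i.e. the failure was spurious.
            haveNext = (prev == (prev = v.get()));
        }
    }

    public static void main(String[] args) {
        AtomicInteger a = new AtomicInteger(41);
        System.out.println(getAndUpdateStrong(a, x -> x + 1)); // previous value
        System.out.println(getAndUpdateWeak(a, x -> x * 2));   // previous value
        System.out.println(a.get());                           // final value
    }
}
```

With an uncontended variable, as in this benchmark, the strong loop
should never go round twice, while the weak loop may, depending on how
the platform implements weak CAS.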

> The differences you get between dflt/martin and shade/strong are 
> therefore probably just caused by false-sharing / no-false-sharing 
> between the fields of the allocated Vars object.

I don't understand why you say that.  I'm pretty certain that the
false-sharing / no-false-sharing doesn't differ between these.  It's
a real performance difference between the implementations.

> Remember, each of the two threads accesses a different
> field. There's no data race (no strong CAS failures), just
> false-sharing or not resulting in spurious weak CAS failures or
> cache contention with strong CASes or not.

Yes.  Again, we don't differ on this.
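
To make the setup concrete, here is a minimal sketch of two threads
that each update their own counter in a direct ByteBuffer through a
VarHandle view. It is not Peter's benchmark: the class name, the
offsets, and the 64-byte cache-line size are assumptions for
illustration, and the timing in main is a rough indication only (use
JMH for real measurements). No value is shared between the threads, so
any slowdown at the closer offsets comes from the two counters sharing
a cache line.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical illustration; not the benchmark's Vars layout.
class TwoCounters {
    // Assumed cache-line size; 64 bytes is common but not universal.
    static final int LINE = 64;

    // View of a ByteBuffer as longs, addressed by byte offset.
    static final VarHandle LONGS =
        MethodHandles.byteBufferViewVarHandle(long[].class, ByteOrder.nativeOrder());

    // Strong-CAS increment of the long at byte offset 'off'.
    static long getAndIncrement(ByteBuffer buf, int off) {
        for (;;) {
            long prev = (long) LONGS.getVolatile(buf, off);
            if (LONGS.compareAndSet(buf, off, prev, prev + 1)) {
                return prev;
            }
        }
    }

    // Two threads, each incrementing its own counter n times.
    static long[] run(int off1, int off2, int n) {
        ByteBuffer buf = ByteBuffer.allocateDirect(4 * LINE);
        Thread t1 = new Thread(() -> {
            for (int i = 0; i < n; i++) getAndIncrement(buf, off1);
        });
        Thread t2 = new Thread(() -> {
            for (int i = 0; i < n; i++) getAndIncrement(buf, off2);
        });
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return new long[] {
            (long) LONGS.getVolatile(buf, off1),
            (long) LONGS.getVolatile(buf, off2)
        };
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        // Offsets 0 and 8 are likely on one line; 0 and LINE never are.
        for (int off2 : new int[] { 8, LINE }) {
            long start = System.nanoTime();
            long[] counts = run(0, off2, n);
            long ms = (System.nanoTime() - start) / 1_000_000;
            System.out.println("offsets 0/" + off2 + ": counts " + counts[0]
                + ", " + counts[1] + " in about " + ms + " ms");
        }
    }
}
```

Both layouts must end with each counter equal to n, since neither
thread ever touches the other's counter.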

> To verify my claims, I replaced the VarHandle view over a direct ByteBuffer 
> with Unsafe accesses to the buffer's native memory in the following benchmark:
> 
> http://cr.openjdk.java.net/~plevart/misc/GetAndUpdate/GetAndUpdateBench3.java
> 
> compiled here:
> 
> http://cr.openjdk.java.net/~plevart/misc/GetAndUpdate/benchmarks.jar
> 
> Can you try it on AArch64? You can replace Blackhole.consumeCPU with your 
> xorshift loop, but I don't think you'll get much different results.
> 
> On Intel (i7), they are comparable and just about 5% faster than the 
> VarHandle variant, so I think even the VarHandle variant is dominated by CAS 
> and updateFn and not by indirections or checks performed by the VarHandle 
> view over a direct ByteBuffer:

Your mailer totally mangled this table.  Please fix your mailer so
that it doesn't wrap outgoing lines.

Results appended.  The different results for each implementation look
much more similar than when I did the measurement, but there's more
fixed overhead.  The lowest-overhead versions give much the same
results:

GetAndUpdateBench3.dflt                                     1  avgt   20  132.744 ± 2.497  ns/op
GetAndUpdateBench3.martin                                   1  avgt   20  141.096 ± 9.840  ns/op
GetAndUpdateBench3.shade                                    1  avgt   20  132.816 ± 0.809  ns/op
GetAndUpdateBench3.strong                                   1  avgt   20  130.283 ± 3.070  ns/op

Andrew.


Benchmark                                       (updateFnCpu)  Mode  Cnt    Score   Error  Units
GetAndUpdateBench3.dflt                                     1  avgt   20  132.744 ± 2.497  ns/op
GetAndUpdateBench3.dflt:getAndUpdate1_dflt                  1  avgt   20  132.869 ± 2.444  ns/op
GetAndUpdateBench3.dflt:getAndUpdate2_dflt                  1  avgt   20  132.619 ± 2.576  ns/op
GetAndUpdateBench3.dflt                                    10  avgt   20  151.319 ± 3.903  ns/op
GetAndUpdateBench3.dflt:getAndUpdate1_dflt                 10  avgt   20  151.331 ± 3.953  ns/op
GetAndUpdateBench3.dflt:getAndUpdate2_dflt                 10  avgt   20  151.306 ± 3.856  ns/op
GetAndUpdateBench3.dflt                                    20  avgt   20  185.814 ± 1.673  ns/op
GetAndUpdateBench3.dflt:getAndUpdate1_dflt                 20  avgt   20  185.758 ± 1.711  ns/op
GetAndUpdateBench3.dflt:getAndUpdate2_dflt                 20  avgt   20  185.870 ± 1.667  ns/op
GetAndUpdateBench3.dflt                                    50  avgt   20  305.686 ± 1.639  ns/op
GetAndUpdateBench3.dflt:getAndUpdate1_dflt                 50  avgt   20  305.685 ± 1.718  ns/op
GetAndUpdateBench3.dflt:getAndUpdate2_dflt                 50  avgt   20  305.687 ± 1.576  ns/op
GetAndUpdateBench3.dflt                                   100  avgt   20  492.770 ± 1.362  ns/op
GetAndUpdateBench3.dflt:getAndUpdate1_dflt                100  avgt   20  492.933 ± 1.392  ns/op
GetAndUpdateBench3.dflt:getAndUpdate2_dflt                100  avgt   20  492.607 ± 1.540  ns/op
GetAndUpdateBench3.martin                                   1  avgt   20  141.096 ± 9.840  ns/op
GetAndUpdateBench3.martin:getAndUpdate1_martin              1  avgt   20  141.042 ± 9.827  ns/op
GetAndUpdateBench3.martin:getAndUpdate2_martin              1  avgt   20  141.150 ± 9.855  ns/op
GetAndUpdateBench3.martin                                  10  avgt   20  147.714 ± 3.361  ns/op
GetAndUpdateBench3.martin:getAndUpdate1_martin             10  avgt   20  147.723 ± 3.351  ns/op
GetAndUpdateBench3.martin:getAndUpdate2_martin             10  avgt   20  147.705 ± 3.375  ns/op
GetAndUpdateBench3.martin                                  20  avgt   20  185.620 ± 2.114  ns/op
GetAndUpdateBench3.martin:getAndUpdate1_martin             20  avgt   20  185.583 ± 2.151  ns/op
GetAndUpdateBench3.martin:getAndUpdate2_martin             20  avgt   20  185.657 ± 2.082  ns/op
GetAndUpdateBench3.martin                                  50  avgt   20  293.495 ± 1.739  ns/op
GetAndUpdateBench3.martin:getAndUpdate1_martin             50  avgt   20  293.524 ± 1.775  ns/op
GetAndUpdateBench3.martin:getAndUpdate2_martin             50  avgt   20  293.466 ± 1.724  ns/op
GetAndUpdateBench3.martin                                 100  avgt   20  487.543 ± 2.683  ns/op
GetAndUpdateBench3.martin:getAndUpdate1_martin            100  avgt   20  487.456 ± 2.865  ns/op
GetAndUpdateBench3.martin:getAndUpdate2_martin            100  avgt   20  487.629 ± 2.580  ns/op
GetAndUpdateBench3.shade                                    1  avgt   20  132.816 ± 0.809  ns/op
GetAndUpdateBench3.shade:getAndUpdate1_shade                1  avgt   20  132.780 ± 0.839  ns/op
GetAndUpdateBench3.shade:getAndUpdate2_shade                1  avgt   20  132.852 ± 0.796  ns/op
GetAndUpdateBench3.shade                                   10  avgt   20  159.925 ± 9.468  ns/op
GetAndUpdateBench3.shade:getAndUpdate1_shade               10  avgt   20  160.100 ± 9.645  ns/op
GetAndUpdateBench3.shade:getAndUpdate2_shade               10  avgt   20  159.749 ± 9.318  ns/op
GetAndUpdateBench3.shade                                   20  avgt   20  205.441 ± 5.715  ns/op
GetAndUpdateBench3.shade:getAndUpdate1_shade               20  avgt   20  206.646 ± 8.524  ns/op
GetAndUpdateBench3.shade:getAndUpdate2_shade               20  avgt   20  204.235 ± 7.383  ns/op
GetAndUpdateBench3.shade                                   50  avgt   20  314.961 ± 5.568  ns/op
GetAndUpdateBench3.shade:getAndUpdate1_shade               50  avgt   20  314.340 ± 5.699  ns/op
GetAndUpdateBench3.shade:getAndUpdate2_shade               50  avgt   20  315.582 ± 5.878  ns/op
GetAndUpdateBench3.shade                                  100  avgt   20  495.603 ± 4.379  ns/op
GetAndUpdateBench3.shade:getAndUpdate1_shade              100  avgt   20  495.755 ± 4.923  ns/op
GetAndUpdateBench3.shade:getAndUpdate2_shade              100  avgt   20  495.450 ± 3.985  ns/op
GetAndUpdateBench3.strong                                   1  avgt   20  130.283 ± 3.070  ns/op
GetAndUpdateBench3.strong:getAndUpdate1_strong              1  avgt   20  130.259 ± 3.081  ns/op
GetAndUpdateBench3.strong:getAndUpdate2_strong              1  avgt   20  130.308 ± 3.067  ns/op
GetAndUpdateBench3.strong                                  10  avgt   20  146.124 ± 3.120  ns/op
GetAndUpdateBench3.strong:getAndUpdate1_strong             10  avgt   20  146.176 ± 3.064  ns/op
GetAndUpdateBench3.strong:getAndUpdate2_strong             10  avgt   20  146.072 ± 3.192  ns/op
GetAndUpdateBench3.strong                                  20  avgt   20  186.345 ± 1.518  ns/op
GetAndUpdateBench3.strong:getAndUpdate1_strong             20  avgt   20  186.319 ± 1.529  ns/op
GetAndUpdateBench3.strong:getAndUpdate2_strong             20  avgt   20  186.371 ± 1.564  ns/op
GetAndUpdateBench3.strong                                  50  avgt   20  292.982 ± 1.855  ns/op
GetAndUpdateBench3.strong:getAndUpdate1_strong             50  avgt   20  292.991 ± 1.820  ns/op
GetAndUpdateBench3.strong:getAndUpdate2_strong             50  avgt   20  292.973 ± 1.902  ns/op
GetAndUpdateBench3.strong                                 100  avgt   20  480.361 ± 1.097  ns/op
GetAndUpdateBench3.strong:getAndUpdate1_strong            100  avgt   20  480.359 ± 1.189  ns/op
GetAndUpdateBench3.strong:getAndUpdate2_strong            100  avgt   20  480.363 ± 1.248  ns/op