[concurrency-interest] The very best CAS loop

Vitaly Davidovich vitalyd at gmail.com
Wed Sep 28 10:39:31 EDT 2016


On Wednesday, September 28, 2016, Peter Levart <peter.levart at gmail.com>
wrote:

>
>
> On 09/28/2016 03:08 PM, Andrew Haley wrote:
>
> Looking at the profile data, "spurious" CAS failures are pretty
> rare.  I do not believe that this is simply because we "got lucky" and
> the variables were on different cache lines, but it's possible.  On
> one run I saw times spectacularly better than these, and I ignored it.
> I added some padding to prevent the two variables sharing the same
> cache line, and:
>
> Benchmark                                       Mode  Cnt   Score   Error  Units
> GetAndUpdateBench0.dflt                         avgt   20  77.116 ± 0.120  ns/op
> GetAndUpdateBench0.dflt:getAndUpdate1_dflt      avgt   20  77.142 ± 0.159  ns/op
> GetAndUpdateBench0.dflt:getAndUpdate2_dflt      avgt   20  77.090 ± 0.112  ns/op
> GetAndUpdateBench0.martin                       avgt   20  76.865 ± 0.222  ns/op
> GetAndUpdateBench0.martin:getAndUpdate1_martin  avgt   20  76.840 ± 0.260  ns/op
> GetAndUpdateBench0.martin:getAndUpdate2_martin  avgt   20  76.889 ± 0.241  ns/op
> GetAndUpdateBench0.shade                        avgt   20  76.324 ± 0.096  ns/op
> GetAndUpdateBench0.shade:getAndUpdate1_shade    avgt   20  76.286 ± 0.015  ns/op
> GetAndUpdateBench0.shade:getAndUpdate2_shade    avgt   20  76.362 ± 0.183  ns/op
> GetAndUpdateBench0.strong                       avgt   20  76.303 ± 0.080  ns/op
> GetAndUpdateBench0.strong:getAndUpdate1_strong  avgt   20  76.330 ± 0.154  ns/op
> GetAndUpdateBench0.strong:getAndUpdate2_strong  avgt   20  76.277 ± 0.011  ns/op
>
> QED, I think.
>
>
> Yes, but that's an entirely different benchmark. This just tests the raw
> speed of a single thread in its own cache line. GetAndUpdateBench was meant
> to test the effect of false sharing on the different strategies (mainly
> martin vs. shade). But there seems to be no measurable difference so far.
> GetAndUpdateBench failed because it did not guarantee false sharing. Thus
> the native ByteBuffer variants GetAndUpdateBench2 and GetAndUpdateBench3
> were made, where the test has control over alignment.
>
A better way to test the different versions is to run jmh under perf and
look at the PMU counters.  In addition, it probably makes sense to use a
multi-socket machine and spread the threads across sockets, to see the worst
case of cache lines bouncing between sockets rather than within a single
socket via L2 or LLC.
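
To make the cache-line bouncing concrete, here is a rough, stdlib-only
sketch. It is not one of the benchmarked variants (dflt, martin, shade,
strong) and not a substitute for a JMH run. The class name
FalseSharingSketch, the STRIDE constant and the 64-byte cache-line size are
all assumptions made for illustration. Two threads run a plain CAS retry
loop, first on neighbouring slots and then on slots 128 bytes apart.

```java
import java.util.concurrent.atomic.AtomicLongArray;
import java.util.function.LongUnaryOperator;

class FalseSharingSketch {
    // Assumption: 64-byte cache lines. 16 longs = 128 bytes of separation.
    static final int STRIDE = 16;

    // Plain CAS retry loop: re-read and retry until the CAS succeeds.
    // Returns the value seen before the update.
    static long getAndUpdate(AtomicLongArray a, int i, LongUnaryOperator f) {
        long prev;
        do {
            prev = a.get(i);
        } while (!a.compareAndSet(i, prev, f.applyAsLong(prev)));
        return prev;
    }

    // Two threads, each incrementing its own slot 'iters' times.
    // Returns elapsed wall-clock nanoseconds (a crude measure, no warmup).
    static long run(AtomicLongArray a, int slot0, int slot1, int iters) {
        Thread t0 = new Thread(() -> {
            for (int n = 0; n < iters; n++) getAndUpdate(a, slot0, x -> x + 1);
        });
        Thread t1 = new Thread(() -> {
            for (int n = 0; n < iters; n++) getAndUpdate(a, slot1, x -> x + 1);
        });
        long start = System.nanoTime();
        t0.start();
        t1.start();
        try {
            t0.join();
            t1.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        int iters = 5_000_000;
        AtomicLongArray a = new AtomicLongArray(3 * STRIDE);
        // Neighbouring slots: very likely, but not guaranteed, to share a line.
        long adjacent = run(a, STRIDE, STRIDE + 1, iters);
        // Slots 128 bytes apart: on different lines if lines are 64 bytes.
        long padded = run(a, STRIDE, 2 * STRIDE, iters);
        System.out.printf("adjacent: %d ns, padded: %d ns%n", adjacent, padded);
    }
}
```

An on-heap array gives no control over where a cache-line boundary falls, so
the "adjacent" case only probably shares a line; that is the same weakness
that led to the ByteBuffer variants. Wall-clock numbers from a loop like this
say little on their own. Under jmh on Linux, the built-in perf profilers
(-prof perf or -prof perfnorm) are one way to get the counters instead.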

>
>
> Regards, Peter
>
>


-- 
Sent from my phone