[concurrency-interest] The very best CAS loop
vitalyd at gmail.com
Wed Sep 28 10:39:31 EDT 2016
On Wednesday, September 28, 2016, Peter Levart <peter.levart at gmail.com> wrote:
> On 09/28/2016 03:08 PM, Andrew Haley wrote:
> Looking at the profile data, "spurious" CAS failures are pretty
> rare. I do not believe that this is simply because we "got lucky" and
> the variables were on different cache lines, but it's possible. On
> one run I saw times spectacularly better than these, and I ignored it.
> I added some padding to prevent the two variables sharing the same
> cache line, and:
> Benchmark                                       Mode  Cnt   Score   Error  Units
> GetAndUpdateBench0.dflt                         avgt   20  77.116 ± 0.120  ns/op
> GetAndUpdateBench0.dflt:getAndUpdate1_dflt      avgt   20  77.142 ± 0.159  ns/op
> GetAndUpdateBench0.dflt:getAndUpdate2_dflt      avgt   20  77.090 ± 0.112  ns/op
> GetAndUpdateBench0.martin                       avgt   20  76.865 ± 0.222  ns/op
> GetAndUpdateBench0.martin:getAndUpdate1_martin  avgt   20  76.840 ± 0.260  ns/op
> GetAndUpdateBench0.martin:getAndUpdate2_martin  avgt   20  76.889 ± 0.241  ns/op
> GetAndUpdateBench0.shade                        avgt   20  76.324 ± 0.096  ns/op
> GetAndUpdateBench0.shade:getAndUpdate1_shade    avgt   20  76.286 ± 0.015  ns/op
> GetAndUpdateBench0.shade:getAndUpdate2_shade    avgt   20  76.362 ± 0.183  ns/op
> GetAndUpdateBench0.strong                       avgt   20  76.303 ± 0.080  ns/op
> GetAndUpdateBench0.strong:getAndUpdate1_strong  avgt   20  76.330 ± 0.154  ns/op
> GetAndUpdateBench0.strong:getAndUpdate2_strong  avgt   20  76.277 ± 0.011  ns/op
> QED, I think.
> Yes, but that's an entirely different benchmark. This just tests the raw
> speed of a single thread in its own cache line. GetAndUpdateBench was meant
> to test the effect of false sharing on the different strategies (mainly
> martin vs. shade). But there seems to be no measurable difference so far.
> GetAndUpdateBench failed because it did not guarantee false sharing. Thus
> the native ByteBuffer variants GetAndUpdateBench2,3 were made, where the
> test has control over alignment.
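
For readers who don't have the benchmark source at hand, here is a minimal
sketch of the kind of loop being compared. It is not the code behind the
dflt/martin/shade/strong variants in the table; the class and method names
(CasLoops, getAndUpdateStrong, getAndUpdateWeak) are made up for
illustration, and weakCompareAndSetVolatile assumes Java 9 or later.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntUnaryOperator;

// Hypothetical illustration only; not the benchmark code from this thread.
public class CasLoops {
    // Strong CAS: compareAndSet fails only if the value really changed,
    // so every retry means another thread got in first.
    public static int getAndUpdateStrong(AtomicInteger a, IntUnaryOperator f) {
        int prev;
        do {
            prev = a.get();
        } while (!a.compareAndSet(prev, f.applyAsInt(prev)));
        return prev;
    }

    // Weak CAS: allowed to fail spuriously, so the loop just re-reads
    // and retries. Requires Java 9+ for weakCompareAndSetVolatile.
    public static int getAndUpdateWeak(AtomicInteger a, IntUnaryOperator f) {
        int prev;
        do {
            prev = a.get();
        } while (!a.weakCompareAndSetVolatile(prev, f.applyAsInt(prev)));
        return prev;
    }

    public static void main(String[] args) {
        AtomicInteger a = new AtomicInteger(5);
        int old1 = getAndUpdateStrong(a, x -> x + 1); // 5 -> 6
        int old2 = getAndUpdateWeak(a, x -> x * 2);   // 6 -> 12
        System.out.println(old1 + " " + old2 + " " + a.get()); // prints "5 6 12"
    }
}
```

Both loops return the previous value, like AtomicInteger.getAndUpdate. On
hardware where a weak CAS can fail spuriously, the only visible difference
should be how often the loop body re-executes.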
A better way to test the different versions is to run JMH under perf and
look at PMU counters. In addition, it probably makes sense to use a
multi-socket machine and spread the threads across the sockets, to see the
worst case of a cache line bouncing across sockets rather than within a
single socket via L2 or LLC.
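
To make the bouncing concrete, here is a rough sketch (not a JMH benchmark,
and not the GetAndUpdateBench code) in which two threads each run a CAS
loop on their own slot of an AtomicLongArray. The class name BounceSketch
and its methods are hypothetical, and the "128 bytes apart" case assumes
64-byte cache lines. There is no warmup and no thread pinning, so the
timings are only indicative; placing the threads on different sockets has
to be done from outside the JVM, and the interesting numbers would come
from the PMU counters rather than from wall-clock time.

```java
import java.util.concurrent.atomic.AtomicLongArray;

// Hypothetical illustration only; timings from this are not rigorous.
public class BounceSketch {
    // Two threads increment slots i and j with a CAS loop.
    // Returns the elapsed wall-clock time in nanoseconds.
    public static long run(AtomicLongArray slots, int i, int j, int iters)
            throws InterruptedException {
        Thread t1 = new Thread(() -> spin(slots, i, iters));
        Thread t2 = new Thread(() -> spin(slots, j, iters));
        long start = System.nanoTime();
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        return System.nanoTime() - start;
    }

    // Plain strong-CAS increment loop on a single slot.
    static void spin(AtomicLongArray slots, int idx, int iters) {
        for (int n = 0; n < iters; n++) {
            long prev;
            do {
                prev = slots.get(idx);
            } while (!slots.compareAndSet(idx, prev, prev + 1));
        }
    }

    public static void main(String[] args) throws InterruptedException {
        int iters = 5_000_000;
        AtomicLongArray near = new AtomicLongArray(32);
        AtomicLongArray far = new AtomicLongArray(32);
        // Slots 0 and 1 are 8 bytes apart and will usually share a line.
        long tNear = run(near, 0, 1, iters);
        // Slots 0 and 16 are 128 bytes apart (assumes 64-byte lines).
        long tFar = run(far, 0, 16, iters);
        System.out.println("adjacent slots:  " + tNear / 1_000_000 + " ms");
        System.out.println("128 bytes apart: " + tFar / 1_000_000 + " ms");
    }
}
```

Neither thread ever touches the other's slot, so no CAS fails because of a
real conflict; any slowdown in the adjacent case comes from the line moving
between cores.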
> Regards, Peter
Sent from my phone