[concurrency-interest] JDK 9's compareAndSet vs compareAndExchange

Vitaly Davidovich vitalyd at gmail.com
Thu Sep 22 10:06:41 EDT 2016


On Thu, Sep 22, 2016 at 9:35 AM, Dávid Karnok <akarnokd at gmail.com> wrote:

> JDK 9's VarHandle API (and AtomicXXX classes) specify the new
> compareAndExchange() method. Traditionally, I wrote CAS loops like this:
>
> public static long getAddAndCap(AtomicLong requested, long n) {
>     for (;;) {
>         long current = requested.get();
>
>         if (current == Long.MAX_VALUE) {
>             return Long.MAX_VALUE;
>         }
>         long next = current + n;
>         if (next < 0L) {
>             next = Long.MAX_VALUE;
>         }
>         if (requested.compareAndSet(current, next)) {
>             return current;
>         }
>     }
> }
>
> Now I can write this:
>
> public static long getAddAndCap(AtomicLong requested, long n) {
>     long current = requested.get();
>     for (;;) {
>         if (current == Long.MAX_VALUE) {
>             return Long.MAX_VALUE;
>         }
>         long next = current + n;
>         if (next < 0L) {
>             next = Long.MAX_VALUE;
>         }
>         long actual = requested.compareAndExchange(current, next);
>         if (actual == current) {
>             return current;
>         }
>         current = actual;
>     }
> }
>
> I'm not sure I can benchmark these with JMH under JDK 9 right now, so my
> question is whether the latter pattern has lower overhead (on x86), since
> it reuses the value returned by the underlying LOCK CMPXCHG instead of
> re-reading the target field. Or was the JIT always smart enough to detect
> the unnecessary re-read with compareAndSet, or is the hardware smart
> enough to hide its latency with speculative reads in this case?
>
I don't see how the JIT could have avoided the "re-read", since the
semantics are different: you're basically asking whether it pattern-matched
a CAS + get(), assumed the intention was CAE, and fused the two into a CAE
internally.  I'm pretty sure the answer is no.
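
To make the semantic difference concrete, here is a minimal sketch (it
assumes JDK 9+; the class and helper names are made up for illustration,
only AtomicLong and its methods are real JDK API). compareAndSet reports
only success or failure, so learning the current value takes a separate
get(), which under contention may observe a newer value than the one that
made the CAS fail. compareAndExchange returns the witness value itself.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical demo class; not taken from the JDK or from the original post.
public class CasVsCaeDemo {

    // compareAndSet yields a boolean, so on failure the caller needs an
    // extra get() to find out what the variable currently holds.
    static long casThenGet(AtomicLong v, long expected, long update) {
        boolean swapped = v.compareAndSet(expected, update);
        return swapped ? expected : v.get();
    }

    // compareAndExchange (JDK 9+) returns the value that was actually in
    // the variable; the swap happened if and only if it equals 'expected'.
    static long cae(AtomicLong v, long expected, long update) {
        return v.compareAndExchange(expected, update);
    }

    public static void main(String[] args) {
        AtomicLong v = new AtomicLong(5);
        System.out.println(casThenGet(v, 4, 10)); // CAS fails, get() reads 5
        System.out.println(cae(v, 4, 10));        // CAE fails, witness is 5
        System.out.println(cae(v, 5, 10));        // CAE succeeds, witness is 5
        System.out.println(v.get());              // value is now 10
    }
}
```

In a single thread the two helpers return the same thing; they can differ
only when another thread writes between the failed CAS and the get().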

It's very likely that the subsequent get() is an L1-hitting load (unless
there's very serious contention and the cache line has been invalidated
again), so it shouldn't be too bad.  However, additional atomic ops per loop
iteration act as barriers to JIT optimizations, which may play some role in
effective performance.  That said, in your particular example above, if the
CAS/CAE fails multiple times, your performance will likely be dominated by
the associated coherence traffic.
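
If you want a rough feel for how often the loop actually retries, something
like the sketch below may help. It is not a benchmark and says nothing
about cycles or cache misses; it only counts failed CAE attempts while
several threads hammer the same AtomicLong. The class name, the run()
helper, and the LongAdder retry counter are all assumptions added for
illustration (the counter itself perturbs timing a little), and n is
assumed to be non-negative as in your original.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical harness around the CAE loop from the quoted message.
public class CappedAddContention {

    // Counts failed compareAndExchange attempts across all threads.
    static final LongAdder retries = new LongAdder();

    static long getAddAndCap(AtomicLong requested, long n) {
        long current = requested.get();
        for (;;) {
            if (current == Long.MAX_VALUE) {
                return Long.MAX_VALUE;
            }
            long next = current + n;
            if (next < 0L) {
                next = Long.MAX_VALUE; // overflow: cap at Long.MAX_VALUE
            }
            long actual = requested.compareAndExchange(current, next);
            if (actual == current) {
                return current;
            }
            retries.increment();
            current = actual; // reuse the witness value, no extra get()
        }
    }

    // Starts 'threads' workers that each add 1 'perThread' times and
    // returns the final value of the shared counter.
    static long run(int threads, int perThread) {
        AtomicLong requested = new AtomicLong();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    getAddAndCap(requested, 1);
                }
            });
            workers[i].start();
        }
        try {
            for (Thread t : workers) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return requested.get();
    }

    public static void main(String[] args) {
        long total = run(4, 100_000);
        System.out.println("total=" + total + " failed attempts=" + retries.sum());
    }
}
```

The total is deterministic (threads * perThread); the number of failed
attempts varies from run to run and from machine to machine.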

>
> --
> Best regards,
> David Karnok
>