[concurrency-interest] AtomicReferenceArray.get() and intrinsics method inlining

Aleksey Shipilev shade at redhat.com
Thu Jan 16 15:15:23 EST 2020


On 1/16/20 8:43 PM, Manuel Dominguez Sarmiento via Concurrency-interest wrote:
>     private static final long ara(final long repetitions) {
>         final AtomicReferenceArray<Object> ara = new AtomicReferenceArray<Object>(100);
>         ara.set(KEY, new Object());
>         long start = System.nanoTime();
>         for (long i = 0; i < repetitions; i++) {
>             ara.get(KEY);
>         }
>         long end = System.nanoTime();
>         return (end - start) / (1000 * 1000);
>     }
> }

For the love of all that is holy, use JMH:
  https://openjdk.java.net/projects/code-tools/jmh/
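
As a rough illustration of why the hand-rolled loop above is suspect (a sketch only, not a substitute
for JMH): the quoted loop discards the result of ara.get(KEY), which gives the JIT room to simplify
the loop body, and it times a single cold invocation. The class name, the KEY value, the "sink" field
and the round count below are assumptions made for illustration; none of them come from the original
test.

```java
import java.util.concurrent.atomic.AtomicReferenceArray;

public class NaiveLoopSketch {
    static final int KEY = 42;   // hypothetical index; the original value of KEY is not shown
    static Object sink;          // hypothetical sink so the loaded value is observably used

    // Times `repetitions` reads and returns the elapsed nanoseconds.
    static long timedRun(AtomicReferenceArray<Object> ara, long repetitions) {
        Object last = null;
        long start = System.nanoTime();
        for (long i = 0; i < repetitions; i++) {
            last = ara.get(KEY);   // keep the result instead of discarding it
        }
        long end = System.nanoTime();
        sink = last;               // publish the result after timing ends
        return end - start;
    }

    public static void main(String[] args) {
        AtomicReferenceArray<Object> ara = new AtomicReferenceArray<>(100);
        ara.set(KEY, new Object());
        // Repeat the measurement: early rounds include interpretation and compilation time.
        for (int round = 0; round < 5; round++) {
            long nanos = timedRun(ara, 10_000_000L);
            System.out.println("round " + round + ": " + nanos / 1_000_000 + " ms");
        }
    }
}
```

Even with these two changes the numbers remain easy to misread (on-stack replacement, loop
optimizations, timer granularity), which is the reason to let a harness handle it.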

> This test took about 40 seconds on the exact same hardware as previous CHM tests. So we concluded
> that AtomicReferenceArray.get() usage instead of Unsafe.getObjectVolatile() was the cause behind the
> ConcurrentHashMapV8 EhCache fork being so much slower than Java8 stock ConcurrentHashMap.

> So this is the interesting bit:
> sun.misc.Unsafe::getObjectVolatile (0 bytes)   failed to inline (intrinsic)

Cannot reproduce:

$ java -server -XX:+PrintCompilation -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining -XX:-TieredCompilation -XX:MaxInlineLevel=15 Test
java.util.concurrent.atomic.AtomicReferenceArray::byteOffset (12 bytes)

   @ 41   java.util.concurrent.atomic.AtomicReferenceArray::byteOffset (12 bytes)   inline (hot)

java.util.concurrent.atomic.AtomicReferenceArray::getRaw (12 bytes)

   @ 8   sun.misc.Unsafe::getObjectVolatile (0 bytes)   (intrinsic)

   @ 3   java.util.concurrent.atomic.AtomicReferenceArray::checkedByteOffset (45 bytes)   inline (hot)
     @ 41   java.util.concurrent.atomic.AtomicReferenceArray::byteOffset (12 bytes)   inline (hot)

   @ 6   java.util.concurrent.atomic.AtomicReferenceArray::getRaw (12 bytes)   inline (hot)

     @ 8   sun.misc.Unsafe::getObjectVolatile (0 bytes)   (intrinsic)

Test::ara @ 29 (65 bytes)

   @ 38   java.util.concurrent.atomic.AtomicReferenceArray::get (10 bytes)   inline (hot)

     @ 3   java.util.concurrent.atomic.AtomicReferenceArray::checkedByteOffset (45 bytes)   inline (hot)
       @ 41   java.util.concurrent.atomic.AtomicReferenceArray::byteOffset (12 bytes)   inline (hot)

     @ 6   java.util.concurrent.atomic.AtomicReferenceArray::getRaw (12 bytes)   inline (hot)

       @ 8   sun.misc.Unsafe::getObjectVolatile (0 bytes)   (intrinsic)

Test::ara @ -2 (65 bytes)   made not entrant

ara=429 ms


$ java -version
openjdk version "1.8.0_232"
OpenJDK Runtime Environment (AdoptOpenJDK)(build 1.8.0_232-b09)
OpenJDK 64-Bit Server VM (AdoptOpenJDK)(build 25.232-b09, mixed mode)


> After careful studying of stock Java8 ConcurrentHashMap.get(), we found that the reason why that
>  method was being successfully inlined is the (tab = table) != null check before tabAt() is 
> invoked. Apparently, the HotSpot compiler is unable to inline getObjectVolatile() unless it can 
> verify that its main argument will always be non-null.

If true, that would qualify as a performance bug. HotSpot should be able to inline Unsafe accessors,
and it would emit runtime null checks if it is not sure about nullity.

Please try with a more up-to-date JDK binary.

> The ConcurrentHashMapV8 took about 40 seconds on average to complete. However, stock Java 8 
> ConcurrentHashMap took only about 800 ms to complete the same task. Profiling showed that the 
> ConcurrentHashMapV8 fork hotspot was at AtomicReferenceArray.get(), which matched the issue we 
> found in our production systems with very "hot" cache keys.

The more likely causes would be:
 a) an additional memory dereference every time "table" is accessed. In that Frankenstein-monster
of a CHM, it takes an extra dereference to reach the backing array inside the ARA itself;
 b) profiling skew that misattributed the bottleneck to ARA.get();
 c) some subtle difference between the fork and the "stock" CHM version.
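
To make (a) concrete, here is a minimal sketch of the two table shapes. The class and field names
are hypothetical, and the assumption that the fork holds its table as an AtomicReferenceArray field
is mine; the actual fork sources were not posted. The direct version uses a plain array read for
brevity, whereas stock CHM does a volatile read of the element.

```java
import java.util.concurrent.atomic.AtomicReferenceArray;

public class DereferenceSketch {
    // Stock-CHM-like shape: field -> Object[] -> element.
    static final class DirectTable {
        volatile Object[] table = new Object[16];

        Object get(int i) {
            Object[] tab = table;   // load 1: the table field
            return tab[i];          // load 2: the element (plain read here, for brevity)
        }
    }

    // Assumed fork-like shape: field -> ARA -> ARA's private array -> element.
    static final class WrappedTable {
        volatile AtomicReferenceArray<Object> table = new AtomicReferenceArray<>(16);

        Object get(int i) {
            AtomicReferenceArray<Object> tab = table;   // load 1: the table field
            return tab.get(i);   // load 2: the array field inside the ARA; load 3: the element
        }
    }

    public static void main(String[] args) {
        DirectTable direct = new DirectTable();
        direct.table[3] = "value";
        WrappedTable wrapped = new WrappedTable();
        wrapped.table.set(3, "value");
        // Both shapes return the same element; only the number of dependent loads differs.
        System.out.println(direct.get(3).equals(wrapped.get(3)));
    }
}
```

The extra hop is a dependent load, so it can cost a cache miss of its own on every lookup even when
everything is inlined.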

The trouble here is that you have a minimized test that does not show the problem :/ Please
provide an MCVE for the actual problem you are chasing (pull the exact CHM sources into it, if you
have to), and full details on the environment you run the test in.

-- 
Thanks,
-Aleksey


