[concurrency-interest] AtomicReferenceArray.get() and intrinsics method inlining

Manuel Dominguez Sarmiento mads at renxo.com
Thu Jan 16 15:28:56 EST 2020


Thanks, Aleksey, for your feedback. We know about JMH but have no 
experience with it yet; we'll definitely use it in the future.
We used Oracle JDK 1.8.0_212 on Mac OS X to produce the reported 
results. Update 212 dates from April 2019, so it is not that old.
We'll re-test on the latest 1.8.x JDK and report back.
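
For what it's worth, a JMH version of the test quoted below might look 
roughly like the following. This is an untested sketch on our part; 
KEY stands in for the constant array index our original test uses:

    import java.util.concurrent.atomic.AtomicReferenceArray;
    import org.openjdk.jmh.annotations.*;

    @State(Scope.Benchmark)
    public class AraBenchmark {
        private static final int KEY = 42; // illustrative index only
        private AtomicReferenceArray<Object> ara;

        @Setup
        public void setup() {
            ara = new AtomicReferenceArray<Object>(100);
            ara.set(KEY, new Object());
        }

        @Benchmark
        public Object get() {
            // Returning the value lets JMH consume it, so the read
            // cannot be dead-code-eliminated like a bare loop body.
            return ara.get(KEY);
        }
    }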

Manuel Dominguez Sarmiento

On 16/01/2020 17:15, Aleksey Shipilev wrote:
> On 1/16/20 8:43 PM, Manuel Dominguez Sarmiento via Concurrency-interest wrote:
>>      private static final long ara(final long repetitions) {
>>          final AtomicReferenceArray<Object> ara = new AtomicReferenceArray<Object>(100);
>>          ara.set(KEY, new Object());
>>          long start = System.nanoTime();
>>          for (long i = 0; i < repetitions; i++) {
>>              ara.get(KEY);
>>          }
>>          long end = System.nanoTime();
>>          return (end - start) / (1000 * 1000);
>>      }
>> }
> For the love of all that is holy, use JMH:
>    https://openjdk.java.net/projects/code-tools/jmh/
>
>> This test took about 40 seconds on the exact same hardware as the previous CHM tests. So we
>> concluded that using AtomicReferenceArray.get() instead of Unsafe.getObjectVolatile() was the
>> reason the ConcurrentHashMapV8 EhCache fork was so much slower than the stock Java 8 ConcurrentHashMap.
>> So this is the interesting bit:
>> sun.misc.Unsafe::getObjectVolatile (0 bytes)   failed to inline (intrinsic)
> Cannot reproduce:
>
> $ java -server -XX:+PrintCompilation -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining
> -XX:-TieredCompilation -XX:MaxInlineLevel=15 Test
> java.util.concurrent.atomic.AtomicReferenceArray::byteOffset (12 bytes)
>     @ 41   java.util.concurrent.atomic.AtomicReferenceArray::byteOffset (12 bytes)   inline (hot)
> java.util.concurrent.atomic.AtomicReferenceArray::getRaw (12 bytes)
>     @ 8   sun.misc.Unsafe::getObjectVolatile (0 bytes)   (intrinsic)
>     @ 3   java.util.concurrent.atomic.AtomicReferenceArray::checkedByteOffset (45 bytes)   inline (hot)
>       @ 41   java.util.concurrent.atomic.AtomicReferenceArray::byteOffset (12 bytes)   inline (hot)
>     @ 6   java.util.concurrent.atomic.AtomicReferenceArray::getRaw (12 bytes)   inline (hot)
>       @ 8   sun.misc.Unsafe::getObjectVolatile (0 bytes)   (intrinsic)
> Test::ara @ 29 (65 bytes)
>     @ 38   java.util.concurrent.atomic.AtomicReferenceArray::get (10 bytes)   inline (hot)
>       @ 3   java.util.concurrent.atomic.AtomicReferenceArray::checkedByteOffset (45 bytes)   inline (hot)
>         @ 41   java.util.concurrent.atomic.AtomicReferenceArray::byteOffset (12 bytes)   inline (hot)
>       @ 6   java.util.concurrent.atomic.AtomicReferenceArray::getRaw (12 bytes)   inline (hot)
>         @ 8   sun.misc.Unsafe::getObjectVolatile (0 bytes)   (intrinsic)
> Test::ara @ -2 (65 bytes)   made not entrant
>
> ara=429 ms
>
>
> $ java -version
> openjdk version "1.8.0_232"
> OpenJDK Runtime Environment (AdoptOpenJDK)(build 1.8.0_232-b09)
> OpenJDK 64-Bit Server VM (AdoptOpenJDK)(build 25.232-b09, mixed mode)
>
>
>> After carefully studying the stock Java 8 ConcurrentHashMap.get(), we found that the reason the
>> method was being successfully inlined is the (tab = table) != null check performed before tabAt()
>> is invoked. Apparently, the HotSpot compiler is unable to inline getObjectVolatile() unless it can
>> verify that its main argument will always be non-null.
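>>
>> For illustration, the relevant shape in the stock sources is roughly 
>> the following (paraphrased from JDK 8 ConcurrentHashMap, with the 
>> bin walk elided):
>>
>>     static final <K,V> Node<K,V> tabAt(Node<K,V>[] tab, int i) {
>>         return (Node<K,V>) U.getObjectVolatile(tab, ((long) i << ASHIFT) + ABASE);
>>     }
>>
>>     public V get(Object key) {
>>         Node<K,V>[] tab; Node<K,V> e; int n;
>>         int h = spread(key.hashCode());
>>         // (tab = table) != null proves tab is non-null before tabAt()
>>         if ((tab = table) != null && (n = tab.length) > 0 &&
>>             (e = tabAt(tab, (n - 1) & h)) != null) {
>>             // ... walk the bin and return the matching value ...
>>         }
>>         return null;
>>     }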
> If true, that would qualify as a performance bug. HotSpot should be able to inline Unsafe
> accessors, and it would emit runtime null checks if it is not sure about nullity.
>
> Please try with a more up-to-date JDK binary.
>
>> The ConcurrentHashMapV8 took about 40 seconds on average to complete. However, stock Java 8
>> ConcurrentHashMap took only about 800 ms to complete the same task. Profiling showed that the
>> ConcurrentHashMapV8 fork's hot spot was at AtomicReferenceArray.get(), which matched the issue we
>> found in our production systems with very "hot" cache keys.
> The more likely causes would be:
>   a) an additional memory dereference every time "table" is accessed. In that frankenstein-monster
> of a CHM it would be an additional dereference to reach the backing array inside the ARA itself
> (sketched below);
>   b) profiling skew that misattributed the bottleneck to ARA.get();
>   c) some subtle difference between the fork and the "stock" CHM version.
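>
> Schematically, for (a) -- the field names here are illustrative, not 
> the exact ones from either source tree:
>
>     // stock CHM: map -> Node[] -> element
>     Node<K,V>[] tab = map.table;     // one load to reach the array
>     Node<K,V> e = tabAt(tab, i);     // volatile load of the element
>
>     // ARA-backed fork: map -> ARA object -> Object[] -> element
>     AtomicReferenceArray<Node<K,V>> ara = map.table;  // load the ARA reference
>     Node<K,V> e2 = ara.get(i);       // loads ARA's backing array, then the element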
>
> The trouble here is that you have a minimized test that does not show the problem :/ Please
> provide an MCVE for the actual problem you are chasing (pull the exact CHM sources into it, if you
> have to), and full details on the environment you run the test in.
>
