[concurrency-interest] synchronized vs Unsafe#monitorEnter/monitorExit

Oleksandr Otenko oleksandr.otenko at oracle.com
Wed Dec 31 09:36:55 EST 2014


This is a great summary.

Yes, I probably stumbled on the inflated locks once. What mattered was 
not the number of locks in use at any given time, but the number of 
locks in use within a certain generation - they didn't get deflated 
until the GC for that generation fired. In the end I probably ran out 
of inflated monitors in the pool (tens of millions, I think).

No, it wasn't HotSpot, if you are wondering.

Alex

On 31/12/2014 01:44, Gil Tene wrote:
> The number of objects being synchronized on doesn't really have much 
> of an effect. There aren't that many JVMs out there, and AFAIK all the 
> ones that run on servers will handle high counts of synchronized-on 
> objects just fine. There are plenty of commonly used pieces of 
> software that put JVMs through these paces every day. For example, 
> Cassandra (under load) will tend to generate 10s of thousands of new 
> *inflated* monitors per second. And I've seen several other apps that 
> do that too.
>
> JVMs (at least the ones with fixed sized object headers, which is most 
> server JVMs) tend to have room in all object headers for the 
> "un-contended"/deflated monitor state case, both locked and unlocked. 
> The way they implement this varies, e.g. displaced headers in Oracle 
> HotSpot/OpenJDK vs. thin locks or bacon bits in others, but they all 
> fit that stuff in a single header word somehow.
>
> It's "inflation" and "deflation" that are the actual cost, and the 
> choices there vary. E.g. it is possible to contain the number of 
> inflated monitors to the number of threads. Each thread can at most be 
> blocked on one monitor at a time, so the maximum number of actual 
> required monitors is always capped by the number of threads. It is 
> also possible to incorporate the inflated monitor state into the 
> thread structures based on this observation (chaining stuff through 
> threads). But in practice HotSpot variants (including Zing) tend to 
> maintain an inflated monitor pool. When monitors need to be inflated 
> (due to contention or wait() calls) an inflated monitor state is 
> allocated and connected to the object (usually by having the header 
> refer to it somehow). This obviously means that monitors need to be 
> deflated to balance this out. Monitor deflation is usually done as a 
> STW (stop the world) operation as part of safepoint operations (time 
> is usually hidden in GC pauses, and not explicitly reported). 
> Deflation can be done concurrently without STW effects, but AFAIK Zing 
> is the only one doing it concurrently right now. Makes quite a 
> difference when you need to deflate at a sustained rate of 30K 
> monitors/sec and don't want to take pauses in the process...
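>
> For a feel of the kind of workload that exercises this machinery, a tiny
> driver like the following (illustrative only) churns through inflated
> monitors, because a timed wait() is one of the inflation triggers above
> and every iteration locks a fresh object:
>
>     public final class InflationDriver {
>         public static void main(String[] args) throws InterruptedException {
>             // each iteration parks for a millisecond or more, so this is only
>             // meant to generate monitor churn, not to finish quickly
>             for (int i = 0; i < 10_000; i++) {
>                 Object lock = new Object();
>                 synchronized (lock) {
>                     // wait() needs a full monitor with a wait set, so the JVM
>                     // has to inflate the lock for this object
>                     lock.wait(1);
>                 }
>             }
>             // the inflated monitors are garbage as far as the application is
>             // concerned; reclaiming them (deflation) is the JVM's job,
>             // typically done at safepoints as described above
>         }
>     }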
>
> — Gil.
>
>> On Dec 30, 2014, at 4:29 PM, Oleksandr Otenko 
>> <oleksandr.otenko at oracle.com> wrote:
>>
>> ok. You must consider that using the object header means different 
>> JVMs will have different overheads and limits. Some JVMs will not let 
>> you have tens of millions of objects on which you want to synchronize 
>> - contended or not - just because there is not enough space in their 
>> object headers for all the book-keeping info.
>>
>> Alex
>>
>>
>> On 31/12/2014 00:12, Ben Manes wrote:
>>> > Why do you need synchronized specifically? (and not 
>>> j.u.c.locks.Lock lock/unlock)
>>>
>>> To minimize memory usage, as the lock is per entry. Consider this an 
>>> extension of ConcurrentHashMap's computeIfAbsent(K key) that 
>>> instead accepts Iterable<K> keys. A full j.u.c lock is ideal, but 
>>> that requires each entry to embed a lock instance, which adds 
>>> significant bloat. In the case of a cache, writes are rare and even 
>>> rarer to the same entries. Thus, synchronizing should use the 
>>> Object's header as a thin lock until contention is detected and the 
>>> JVM inflates the lock to a full mutex. After inflation, the JVM 
>>> could decide to deflate the lock if it reverts to low-contention 
>>> usage. That means no memory is wasted to support a single 
>>> use-case, which may not be a very commonly used pattern. If the 
>>> Unsafe versions performed as well as synchronized, they would be 
>>> an ideal choice.
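>>>
>>> Concretely, the two per-entry shapes are roughly the following 
>>> (hypothetical names, fields trimmed down to the essentials):
>>>
>>>     import java.util.concurrent.locks.ReentrantLock;
>>>
>>>     // (a) explicit lock: every entry drags along a ReentrantLock plus its
>>>     // AQS state, even though most entries are never written concurrently
>>>     final class LockedEntry<K, V> {
>>>         final K key;
>>>         volatile V value;
>>>         final ReentrantLock lock = new ReentrantLock();
>>>         LockedEntry(K key) { this.key = key; }
>>>     }
>>>
>>>     // (b) synchronized on the entry itself: no extra field, the lock state
>>>     // lives in the entry's object header (a thin lock until the JVM
>>>     // decides contention warrants inflating it)
>>>     final class HeaderLockedEntry<K, V> {
>>>         final K key;
>>>         volatile V value;
>>>         HeaderLockedEntry(K key) { this.key = key; }
>>>
>>>         void set(V newValue) {
>>>             synchronized (this) {   // uncontended case is just a header CAS
>>>                 this.value = newValue;
>>>             }
>>>         }
>>>     }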
>>>
>>> There are alternatives, of course. The approach in Guava was to be 
>>> non-blocking and populate the cache after the bulk load completes. 
>>> This overwrites any entries that appeared while the loadAll was 
>>> being computed, so it doesn't solve the thundering herd problem. It 
>>> is reasonable based on the assumption that local cache misses don't 
>>> cause as much pain as, say, a memcached restart causing a 
>>> database query storm. Another approach might use lock striping, e.g. 
>>> 1024 real locks that keys hash to, acquired in ascending order 
>>> (sketched below). That adds complexity and hash collisions can 
>>> occur, but it wouldn't be too bad in practice. And yet another might 
>>> be a lazy lock field per entry, but that is still wasteful in large 
>>> caches.
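>>>
>>> A striping sketch along those lines (the stripe count and hash 
>>> spreading are arbitrary choices):
>>>
>>>     import java.util.concurrent.locks.ReentrantLock;
>>>     import java.util.function.Supplier;
>>>     import java.util.stream.StreamSupport;
>>>
>>>     final class StripedLocks {
>>>         private final ReentrantLock[] stripes = new ReentrantLock[1024];
>>>
>>>         StripedLocks() {
>>>             for (int i = 0; i < stripes.length; i++) {
>>>                 stripes[i] = new ReentrantLock();
>>>             }
>>>         }
>>>
>>>         private int indexFor(Object key) {
>>>             int h = key.hashCode();
>>>             h ^= (h >>> 16);                    // spread the hash a bit
>>>             return h & (stripes.length - 1);
>>>         }
>>>
>>>         <T> T withLocks(Iterable<?> keys, Supplier<T> action) {
>>>             // distinct, ascending stripe indexes => no lock-order cycles
>>>             int[] order = StreamSupport.stream(keys.spliterator(), false)
>>>                     .mapToInt(this::indexFor).distinct().sorted().toArray();
>>>             for (int i : order) stripes[i].lock();
>>>             try {
>>>                 return action.get();
>>>             } finally {
>>>                 for (int i = order.length - 1; i >= 0; i--) {
>>>                     stripes[i].unlock();
>>>                 }
>>>             }
>>>         }
>>>     }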
>>>
>>> There's probably a solution using some type of reservation entry 
>>> that contains a lock reference and, once populated, transitions to 
>>> an implementation that doesn't have a lock. That can be done within 
>>> the hash table, like CHM's ReservationNode, but it is harder to do 
>>> elegantly through a decorator.
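>>>
>>> A decorator-flavoured sketch of that reservation idea (hypothetical 
>>> names; error handling for a failed load is omitted):
>>>
>>>     import java.util.concurrent.ConcurrentHashMap;
>>>     import java.util.concurrent.CountDownLatch;
>>>     import java.util.function.Function;
>>>
>>>     final class ReservingCache<K, V> {
>>>         private final ConcurrentHashMap<K, Object> map = new ConcurrentHashMap<>();
>>>
>>>         @SuppressWarnings("unchecked")
>>>         V get(K key, Function<K, V> loader) throws InterruptedException {
>>>             Object current = map.get(key);
>>>             if (current == null) {
>>>                 Reservation reservation = new Reservation();
>>>                 current = map.putIfAbsent(key, reservation);
>>>                 if (current == null) {
>>>                     V value = loader.apply(key);  // we won the reservation
>>>                     map.put(key, value);          // steady state: no lock object
>>>                     reservation.done.countDown();
>>>                     return value;
>>>                 }
>>>             }
>>>             if (current instanceof Reservation) {
>>>                 ((Reservation) current).done.await();  // the herd blocks here, once
>>>                 current = map.get(key);
>>>             }
>>>             return (V) current;
>>>         }
>>>
>>>         // placeholder installed while the value is being computed
>>>         private static final class Reservation {
>>>             final CountDownLatch done = new CountDownLatch(1);
>>>         }
>>>     }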
>>>
>>> Anyway, hopefully it's clear now why Unsafe appeared to be an 
>>> attractive solution, had it performed well.
>>>
>>>
>>> On Tuesday, December 30, 2014 6:54 AM, Oleksandr Otenko 
>>> <oleksandr.otenko at oracle.com> wrote:
>>>
>>>
>>> Why do you need synchronized specifically? (and not j.u.c.locks.Lock 
>>> lock/unlock)
>>>
>>> Alex
>>>
>>> On 28/12/2014 20:15, Ben Manes wrote:
>>>> That's a nifty workaround but, as you said, not reliable for a 
>>>> general API. This case is a multi-get for a cache 
>>>> library, so the keys and use-cases aren't known.
>>>>
>>>> When adding bulk loading to Guava, due to the internal complexity 
>>>> we allowed it to be racy by not blocking concurrent calls for shared 
>>>> entries. This has to be done regardless, because loadAll(keys) may 
>>>> return a Map<K, V> with more entries than originally requested, and 
>>>> those extra entries should be cached. These cases were trivial to 
>>>> handle when I had previously written a Map<K, Future<V>> to support 
>>>> multi-get (sketched below), but at the cost of a lot of bloat per 
>>>> entry. I had hoped Unsafe#monitorEnter would be a nice alternative 
>>>> and, if the API were stable, it arguably still could be, given the 
>>>> low write rate of caches. For now I'll leave it in a backlog of 
>>>> performance optimization ideas, as we did with Guava, and maybe 
>>>> revisit it once the rewrite stabilizes.
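>>>>
>>>> The Map<K, Future<V>> shape was roughly this (a simplified sketch, 
>>>> assuming loadAll returns a value for every requested key):
>>>>
>>>>     import java.util.LinkedHashMap;
>>>>     import java.util.Map;
>>>>     import java.util.Set;
>>>>     import java.util.concurrent.CompletableFuture;
>>>>     import java.util.concurrent.ConcurrentHashMap;
>>>>     import java.util.function.Function;
>>>>
>>>>     final class FutureBackedCache<K, V> {
>>>>         private final ConcurrentHashMap<K, CompletableFuture<V>> map =
>>>>                 new ConcurrentHashMap<>();
>>>>
>>>>         Map<K, V> getAll(Iterable<? extends K> keys,
>>>>                          Function<Set<K>, Map<K, V>> loadAll) {
>>>>             Map<K, CompletableFuture<V>> mine = new LinkedHashMap<>();
>>>>             Map<K, CompletableFuture<V>> all = new LinkedHashMap<>();
>>>>             for (K key : keys) {
>>>>                 CompletableFuture<V> created = new CompletableFuture<>();
>>>>                 CompletableFuture<V> existing = map.putIfAbsent(key, created);
>>>>                 all.put(key, existing == null ? created : existing);
>>>>                 if (existing == null) {
>>>>                     mine.put(key, created);   // entries this caller must load
>>>>                 }
>>>>             }
>>>>             if (!mine.isEmpty()) {
>>>>                 try {
>>>>                     Map<K, V> loaded = loadAll.apply(mine.keySet());
>>>>                     mine.forEach((k, f) -> f.complete(loaded.get(k)));
>>>>                 } catch (RuntimeException e) {
>>>>                     mine.forEach((k, f) -> {
>>>>                         f.completeExceptionally(e);
>>>>                         map.remove(k, f);     // don't leave a stuck future behind
>>>>                     });
>>>>                     throw e;
>>>>                 }
>>>>             }
>>>>             Map<K, V> result = new LinkedHashMap<>();
>>>>             all.forEach((k, f) -> result.put(k, f.join()));  // wait for others' loads
>>>>             return result;
>>>>         }
>>>>     }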
>>>>
>>>> Thanks,
>>>> -Ben
>>>>
>>>>
>>>> On Sunday, December 28, 2014 5:18 AM, Peter Levart 
>>>> <peter.levart at gmail.com> wrote:
>>>>
>>>>
>>>>
>>>> On 12/27/2014 09:31 PM, Ben Manes wrote:
>>>>> Can someone explain why Unsafe's monitor methods are 
>>>>> substantially worse than synchronized? I had expected them to emit 
>>>>> equivalent monitorEnter/monitorExit instructions and have similar 
>>>>> performance.
>>>>>
>>>>> My use case is to support a bulk version of CHM#computeIfAbsent, 
>>>>> where a single mapping function returns the results for 
>>>>> multiple entries. I had hoped to bulk lock, insert the unfilled 
>>>>> entries, compute, populate, and bulk unlock (sketched below). An 
>>>>> overlapping write would be blocked because mutation requires the 
>>>>> entry's lock.
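>>>>>
>>>>> The shape I was aiming for was roughly this (illustrative only; 
>>>>> sun.misc.Unsafe obtained reflectively, and callers would pass the 
>>>>> entries in a consistent sort order to avoid deadlock):
>>>>>
>>>>>     import java.lang.reflect.Field;
>>>>>     import java.util.List;
>>>>>     import sun.misc.Unsafe;
>>>>>
>>>>>     final class BulkMonitors {
>>>>>         private static final Unsafe UNSAFE = loadUnsafe();
>>>>>
>>>>>         static void lockAll(List<?> entries) {
>>>>>             for (Object entry : entries) {
>>>>>                 UNSAFE.monitorEnter(entry);          // acquire in the given order
>>>>>             }
>>>>>         }
>>>>>
>>>>>         static void unlockAll(List<?> entries) {
>>>>>             for (int i = entries.size() - 1; i >= 0; i--) {
>>>>>                 UNSAFE.monitorExit(entries.get(i));  // release in reverse
>>>>>             }
>>>>>         }
>>>>>
>>>>>         private static Unsafe loadUnsafe() {
>>>>>             try {
>>>>>                 Field field = Unsafe.class.getDeclaredField("theUnsafe");
>>>>>                 field.setAccessible(true);
>>>>>                 return (Unsafe) field.get(null);
>>>>>             } catch (ReflectiveOperationException e) {
>>>>>                 throw new ExceptionInInitializerError(e);
>>>>>             }
>>>>>         }
>>>>>     }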
>>>>
>>>> Hi Ben,
>>>>
>>>> If "multiple entries" means less than say 50 or even a little more 
>>>> (using more locks than that for one bulk operation is not very 
>>>> optimal anyway), you could try constructing a recursive method to 
>>>> bulk-lock on the list of objects and execute a provided function:
>>>>
>>>>
>>>>     public static <T> T synchronizedAll(Iterable<?> locks, Supplier<T> supplier) {
>>>>         return synchronizedAll(locks.iterator(), supplier);
>>>>     }
>>>>
>>>>     private static <T> T synchronizedAll(Iterator<?> locksIterator, Supplier<T> supplier) {
>>>>         if (locksIterator.hasNext()) {
>>>>             synchronized (locksIterator.next()) {
>>>>                 return synchronizedAll(locksIterator, supplier);
>>>>             }
>>>>         } else {
>>>>             return supplier.get();
>>>>         }
>>>>     }
>>>>
>>>>
>>>>
>>>> Note that to avoid deadlocks, the locks list has to be sorted using 
>>>> a Comparator that is consistent with your key's equals() method...
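>>>>
>>>> For example, with naturally ordered keys (computeEntries is a 
>>>> stand-in for whatever bulk computation you need):
>>>>
>>>>     List<String> keys = new ArrayList<>(Arrays.asList("b", "a", "c"));
>>>>     Collections.sort(keys);          // one global order => no deadlocks
>>>>
>>>>     Map<String, String> values = synchronizedAll(keys, () -> {
>>>>         // all the keys' monitors are held here
>>>>         return computeEntries(keys); // hypothetical bulk computation
>>>>     });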
>>>>
>>>> Regards, Peter
>>>>
>>>>
>>>>
>>>>> I had thought that using Unsafe would let me achieve this 
>>>>> without the memory overhead of a ReentrantLock/AQS per entry, 
>>>>> since the synchronized keyword is not flexible enough to express 
>>>>> this structure.
>>>>>
>>>>> Thanks,
>>>>> Ben
>>>>>
>>>>> Benchmark                                                  Mode  Samples         Score        Error  Units
>>>>> c.g.b.c.SynchronizedBenchmark.monitor_contention          thrpt       10   3694951.630 ±  34340.707  ops/s
>>>>> c.g.b.c.SynchronizedBenchmark.monitor_noContention        thrpt       10   8274097.911 ± 164356.363  ops/s
>>>>> c.g.b.c.SynchronizedBenchmark.reentrantLock_contention    thrpt       10  31668532.247 ±  740850.955  ops/s
>>>>> c.g.b.c.SynchronizedBenchmark.reentrantLock_noContention  thrpt       10  41380163.703 ± 2270103.507  ops/s
>>>>> c.g.b.c.SynchronizedBenchmark.synchronized_contention     thrpt       10  22905995.761 ±  117868.968  ops/s
>>>>> c.g.b.c.SynchronizedBenchmark.synchronized_noContention   thrpt       10  44891601.915 ± 1458775.665  ops/s
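>>>>>
>>>>> Roughly, each method measures one acquire/release of the 
>>>>> corresponding lock, something like the following (a sketch, not the 
>>>>> actual benchmark source; the contended variants run the same bodies 
>>>>> from multiple JMH threads):
>>>>>
>>>>>     import java.lang.reflect.Field;
>>>>>     import java.util.concurrent.locks.ReentrantLock;
>>>>>     import org.openjdk.jmh.annotations.Benchmark;
>>>>>     import org.openjdk.jmh.annotations.Scope;
>>>>>     import org.openjdk.jmh.annotations.State;
>>>>>     import sun.misc.Unsafe;
>>>>>
>>>>>     @State(Scope.Benchmark)
>>>>>     public class SynchronizedSketch {
>>>>>         private static final Unsafe UNSAFE = loadUnsafe();
>>>>>         private final Object monitor = new Object();
>>>>>         private final ReentrantLock lock = new ReentrantLock();
>>>>>
>>>>>         @Benchmark
>>>>>         public int monitor() {
>>>>>             UNSAFE.monitorEnter(monitor);
>>>>>             try { return 1; } finally { UNSAFE.monitorExit(monitor); }
>>>>>         }
>>>>>
>>>>>         @Benchmark
>>>>>         public int synchronized_() {
>>>>>             synchronized (monitor) { return 1; }
>>>>>         }
>>>>>
>>>>>         @Benchmark
>>>>>         public int reentrantLock() {
>>>>>             lock.lock();
>>>>>             try { return 1; } finally { lock.unlock(); }
>>>>>         }
>>>>>
>>>>>         private static Unsafe loadUnsafe() {
>>>>>             try {
>>>>>                 Field field = Unsafe.class.getDeclaredField("theUnsafe");
>>>>>                 field.setAccessible(true);
>>>>>                 return (Unsafe) field.get(null);
>>>>>             } catch (ReflectiveOperationException e) {
>>>>>                 throw new ExceptionInInitializerError(e);
>>>>>             }
>>>>>         }
>>>>>     }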
>>>>>
>>>>>
>>>>>
>>>
>>>
>>>
>>
>>
>>
>
