[concurrency-interest] synchronized vs Unsafe#monitorEnter/monitorExit

Gil Tene gil at azulsystems.com
Tue Dec 30 20:44:57 EST 2014


The number of objects being synchronized on doesn't really have much of an effect. There aren't that many JVMs out there, and AFAIK all the ones that run on servers will handle high counts of synchronized-on objects just fine. There are plenty of commonly used pieces of software that put JVMs through these paces every day. For example, Cassandra (under load) will tend to generate 10s of thousands of new *inflated* monitors per second. And I've seen several other apps that do that too.

JVMs (at least the ones with fixed sized object headers, which is most server JVMs) tend to have room in all object headers for the "un-contended"/deflated monitor state case, both locked and unlocked. The way they implement this varies, e.g. displaced headers in Oracle HotSpot/OpenJDK vs. thin locks or bacon bits in others, but they all fit that stuff in a single header word somehow.
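For illustration, a minimal sketch of how to look at that header word yourself, assuming the OpenJDK JOL tool (org.openjdk.jol:jol-core) is on the classpath; it prints the object's layout, including the mark word, before and while the object is locked:

    import org.openjdk.jol.info.ClassLayout;

    public class HeaderDemo {
        public static void main(String[] args) {
            Object o = new Object();
            // Unlocked: the header word holds the default mark word.
            System.out.println(ClassLayout.parseInstance(o).toPrintable());
            synchronized (o) {
                // Un-contended lock: the mark word now reflects the thin/displaced lock state.
                System.out.println(ClassLayout.parseInstance(o).toPrintable());
            }
        }
    }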

It's "inflation" and "deflation" that are the actual cost, and the choices there vary. E.g. it is possible to contain the number of inflated monitors to the number of threads. Each thread can at most be blocked on one monitor at a time, so the maximum number of actual required monitors is always capped by the number of threads. It is also possible to incorporate the inflated monitor state into the thread structures based on this observation (chaining stuff through threads). But in practice HotSpot variants (including Zing) tend to maintain an inflated monitor pool. When monitors need to be inflated (due to contention or wait() calls) an inflated monitor state is allocated and connected to the object (usually by having the header refer to it somehow). This obviously means that monitors need to be deflated to balance this out. Monitor deflation is usually done as a STW (stop the world) operation as part of safepoint operations (time is usually hidden in GC pauses, and not explicitly reported). Deflation can be done concurrently without STW effects, but AFAIK Zing is the only one doing it concurrently right now. Makes quite a difference when you need to deflate at a sustained rate of 30K monitors/sec and don't want to take pauses in the process...

— Gil.

On Dec 30, 2014, at 4:29 PM, Oleksandr Otenko <oleksandr.otenko at oracle.com> wrote:

ok. You must consider that using the object header means different JVMs will have different overheads and limits. Some JVMs will not let you have tens of millions of objects on which you want to synchronize - contended or not - just because there is not enough space in their object headers for all the book-keeping info.

Alex


On 31/12/2014 00:12, Ben Manes wrote:
> Why do you need synchronized specifically? (and not j.u.c.locks.Lock lock/unlock)

To minimize memory usage, as the lock is per entry. Consider this as an extension to ConcurrentHashMap's computeIfAbsent(K key) that instead accepts Iterable<K> keys. A full j.u.c lock is ideal, but that requires each entry to embed a lock instance, which adds significant bloat. In the case of a cache, writes are rare and writes to the same entries are rarer still. Thus, synchronizing should use the object's header as a thin lock until contention is detected and the JVM inflates the lock to a full mutex. After inflation, the JVM could decide to deflate the lock if it reverts back to low-contention usage. That means no wasted memory to support a single use-case, which may not be a very commonly used pattern. If the Unsafe versions were optimizable, then they would be an ideal choice.
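A rough sketch of that idea, with hypothetical names: the entry object's own monitor serves as the per-entry lock, so no lock field is embedded:

    import java.util.concurrent.ConcurrentHashMap;

    final class EntryLockCache<K, V> {
        static final class Entry<V> {
            volatile V value;      // no ReentrantLock field: the Entry's own monitor is the lock
            Entry(V value) { this.value = value; }
        }

        private final ConcurrentHashMap<K, Entry<V>> data = new ConcurrentHashMap<>();

        void put(K key, V value) {
            Entry<V> e = data.computeIfAbsent(key, k -> new Entry<>(null));
            synchronized (e) {     // thin lock in the header until contended, then inflated
                e.value = value;
            }
        }

        V getIfPresent(K key) {
            Entry<V> e = data.get(key);
            return (e == null) ? null : e.value;
        }
    }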

There are alternatives, of course. The approach in Guava was to be non-blocking and populate the cache after the bulk load completes. This overwrites any entries that appeared while loadAll was computing, so it doesn't solve the thundering herd problem. That is reasonable based on the assumption that local cache misses don't cause as much pain as, say, a memcached restart causing a database query storm. Another approach might use lock striping, e.g. 1024 real locks that keys are hashed to and that are acquired in ascending order (see the sketch below). That adds more complexity and hash collisions might occur, but it wouldn't be too bad in practice. And yet another might be a lazy lock field per entry, but that is still wasteful in large caches.
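A sketch of the lock-striping alternative (the 1024 stripe count and the names are just for illustration): keys hash to a fixed pool of locks, which are acquired in ascending index order to avoid deadlock:

    import java.util.Iterator;
    import java.util.Set;
    import java.util.TreeSet;
    import java.util.concurrent.locks.ReentrantLock;

    final class StripedLocks {
        private final ReentrantLock[] stripes = new ReentrantLock[1024];

        StripedLocks() {
            for (int i = 0; i < stripes.length; i++) {
                stripes[i] = new ReentrantLock();
            }
        }

        // Map a key to its stripe; a collision just means two keys share a lock.
        private int indexFor(Object key) {
            return (key.hashCode() & 0x7fffffff) % stripes.length;
        }

        // Acquire the stripes for all keys in ascending index order, run the
        // action, then release them in reverse order as the recursion unwinds.
        void runLocked(Iterable<?> keys, Runnable action) {
            Set<Integer> indices = new TreeSet<>();
            for (Object key : keys) {
                indices.add(indexFor(key));
            }
            lockAll(indices.iterator(), action);
        }

        private void lockAll(Iterator<Integer> remaining, Runnable action) {
            if (!remaining.hasNext()) {
                action.run();
                return;
            }
            ReentrantLock lock = stripes[remaining.next()];
            lock.lock();
            try {
                lockAll(remaining, action);
            } finally {
                lock.unlock();
            }
        }
    }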

There's probably a solution using some type of reservation entry that contains a lock reference and, once populated, transitions to an implementation that doesn't have a lock. That can be done within the hash table, like CHM's ReservationNode, but is harder to do elegantly through a decorator.
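A rough sketch of that reservation idea (names hypothetical): the placeholder inserted while a value is being computed carries the lock, and the populated replacement carries none:

    import java.util.concurrent.locks.ReentrantLock;

    interface Node<V> {
        V value();
    }

    final class ReservationNode<V> implements Node<V> {
        final ReentrantLock lock = new ReentrantLock();   // exists only during the load
        @Override public V value() { return null; }       // not yet computed
    }

    final class ValueNode<V> implements Node<V> {
        private final V value;                            // no lock field once populated
        ValueNode(V value) { this.value = value; }
        @Override public V value() { return value; }
    }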

Anyway, hopefully it's clear now why Unsafe appeared to be an attractive solution, if only it had performed well.


On Tuesday, December 30, 2014 6:54 AM, Oleksandr Otenko <oleksandr.otenko at oracle.com> wrote:


Why do you need synchronized specifically? (and not j.u.c.locks.Lock lock/unlock)

Alex

On 28/12/2014 20:15, Ben Manes wrote:
That's a nifty workaround but, as you said, not reliable in the case of a general API. This case is a multi-get for a cache library, so the keys and use-cases aren't known in advance.

When adding bulk loading to Guava, due to the internal complexity we allowed it to be racy by not blocking concurrent calls for shared entries. This has to be done regardless because a loadAll(keys) may return a Map<K, V> with more entries than originally requested, and those extra entries should be cached. These cases were trivial to handle when I had previously written a Map<K, Future<V>> to support multi-get, but that carried a lot of bloat per entry. I had hoped Unsafe#monitorEnter would be a nice alternative and, if the API were stable, it arguably still could be given the low write rate of caches. For now I'll leave it in a backlog of performance optimization ideas, as we did with Guava, maybe revisiting once the rewrite stabilizes.
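For reference, a minimal sketch (names hypothetical) of that Map<K, Future<V>> style of multi-get: concurrent callers share one future per key, and extra entries returned by loadAll are cached as completed futures:

    import java.util.Collection;
    import java.util.LinkedHashMap;
    import java.util.LinkedHashSet;
    import java.util.Map;
    import java.util.Set;
    import java.util.concurrent.CompletableFuture;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ExecutionException;
    import java.util.function.Function;

    final class FutureMultiGetCache<K, V> {
        private final ConcurrentHashMap<K, CompletableFuture<V>> map = new ConcurrentHashMap<>();

        Map<K, V> getAll(Collection<K> keys, Function<Set<K>, Map<K, V>> loadAll)
                throws ExecutionException, InterruptedException {
            Map<K, CompletableFuture<V>> requested = new LinkedHashMap<>();
            Set<K> missing = new LinkedHashSet<>();
            for (K key : keys) {
                CompletableFuture<V> created = new CompletableFuture<>();
                CompletableFuture<V> existing = map.putIfAbsent(key, created);
                if (existing == null) {
                    missing.add(key);             // this caller owns loading the key
                    requested.put(key, created);
                } else {
                    requested.put(key, existing); // another caller is already loading it
                }
            }
            if (!missing.isEmpty()) {
                Map<K, V> loaded = loadAll.apply(missing);
                for (K key : missing) {
                    // complete the futures we created (null if loadAll omitted the key)
                    requested.get(key).complete(loaded.get(key));
                }
                for (Map.Entry<K, V> e : loaded.entrySet()) {
                    // cache any extra entries loadAll returned beyond those requested
                    map.putIfAbsent(e.getKey(), CompletableFuture.completedFuture(e.getValue()));
                }
            }
            Map<K, V> result = new LinkedHashMap<>();
            for (Map.Entry<K, CompletableFuture<V>> e : requested.entrySet()) {
                result.put(e.getKey(), e.getValue().get()); // waits on concurrent loaders
            }
            return result;
        }
    }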

Thanks,
-Ben


On Sunday, December 28, 2014 5:18 AM, Peter Levart <peter.levart at gmail.com> wrote:



On 12/27/2014 09:31 PM, Ben Manes wrote:
Can someone explain why using Unsafe's monitor methods is substantially worse than synchronized? I had expected them to emit equivalent monitorEnter/monitorExit instructions and have similar performance.

My use case is to support a bulk version of CHM#computeIfAbsent, where a single mapping function returns the result for computing multiple entries. I had hoped to bulk lock, insert the unfilled entries, compute, populate, and bulk unlock. An overlapping write would be blocked due to requiring an entry's lock for mutation.
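A sketch of what that bulk lock/unlock might have looked like with sun.misc.Unsafe's monitorEnter/monitorExit as they existed at the time (Unsafe obtained via the usual reflective hack; names hypothetical; these methods were later removed):

    import java.lang.reflect.Field;
    import sun.misc.Unsafe;

    final class UnsafeBulkLock {
        private static final Unsafe UNSAFE = loadUnsafe();

        // Enter every monitor, run the action, then exit in reverse order.
        static void bulkLocked(Object[] locks, Runnable action) {
            int locked = 0;
            try {
                for (Object lock : locks) {
                    UNSAFE.monitorEnter(lock);
                    locked++;
                }
                action.run();
            } finally {
                for (int i = locked - 1; i >= 0; i--) {
                    UNSAFE.monitorExit(locks[i]);
                }
            }
        }

        private static Unsafe loadUnsafe() {
            try {
                Field f = Unsafe.class.getDeclaredField("theUnsafe");
                f.setAccessible(true);
                return (Unsafe) f.get(null);
            } catch (ReflectiveOperationException e) {
                throw new AssertionError(e);
            }
        }
    }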

Hi Ben,

If "multiple entries" means less than say 50 or even a little more (using more locks than that for one bulk operation is not very optimal anyway), you could try constructing a recursive method to bulk-lock on the list of objects and execute a provided function:


    // needed imports (the methods are assumed to live in some utility class)
    import java.util.Iterator;
    import java.util.function.Supplier;

    public static <T> T synchronizedAll(Iterable<?> locks, Supplier<T> supplier) {
        return synchronizedAll(locks.iterator(), supplier);
    }

    // Recursively acquires each remaining lock with synchronized and runs the
    // supplier in the innermost block; every monitor is released as the
    // recursion unwinds.
    private static <T> T synchronizedAll(Iterator<?> locksIterator, Supplier<T> supplier) {
        if (locksIterator.hasNext()) {
            synchronized (locksIterator.next()) {
                return synchronizedAll(locksIterator, supplier);
            }
        } else {
            return supplier.get();
        }
    }



Note that to avoid deadlocks, the locks list has to be sorted (so every caller acquires the monitors in the same order) using a Comparator that is consistent with your keys' equals() method...
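For example (computeAll here is a hypothetical bulk computation; String's natural ordering is consistent with equals):

    List<String> keys = new ArrayList<>(Arrays.asList("k3", "k1", "k2"));
    Collections.sort(keys);   // every caller acquires the monitors in the same order
    Map<String, String> result = synchronizedAll(keys, () -> computeAll(keys));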

Regards, Peter



I had thought that using Unsafe would allow for achieving this without the memory overhead of a ReentrantLock/AQS per entry, since the synchronized keyword is not flexible enough to provide this structure.

Thanks,
Ben

Benchmark                                                    Mode  Samples         Score         Error  Units
c.g.b.c.SynchronizedBenchmark.monitor_contention            thrpt       10   3694951.630 ±   34340.707  ops/s
c.g.b.c.SynchronizedBenchmark.monitor_noContention          thrpt       10   8274097.911 ±  164356.363  ops/s
c.g.b.c.SynchronizedBenchmark.reentrantLock_contention      thrpt       10  31668532.247 ±  740850.955  ops/s
c.g.b.c.SynchronizedBenchmark.reentrantLock_noContention    thrpt       10  41380163.703 ± 2270103.507  ops/s
c.g.b.c.SynchronizedBenchmark.synchronized_contention       thrpt       10  22905995.761 ±  117868.968  ops/s
c.g.b.c.SynchronizedBenchmark.synchronized_noContention     thrpt       10  44891601.915 ± 1458775.665  ops/s
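(The numbers above are from the c.g.b.c.SynchronizedBenchmark; for illustration only, a minimal JMH sketch of the same kind of comparison, not the actual benchmark code, might look like this:)

    import java.util.concurrent.locks.ReentrantLock;
    import org.openjdk.jmh.annotations.Benchmark;
    import org.openjdk.jmh.annotations.Scope;
    import org.openjdk.jmh.annotations.State;

    // Scope.Thread gives each benchmark thread its own state (the no-contention
    // case); the contended variants would share one state via Scope.Benchmark.
    @State(Scope.Thread)
    public class MonitorComparisonBenchmark {
        private final Object monitor = new Object();
        private final ReentrantLock lock = new ReentrantLock();
        private int counter;

        @Benchmark
        public int synchronized_noContention() {
            synchronized (monitor) {
                return counter++;
            }
        }

        @Benchmark
        public int reentrantLock_noContention() {
            lock.lock();
            try {
                return counter++;
            } finally {
                lock.unlock();
            }
        }
    }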




_______________________________________________
Concurrency-interest mailing list
Concurrency-interest at cs.oswego.edu
http://cs.oswego.edu/mailman/listinfo/concurrency-interest
