[concurrency-interest] synchronized vs Unsafe#monitorEnter/monitorExit

Ben Manes ben_manes at yahoo.com
Sat Dec 27 21:28:20 EST 2014

Thanks, that makes a lot of sense.
Not surprising, but still interesting: when both usages are mixed in one benchmark, only the Unsafe paths have slow lock acquisition, while the verifiable bytecode paths stay optimized. I'll probably have to see how heavy an AQS non-reentrant lock is per entry as an alternative approach for my use case, or simply abandon it altogether.
Benchmark                                                       Mode  Samples         Score         Error  Units
c.g.b.c.SynchronizedBenchmark.mixed                            thrpt       10  20893653.628 ±   89375.469  ops/s
c.g.b.c.SynchronizedBenchmark.mixed:mixed_monitor              thrpt       10    454111.564 ±    7717.129  ops/s
c.g.b.c.SynchronizedBenchmark.mixed:mixed_sync                 thrpt       10  20439542.064 ±   93987.110  ops/s
c.g.b.c.SynchronizedBenchmark.monitor_contention               thrpt       10   3589347.939 ±  114413.831  ops/s
c.g.b.c.SynchronizedBenchmark.monitor_noContention             thrpt       10   7789934.551 ±  424970.084  ops/s
c.g.b.c.SynchronizedBenchmark.nonReentrantLock_contention      thrpt       10  35595272.514 ±  219754.617  ops/s
c.g.b.c.SynchronizedBenchmark.nonReentrantLock_noContention    thrpt       10  74245220.098 ±  768665.279  ops/s
c.g.b.c.SynchronizedBenchmark.reentrantLock_contention         thrpt       10  27296079.389 ±  787027.871  ops/s
c.g.b.c.SynchronizedBenchmark.reentrantLock_noContention       thrpt       10  41981666.507 ± 1374677.863  ops/s
c.g.b.c.SynchronizedBenchmark.synchronized_contention          thrpt       10  22512475.218 ±  301107.363  ops/s
c.g.b.c.SynchronizedBenchmark.synchronized_noContention        thrpt       10  43245916.801 ± 1664731.804  ops/s
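
For readers who have not written one, a non-reentrant lock on top of AQS can be sketched roughly as below. This is only an illustration in the style of the Mutex example from the AbstractQueuedSynchronizer Javadoc; the class name NonReentrantLock and its methods are assumptions, not the code behind the nonReentrantLock_* benchmarks above.

```java
import java.util.concurrent.locks.AbstractQueuedSynchronizer;

// Hypothetical sketch: a minimal non-reentrant mutex built on AQS.
final class NonReentrantLock {
  private static final class Sync extends AbstractQueuedSynchronizer {
    @Override protected boolean tryAcquire(int ignored) {
      // State 0 = unlocked, 1 = locked. There is no owner check, so a
      // second acquire by the owning thread fails instead of nesting.
      if (compareAndSetState(0, 1)) {
        setExclusiveOwnerThread(Thread.currentThread());
        return true;
      }
      return false;
    }

    @Override protected boolean tryRelease(int ignored) {
      if (getState() == 0) {
        throw new IllegalMonitorStateException();
      }
      setExclusiveOwnerThread(null);
      setState(0); // volatile write publishes the release
      return true;
    }

    @Override protected boolean isHeldExclusively() {
      return getState() == 1 && getExclusiveOwnerThread() == Thread.currentThread();
    }

    boolean isLocked() {
      return getState() != 0;
    }
  }

  private final Sync sync = new Sync();

  void lock() { sync.acquire(1); }           // blocks; deadlocks if the owner calls it again
  boolean tryLock() { return sync.tryAcquire(1); }
  void unlock() { sync.release(1); }
  boolean isLocked() { return sync.isLocked(); }
}
```

Unlike a monitor, nothing forces lock() and unlock() to pair up within one method, which is the flexibility the bulk use case needs. The cost is one extra synchronizer object per entry.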

     On Saturday, December 27, 2014 5:10 PM, Gil Tene <gil at azulsystems.com> wrote:

 It's not "synchronized" per se that is responsible for the difference. It's the use of the monitorenter and monitorexit bytecodes. Some of the optimizations done for monitors rely on their verified behavior for correctness. The unsafe versions are not verified to adhere to the same requirements, which either makes some optimizations impossible, or just made the optimization designer not bother trying to optimize the unconfined "could do anything" case.
E.g. the fast, uncontended, unbiased monitor path devolves to a fast-path CAS on the object header in most JVMs (displaced headers, thin locking, Bacon bits, whatever...). But this common optimization often strongly assumes balanced use of monitors, as enforced by the verifier when the monitorenter and monitorexit bytecodes are used. E.g. HotSpot uses displaced headers for this operation, and stores a displaced mark word on the thread stack, knowing (based on the verified bytecode qualities) that the stack frame will not be rewound before a monitorexit would occur. Since an unsafe monitor enter call may not have a matching monitor exit in the same frame, that optimization would be invalid to perform.
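
To make "balanced use" concrete from the Java side, here is a small sketch (the class and method names are made up for illustration). For a synchronized block, javac emits a monitorenter before the body and a monitorexit on every exit path, including the exceptional one, so the monitor is always released before the frame is popped. The Unsafe monitor methods are not called here because they are not available on recent JDKs.

```java
// Hypothetical demo of structured locking with the synchronized keyword.
final class StructuredLockingDemo {
  static final Object LOCK = new Object();

  // Leaves the synchronized block abruptly; the compiler-generated
  // exception handler still executes monitorexit in this frame.
  static void throwWhileHoldingLock() {
    synchronized (LOCK) {
      if (!Thread.holdsLock(LOCK)) {
        throw new AssertionError("monitor should be held inside the block");
      }
      throw new IllegalStateException("leaving the frame abruptly");
    }
  }

  // Returns true if the monitor was released once the frame unwound.
  static boolean lockReleasedAfterException() {
    try {
      throwWhileHoldingLock();
    } catch (IllegalStateException expected) {
      // fall through: we only care about the lock state afterwards
    }
    return !Thread.holdsLock(LOCK);
  }
}
```

An Unsafe-style enter in one method with the exit in another has no such guarantee, which is the case described above where a lock record stored in the stack frame could outlive its frame.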
— Gil.

On Dec 27, 2014, at 12:31 PM, Ben Manes <ben_manes at yahoo.com> wrote:
Can someone explain why using Unsafe's monitor methods is substantially worse than synchronized? I had expected them to emit equivalent monitorEnter/monitorExit instructions and have similar performance.
My use case is to support a bulk version of CHM#computeIfAbsent, where a single mapping function returns the result for computing multiple entries. I had hoped to bulk lock, insert the unfilled entries, compute, populate, and bulk unlock. An overlapping write would be blocked due to requiring an entry's lock for mutation. I had thought that using Unsafe would allow for achieving this without the memory overhead of a ReentrantLock/AQS per entry, since the synchronized keyword is not flexible enough to provide this structure.
Benchmark                                                    Mode  Samples         Score         Error  Unitsc.g.b.c.SynchronizedBenchmark.monitor_contention            thrpt       10   3694951.630 ±   34340.707  ops/sc.g.b.c.SynchronizedBenchmark.monitor_noContention          thrpt       10   8274097.911 ±  164356.363  ops/sc.g.b.c.SynchronizedBenchmark.reentrantLock_contention      thrpt       10  31668532.247 ±  740850.955  ops/sc.g.b.c.SynchronizedBenchmark.reentrantLock_noContention    thrpt       10  41380163.703 ± 2270103.507  ops/sc.g.b.c.SynchronizedBenchmark.synchronized_contention       thrpt       10  22905995.761 ±  117868.968  ops/sc.g.b.c.SynchronizedBenchmark.synchronized_noContention     thrpt       10  44891601.915 ± 1458775.665  ops/s
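
The bulk lock, compute, populate, bulk unlock flow described above might look roughly like the following. This is a sketch under stated assumptions, not the implementation discussed in this thread: it uses a side table of ReentrantLock instances rather than the map's own entry locks, acquires them in sorted key order to avoid deadlock between overlapping bulk calls, and all names (BulkComputeSketch, computeAllIfAbsent, bulkLoader) are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Function;

// Hypothetical sketch of a bulk computeIfAbsent with one lock per key.
final class BulkComputeSketch<K extends Comparable<K>, V> {
  private final ConcurrentHashMap<K, V> data = new ConcurrentHashMap<>();
  // Assumption: locks are never removed, so this table grows with the key set.
  private final ConcurrentHashMap<K, ReentrantLock> locks = new ConcurrentHashMap<>();

  Map<K, V> computeAllIfAbsent(Set<K> keys, Function<Set<K>, Map<K, V>> bulkLoader) {
    List<K> ordered = new ArrayList<>(new TreeSet<>(keys)); // fixed lock order
    List<ReentrantLock> held = new ArrayList<>();
    try {
      // 1. Bulk lock.
      for (K key : ordered) {
        ReentrantLock lock = locks.computeIfAbsent(key, k -> new ReentrantLock());
        lock.lock();
        held.add(lock);
      }
      // 2. Find the unfilled entries and compute them with a single call.
      Set<K> missing = new LinkedHashSet<>();
      for (K key : ordered) {
        if (!data.containsKey(key)) {
          missing.add(key);
        }
      }
      if (!missing.isEmpty()) {
        Map<K, V> loaded = bulkLoader.apply(missing);
        // 3. Populate; keys the loader did not return stay absent.
        for (K key : missing) {
          V value = loaded.get(key);
          if (value != null) {
            data.put(key, value);
          }
        }
      }
      Map<K, V> result = new LinkedHashMap<>();
      for (K key : ordered) {
        V value = data.get(key);
        if (value != null) {
          result.put(key, value);
        }
      }
      return result;
    } finally {
      // 4. Bulk unlock, in reverse acquisition order.
      for (int i = held.size() - 1; i >= 0; i--) {
        held.get(i).unlock();
      }
    }
  }
}
```

Note that this only blocks overlapping writers that go through the same per-key locks; a plain put on the underlying map would bypass them.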

Concurrency-interest mailing list
Concurrency-interest at cs.oswego.edu
