[concurrency-interest] synchronized vs Unsafe#monitorEnter/monitorExit
ben_manes at yahoo.com
Sat Dec 27 21:28:20 EST 2014
Thanks, that makes a lot of sense.
Not surprising, but still interesting: when both kinds of monitor usage are mixed in one benchmark, only the Unsafe paths have slow lock acquisition, while the verifiable paths stay optimized. I'll probably have to see how heavy a per-entry AQS non-reentrant lock is as an alternative approach for my use case, or simply abandon the idea altogether.
Benchmark                                                    Mode  Samples         Score         Error  Units
c.g.b.c.SynchronizedBenchmark.mixed                         thrpt       10  20893653.628 ±   89375.469  ops/s
c.g.b.c.SynchronizedBenchmark.mixed:mixed_monitor           thrpt       10    454111.564 ±    7717.129  ops/s
c.g.b.c.SynchronizedBenchmark.mixed:mixed_sync              thrpt       10  20439542.064 ±   93987.110  ops/s
c.g.b.c.SynchronizedBenchmark.monitor_contention            thrpt       10   3589347.939 ±  114413.831  ops/s
c.g.b.c.SynchronizedBenchmark.monitor_noContention          thrpt       10   7789934.551 ±  424970.084  ops/s
c.g.b.c.SynchronizedBenchmark.nonReentrantLock_contention   thrpt       10  35595272.514 ±  219754.617  ops/s
c.g.b.c.SynchronizedBenchmark.nonReentrantLock_noContention thrpt       10  74245220.098 ±  768665.279  ops/s
c.g.b.c.SynchronizedBenchmark.reentrantLock_contention      thrpt       10  27296079.389 ±  787027.871  ops/s
c.g.b.c.SynchronizedBenchmark.reentrantLock_noContention    thrpt       10  41981666.507 ± 1374677.863  ops/s
c.g.b.c.SynchronizedBenchmark.synchronized_contention       thrpt       10  22512475.218 ±  301107.363  ops/s
c.g.b.c.SynchronizedBenchmark.synchronized_noContention     thrpt       10  43245916.801 ± 1664731.804  ops/s
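A per-entry AQS non-reentrant lock, as mentioned above, might look roughly like the following minimal sketch, modeled on the mutex example in the AbstractQueuedSynchronizer javadoc. The class name NonReentrantLock and its methods are assumptions for illustration; this is not the code behind the nonReentrantLock rows in the results above.

```java
import java.util.concurrent.locks.AbstractQueuedSynchronizer;

/**
 * Hypothetical non-reentrant mutex: state 0 = unlocked, 1 = locked.
 * Extending AQS directly costs one object per lock, whereas ReentrantLock
 * delegates to a separate internal AQS subclass (a second object).
 */
final class NonReentrantLock extends AbstractQueuedSynchronizer {

    @Override
    protected boolean tryAcquire(int ignored) {
        // A single CAS from 0 to 1; a thread that already holds the lock fails too (no reentrancy).
        if (compareAndSetState(0, 1)) {
            setExclusiveOwnerThread(Thread.currentThread()); // informational only
            return true;
        }
        return false;
    }

    @Override
    protected boolean tryRelease(int ignored) {
        if (getState() == 0) {
            throw new IllegalMonitorStateException("not locked");
        }
        setExclusiveOwnerThread(null);
        setState(0); // volatile write publishes the critical section to the next acquirer
        return true;
    }

    @Override
    protected boolean isHeldExclusively() {
        return getState() == 1;
    }

    /** Blocks (parks in the AQS queue) until the lock is acquired. */
    public void lock() { acquire(1); }

    /** Never blocks; returns false if any thread, including this one, holds the lock. */
    public boolean tryLock() { return tryAcquire(1); }

    /** Releases the lock and unparks the next queued thread, if any. */
    public void unlock() { release(1); }

    public boolean isLocked() { return getState() != 0; }
}
```

A cache entry class could, hypothetically, extend such a synchronizer itself so that the entry and its lock are the same object, which is the kind of per-entry memory saving the thread is after.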
On Saturday, December 27, 2014 5:10 PM, Gil Tene <gil at azulsystems.com> wrote:
It's not "synchronized" per se that is responsible for the difference. It's the use of the monitorenter and monitorexit bytecodes. Some of the optimizations done for monitors rely on their verified behavior for correctness. The Unsafe versions are not verified to adhere to the same requirements, which either makes some optimizations impossible or simply led the optimization designers not to bother optimizing the unconfined "could do anything" case.
E.g. the fast, uncontended, unbiased monitor path devolves to a fast-path CAS on the object header in most JVMs (displaced headers, thin locking, Bacon bits, whatever...). But this common optimization often strongly assumes balanced use of monitors, as enforced by the verifier when the monitorenter and monitorexit bytecodes are used. E.g. HotSpot uses displaced headers for this operation and stores a displaced mark word on the thread stack, knowing (based on the verified bytecode qualities) that the stack frame will not be rewound before a monitorexit occurs. Since an Unsafe monitor enter call may not have a matching monitor exit in the same frame, that optimization would be invalid to perform.
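The balanced-use property described above can be observed from plain Java. In this sketch (the class BalancedMonitor and its members are hypothetical), javac compiles the synchronized block into a monitorenter plus a monitorexit on every exit path, including an exception handler it generates itself, so the monitor is always released before the frame is popped. An enter in one method and an exit in some other frame, as Unsafe allows, carries no such guarantee. The sketch deliberately uses only synchronized rather than the internal sun.misc.Unsafe API.

```java
/** Hypothetical demo: a synchronized block never lets its monitor outlive the frame. */
final class BalancedMonitor {
    static final Object LOCK = new Object();

    /** Returns whether the current thread held LOCK inside the block. */
    static boolean work(boolean fail) {
        synchronized (LOCK) {                   // monitorenter
            boolean held = Thread.holdsLock(LOCK);
            if (fail) {
                throw new IllegalStateException("failure inside the critical section");
            }
            return held;
        }                                       // monitorexit on the normal path and in the
                                                // compiler-generated exception handler
    }

    public static void main(String[] args) {
        System.out.println(work(false));            // prints "true": held inside the block
        try {
            work(true);
        } catch (IllegalStateException e) {
            // By the time we get here, the javac-generated handler has already run monitorexit.
        }
        System.out.println(Thread.holdsLock(LOCK)); // prints "false": released despite the throw
    }
}
```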
On Dec 27, 2014, at 12:31 PM, Ben Manes <ben_manes at yahoo.com> wrote:
Can someone explain why Unsafe's monitor methods perform substantially worse than synchronized? I had expected them to emit the equivalent monitorenter/monitorexit instructions and have similar performance.
My use case is to support a bulk version of CHM#computeIfAbsent, where a single mapping function returns the results for multiple entries. I had hoped to bulk lock, insert the unfilled entries, compute, populate, and bulk unlock. An overlapping write would be blocked because mutating an entry requires that entry's lock. I had thought that Unsafe would let me achieve this without the memory overhead of a ReentrantLock/AQS per entry, since the synchronized keyword is not flexible enough to provide this structure.
Benchmark                                                    Mode  Samples         Score         Error  Units
c.g.b.c.SynchronizedBenchmark.monitor_contention            thrpt       10   3694951.630 ±   34340.707  ops/s
c.g.b.c.SynchronizedBenchmark.monitor_noContention          thrpt       10   8274097.911 ±  164356.363  ops/s
c.g.b.c.SynchronizedBenchmark.reentrantLock_contention      thrpt       10  31668532.247 ±  740850.955  ops/s
c.g.b.c.SynchronizedBenchmark.reentrantLock_noContention    thrpt       10  41380163.703 ± 2270103.507  ops/s
c.g.b.c.SynchronizedBenchmark.synchronized_contention       thrpt       10  22905995.761 ±  117868.968  ops/s
c.g.b.c.SynchronizedBenchmark.synchronized_noContention     thrpt       10  44891601.915 ± 1458775.665  ops/s
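The bulk computeIfAbsent plan described above (bulk lock, insert unfilled entries, compute, populate, bulk unlock) could be sketched on top of ConcurrentHashMap roughly as follows. Everything here is hypothetical: BulkCache, Entry and getAll are illustrative names, a ReentrantLock per entry stands in for whatever cheaper per-entry lock is eventually chosen, and sorting the keys into one global lock order (which requires Comparable keys) is an added assumption to keep two overlapping bulk calls from deadlocking.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Function;

/** Hypothetical bulk-loading cache; not an existing CHM API. */
final class BulkCache<K extends Comparable<K>, V> {

    /** Hypothetical per-entry holder; value stays null until a bulk load fills it. */
    static final class Entry<V> {
        final ReentrantLock lock = new ReentrantLock(); // stand-in for a cheaper per-entry lock
        volatile V value;
    }

    private final ConcurrentHashMap<K, Entry<V>> map = new ConcurrentHashMap<>();

    /** Returns values for all keys, computing the missing ones with a single loader call. */
    Map<K, V> getAll(Set<K> keys, Function<Set<K>, Map<K, V>> loader) {
        List<K> sorted = new ArrayList<>(keys);
        Collections.sort(sorted); // assumed global lock order: overlapping calls cannot deadlock
        List<Entry<V>> locked = new ArrayList<>(sorted.size());
        try {
            // 1. Bulk lock, inserting unfilled entries for absent keys.
            Set<K> missing = new HashSet<>();
            for (K key : sorted) {
                Entry<V> entry = map.computeIfAbsent(key, k -> new Entry<>());
                entry.lock.lock();
                locked.add(entry); // locked.get(i) is the entry for sorted.get(i)
                if (entry.value == null) {
                    missing.add(key);
                }
            }
            // 2. Compute all missing values with one call, then 3. populate the locked entries.
            if (!missing.isEmpty()) {
                Map<K, V> loaded = loader.apply(missing);
                for (int i = 0; i < sorted.size(); i++) {
                    if (missing.contains(sorted.get(i))) {
                        locked.get(i).value = loaded.get(sorted.get(i));
                    }
                }
            }
            Map<K, V> result = new HashMap<>();
            for (int i = 0; i < sorted.size(); i++) {
                result.put(sorted.get(i), locked.get(i).value);
            }
            return result;
        } finally {
            // 4. Bulk unlock; a writer needing any of these entries was blocked until now.
            for (Entry<V> entry : locked) {
                entry.lock.unlock();
            }
        }
    }
}
```

Only the bulk path is shown; a single-key reader that finds an entry whose value is still null would, under the same assumptions, take that entry's lock to wait for the in-flight load.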
Concurrency-interest mailing list
Concurrency-interest at cs.oswego.edu