[concurrency-interest] Double Checked Locking in OpenJDK

Gregg Wonderly gergg at cox.net
Thu Aug 16 11:22:15 EDT 2012


On Aug 15, 2012, at 11:44 PM, "Boehm, Hans" <hans.boehm at hp.com> wrote:

> The standard reference for the JMM problems is Sevcik and Aspinall, "On validity of program transformations in the Java Memory Model", ECOOP 2008.  I became convinced in a java memory model discussion a couple of years ago or so (also including Jaroslav Sevcik) that that's not the only problem, but it is a serious problem.
> 
> I'm unconvinced that removing the synchronization to prevent data races generally improves scalability in the normal sense.  Improving or removing coarser-grained synchronization clearly can.  On x86, fences in particular seem to be executed entirely locally; they add significant overhead at low core counts.  Based on limited experiments, they become increasingly insignificant as you reach other scaling limits, e.g. memory bandwidth.  This is consistent with the usual intuition that fences wait for store buffers to drain.  Even more limited experiments on POWER were, to my surprise, also largely consistent with that.  For a recent experiment I ran in support of a workshop submission, this even tends to be true for acquiring locks solely to avoid data races.  The difference between synchronized and racy performance DECREASED dramatically with the number of threads/cores.
> 
> This doesn't mean that synchronization to avoid data races is entirely free, but only that its cost doesn't increase, and often decreases, with scale.  Doug may well have run more careful experiments around this.

This just seems to be a visible result of latency being hidden by parallelization.  In network applications, this is of course the way that throughput improves.  But ultimately it does not change the per-item processing time.  Any latency in the flow of execution ultimately dictates what the throughput can be.

The number of cores is not infinite, so "cost doesn't increase" seems a bit of a stretch.  It doesn't increase until the number of cores and the latency reach an imbalance that brings the latency back into view.
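
As a rough illustration of that argument, here is a toy model, not a measurement.  The class name LatencyModel, the 8-core machine, and the 1000 ns and 1200 ns per-item costs are all assumptions made up for the example.  It treats throughput as the number of items actually running divided by the per-item latency, so adding workers raises throughput only until the cores run out:

```java
public class LatencyModel {
    /**
     * Items completed per second under a deliberately simple model:
     * at most `cores` items run at once, and each takes `latencyNanos`.
     */
    static double throughput(int workers, int cores, double latencyNanos) {
        int running = Math.min(workers, cores);
        return running * 1e9 / latencyNanos;
    }

    public static void main(String[] args) {
        int cores = 8;               // assumed machine size
        double racyNanos = 1000.0;   // assumed per-item cost without synchronization
        double syncedNanos = 1200.0; // assumed per-item cost with synchronization

        // Throughput grows with the worker count, then flattens once
        // workers exceed cores; the per-item latency never changes.
        for (int workers = 1; workers <= 32; workers *= 2) {
            System.out.printf("workers=%2d  racy=%,.0f/s  synced=%,.0f/s%n",
                    workers,
                    throughput(workers, cores, racyNanos),
                    throughput(workers, cores, syncedNanos));
        }
    }
}
```

In this model the synchronized variant can keep pace only by adding workers.  Once workers exceed cores, its ceiling is set by the longer per-item latency, which is the imbalance described above.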

Gregg Wonderly

