[concurrency-interest] Double Checked Locking in OpenJDK

Ruslan Cheremin cheremin at gmail.com
Fri Aug 17 14:08:31 EDT 2012

Yes, Ulrich, I had grid-like systems in mind when talking about the
prospect of weakening hardware coherence.

But one does not need to look so far anyway. As I've already written,
we already have a kind of
weakly-consistent, not-automatically-coherent memory in today's Intel CPUs
-- in the form of registers and store buffers. This is a small layer atop
coherent memory, but this layer is, as far as I know, critical for
overall performance, since it is important in hiding (well, sometimes
hiding) the still-noticeable memory latency. Not only main memory (or,
say, L3/L2 cache) latency, but also QPI latency, if the accessed memory
location is owned by another core and needs to be re-owned, for example.
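The store-buffer layer mentioned above is exactly what the classic
"store buffering" litmus test exposes. Here is a minimal sketch (class
and variable names are mine, purely illustrative): with plain,
non-volatile fields, each thread's store can sit in its core's store
buffer while the following load reads the other variable's stale value,
so both r1 and r2 may end up 0 -- an outcome permitted both by x86
hardware and by the Java Memory Model.

```java
// Store-buffering litmus test sketch. Without volatile/fences, the
// outcome r1 == 0 && r2 == 0 is allowed: each store may still be in a
// store buffer when the other thread performs its load.
class StoreBufferingLitmus {
    static int x = 0, y = 0;   // plain (non-volatile) shared fields
    static int r1, r2;         // results observed by each thread

    public static void main(String[] args) throws InterruptedException {
        Thread t1 = new Thread(() -> { x = 1; r1 = y; });
        Thread t2 = new Thread(() -> { y = 1; r2 = x; });
        t1.start(); t2.start();
        t1.join();  t2.join();
        // Possible results: (0,1), (1,0), (1,1) -- and, racily, (0,0).
        System.out.println("r1=" + r1 + " r2=" + r2);
    }
}
```

Making x and y volatile (or inserting full fences) forbids the (0,0)
outcome, at the cost of the very flush-to-coherent-memory traffic
discussed above.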

I see no reason why the evolution of QPI will be any different from the
evolution of memory itself. Leaving aside the chance of some kind of
hardware revolution (a breakthrough which would give us cheap and
ultimately fast memory/QPI), it seems to me we'll have the same
QPI wall as we already have a memory wall. I see no chance of QPI
being fast, wide, cheap, and scaling to hundreds of CPUs at the same time.
So we'll still need some kind of weakly consistent layer with explicit
flushing control to hide the weakness of memory (and of QPI, as part of
the memory engine).

What I'm trying to say here: it seems we will always have strictly
consistent but rather slow memory (with QPI), and quick but weakly
consistent memory. The border between them could move -- today's servers
and desktops have a tiny weakly-consistent layer, while grids and clusters
have all their memory "weakly consistent" (only explicitly synchronized).

And if my assumptions are not too far from reality, it seems promising
(or at least interesting) to investigate algorithms which
can exploit inconsistency, instead of trying to fight it with
fences. I see many analogies with distributed systems here, where
"eventually consistent" design is becoming the de-facto standard today (the
CAP theorem Ulrich mentioned). When talking about race-based algorithms, I
have this kind of design in mind.

Do you know of any work in this direction? For now I see only one
promising example of exploiting the eventually consistent approach -- the
sync-less cache for atomically published entities, like primitives
(except long/double, of course) or immutable objects.
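Such a sync-less cache can be sketched like this (a minimal illustration
with names of my own choosing, not a definitive implementation). It relies
on a benign race: the cached reference is written and read without any
synchronization, so a thread may miss another thread's cached value and
recompute it, but every reader still observes a correct result, because
reference writes are atomic and the cached object is immutable.
String.hashCode in the JDK uses essentially the same idiom with an int.

```java
// Sync-less, racily published cache of an immutable value.
// No volatile, no locks: threads may redundantly recompute, but can
// never observe a torn or inconsistent value, since the Integer is
// immutable and the reference store is atomic.
class RacyCache {
    private final int input;   // the (immutable) input being memoized
    private Integer cached;    // plain field: published via a data race

    RacyCache(int input) {
        this.input = input;
    }

    int value() {
        Integer v = cached;          // single read into a local variable
        if (v == null) {             // cache miss (possibly a stale miss)
            v = compute();           // may run in several threads at once
            cached = v;              // atomic reference store: racy publish
        }
        return v;
    }

    private Integer compute() {
        return input * input;        // stands in for an expensive computation
    }
}
```

Usage: `new RacyCache(7).value()` returns 49; repeated calls return the
same value whether or not the caching write happened to be visible. Note
this works only because Integer is immutable -- with a long or double
primitive field the racy write could tear, which is exactly the
long/double exception above.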

2012/8/17 Ulrich Grepel <uli at grepel.de>:
> On 17.08.2012 02:01, concurrency-interest-request at cs.oswego.edu wrote:
>> Date: Thu, 16 Aug 2012 20:00:57 -0400
>> From: Vitaly Davidovich <vitalyd at gmail.com>
>> NUMA and faster interconnects between the nodes (e.g Intel's QPI) seem to
>> be hw manufacturers' strategy at the moment.  I have a hard time imagining
>> that someone like Intel will ditch cache coherence to support hugely
>> parallel machines - seems like they'll need to continue finding
>> new/improved ways to scale within it.  Will definitely be interesting to
>> see ...
> Large scale parallel systems can be found in the supercomputing field. The
> single most important hardware issue there is interconnect speed and the
> single most important software issue is to try to minimize communication
> between all those threads.
> If you've got hundreds of thousands or even millions of cores, all with
> local RAM, there's just no way to quickly synchronize caches or even RAM, so
> you won't have a synchronized memory model there, at least not across all
> cores. So with massive parallelism, the problem will remain: syncing is
> expensive, so avoid it.
> Transferring this into the Java world of the future we either hit a hardware
> wall - more cores aren't providing any additional performance - or a
> software wall - we need some extended NUMA concepts for Java threads. And
> this in a language which is somewhat still supposed to fulfill the "write
> once, run everywhere" paradigm.
> What I could imagine for example is something akin to Thread Local Storage,
> but on a Cache Coherency Group ("CCG") level, something like "get me the
> CCG's instance of the cache". And some guarantee that, if desired, a thread
> remains in the CCG of the thread that started the second thread.
> Syncing the various CCGs is a challenge however. If you're updating one of
> the CCG caches and want to propagate this to the other CCGs, you might run
> into the same kind of problems that distributed databases run into. See the
> CAP theorem - you can have two out of the following three: Consistency,
> Availability and Partition Tolerance.
> Uli
> _______________________________________________
> Concurrency-interest mailing list
> Concurrency-interest at cs.oswego.edu
> http://cs.oswego.edu/mailman/listinfo/concurrency-interest
