[concurrency-interest] Double Checked Locking in OpenJDK

Boehm, Hans hans.boehm at hp.com
Fri Aug 17 16:31:02 EDT 2012

I agree with the conclusion about manual cache management, but not the Itanium analogy.  We really don't want to discuss Itanium, especially not at this time, on this mailing list.


From: Vitaly Davidovich [mailto:vitalyd at gmail.com]
Sent: Friday, August 17, 2012 1:01 PM
To: Ruslan Cheremin
Cc: Ulrich Grepel; Boehm, Hans; concurrency-interest at cs.oswego.edu
Subject: Re: [concurrency-interest] Double Checked Locking in OpenJDK

I really don't see this happening (i.e. h/w manufacturers releasing incoherent/manually coherent memory subsystems).  Putting aside the technical difficulty imposed on developing new software for such a thing, it's impractical to think that existing software will be rewritten to work on it.  Intel Itanium is a good example - nice ideas there for getting great performance, but it put the burden on compiler writers, that proved too difficult in practice, and the platform has been abandoned.

Sent from my phone
On Aug 17, 2012 2:08 PM, "Ruslan Cheremin" <cheremin at gmail.com<mailto:cheremin at gmail.com>> wrote:
Yes, Ulrich, I had grid-like systems in mind when talking about the
prospect of weakening hardware coherence.

But in any case, one does not need to look so far. As I've already
written, we already have some kind of
weak-consistent, not-automatically-coherent memory in today's Intel CPUs
-- in the form of registers and store buffers. This is a small layer atop
coherent memory, but this layer is, as far as I know, critical for
overall performance, since it is important in hiding (well, sometimes
hiding) still-noticeable memory latency. Not only main memory latency
(or, say, L3/L2 cache latency), but also QPI latency, if the accessed
memory location is owned by another core and needs to be re-owned, for
example.

I see no reason why the evolution of QPI will be somehow different from
the evolution of memory itself. Setting aside the chance of some kind of
hardware revolution (a breakthrough which would give us cheap and
ultimately fast memory/QPI), it seems to me that we'll have the same
QPI wall as we already have a memory wall. I see no chance of QPI
being fast, wide, cheap, and scaling to hundreds of CPUs at the same
time. So we'll still need some kind of weakly consistent layer with
explicit flushing control to hide the weakness of memory (and QPI, as
part of the memory engine).

What I'm trying to say here: it seems we will always have strictly
consistent but rather slow memory (with QPI), and quick but weakly
consistent memory. The border between them could move -- nowadays
servers and desktops have a tiny weak-consistent layer, while grids and
clusters have all their memory "weak consistent" (only explicitly
synchronized).

And if my assumptions are not too far from reality, it seems promising
(or at least interesting) to try to investigate algorithms which
can exploit inconsistency, instead of trying to fight it with
fences. I see many analogies with distributed systems here, where
"eventually consistent" design is becoming the de facto standard today
(the CAP theorem Ulrich mentioned). When I talk about race-based
algorithms, I have this kind of design in mind.

Do you know of any work in this direction? For now I see only one
promising example of exploiting the eventually consistent approach --
a sync-less cache for atomically published entities, like primitives
(except long/double, of course) or immutable objects.
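[Editor's note: a minimal sketch, not from the thread, of the kind of sync-less cache described above -- the classic single-check "benign race" idiom, as used by String.hashCode(). It is safe only under the assumptions Ruslan names: the cached value is either an atomically written primitive (not long/double) or a properly immutable object whose fields are final, so the JMM final-field guarantees make a racy publish harmless. The class and method names are illustrative.]

```java
// Single-check racy cache of an immutable value, no locks, no volatile.
// Several threads may recompute the value redundantly, but every reader
// observes either null or a fully constructed String, because String's
// fields are final (JLS 17.5 final-field semantics make the racy
// reference publish safe for immutable objects).
public final class RacyCache {
    private String cached;          // plain field: deliberately unsynchronized

    public String get() {
        String result = cached;     // racy read; may see a stale null
        if (result == null) {
            result = compute();     // possibly computed by several threads
            cached = result;        // racy publish; benign for immutables
        }
        return result;
    }

    private String compute() {      // stand-in for an expensive computation
        return "expensive-result";
    }
}
```

The trade-off is exactly the "eventually consistent" one: a thread may miss another thread's cached value and recompute, but correctness never depends on seeing the latest write, so no fences are needed.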

2012/8/17 Ulrich Grepel <uli at grepel.de<mailto:uli at grepel.de>>:
> On 17.08.2012 02:01, concurrency-interest-request at cs.oswego.edu<mailto:concurrency-interest-request at cs.oswego.edu> wrote:
>> Date: Thu, 16 Aug 2012 20:00:57 -0400
>> From: Vitaly Davidovich <vitalyd at gmail.com<mailto:vitalyd at gmail.com>>
>> NUMA and faster interconnects between the nodes (e.g Intel's QPI) seem to
>> be hw manufacturers' strategy at the moment.  I have a hard time imagining
>> that someone like Intel will ditch cache coherence to support hugely
>> parallel machines - seems like they'll need to continue finding
>> new/improved ways to scale within it.  Will definitely be interesting to
>> see ...
> Large scale parallel systems can be found in the supercomputing field. The
> single most important hardware issue there is interconnect speed and the
> single most important software issue is to try to minimize communication
> between all those threads.
> If you've got hundreds of thousands or even millions of cores, all with
> local RAM, there's just no way to quickly synchronize caches or even RAM, so
> you won't have a synchronized memory model there, at least not across all
> cores. So with massive parallelism, the problem will remain: syncing is
> expensive, so avoid it.
> Transferring this into the Java world of the future we either hit a hardware
> wall - more cores aren't providing any additional performance - or a
> software wall - we need some extended NUMA concepts for Java threads. And
> this in a language which is somewhat still supposed to fulfill the "write
> once, run everywhere" paradigm.
> What I could imagine for example is something akin to Thread Local Storage,
> but on a Cache Coherency Group ("CCG") level, something like "get me the
> CCG's instance of the cache". And some guarantee that, if desired, a thread
> remains in the CCG of the thread that started the second thread.
> Syncing the various CCGs is a challenge however. If you're updating one of
> the CCG caches and want to propagate this to the other CCGs, you might run
> into the same kind of problems that distributed databases run into. See the
> CAP theorem - you can have two out of the following three: Consistency,
> Availability and Partition Tolerance.
> Uli
> _______________________________________________
> Concurrency-interest mailing list
> Concurrency-interest at cs.oswego.edu<mailto:Concurrency-interest at cs.oswego.edu>
> http://cs.oswego.edu/mailman/listinfo/concurrency-interest
