[concurrency-interest] Double Checked Locking in OpenJDK

Vitaly Davidovich vitalyd at gmail.com
Fri Aug 17 16:00:48 EDT 2012


I really don't see this happening (i.e. h/w manufacturers releasing
incoherent/manually-coherent memory subsystems).  Setting aside the
technical difficulty of developing new software for such a thing, it's
impractical to think that existing software will be rewritten to work on
it.  Intel Itanium is a good example - there were nice ideas there for
getting great performance, but it put the burden on compiler writers, that
proved too difficult in practice, and the platform has been abandoned.

Sent from my phone
On Aug 17, 2012 2:08 PM, "Ruslan Cheremin" <cheremin at gmail.com> wrote:

> Yes, Ulrich, I had grid-like systems in mind when talking about
> the prospect of weakening hardware coherence.
>
> But in any case, one does not need to look so far. As I've already
> written, we already have a kind of
> weak-consistent, not-automatically-coherent memory in today's Intel CPUs
> -- in the form of registers and store buffers. This is a small layer atop
> coherent memory, but this layer is, as far as I know, critical for
> overall performance, since it is important in hiding (well, sometimes
> hiding) still-noticeable memory latency. Not only main memory (or,
> say, L3/L2 cache) latency, but also QPI latency, if the accessed memory
> location is owned by another core and needs to be re-owned, for
> example.
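The store-buffer layer described above can be made visible with the classic litmus test (a sketch added here for illustration, not from the original post): without `volatile` or fences, each thread's store may still sit in its core's store buffer while the subsequent load runs, so both threads can read the stale value.

```java
// Classic store-buffer litmus test. The outcome r1 == 0 && r2 == 0 is
// permitted (and occasionally observable) even on x86, because each
// thread's store may be buffered while its load executes.
class StoreBufferLitmus {
    static int x, y;    // deliberately non-volatile: the race is the point
    static int r1, r2;

    public static void main(String[] args) {
        Thread t1 = new Thread(() -> { x = 1; r1 = y; });
        Thread t2 = new Thread(() -> { y = 1; r2 = x; });
        t1.start(); t2.start();
        try {
            t1.join(); t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        // Each value is 0 or 1; seeing (0, 0) means a store-buffer (or
        // compiler) reordering slipped through on this particular run.
        System.out.println("r1=" + r1 + " r2=" + r2);
    }
}
```

Declaring `x` and `y` as `volatile` would forbid the (0, 0) outcome, at the cost of the very fences being discussed.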
>
> I see no reason why the evolution of QPI will be somehow different from
> the evolution of memory itself. Setting aside the chance of some kind of
> hardware revolution (a breakthrough which would give us cheap and
> ultimately fast memory/QPI), it seems to me we'll hit the same
> QPI wall as we've already hit the memory wall. I see no chance of QPI
> being fast, wide, cheap, and scaling to hundreds of CPUs at the same
> time. So we'll still need some kind of weakly consistent layer with
> explicit flushing control to hide the weakness of memory (and of QPI, as
> part of the memory engine).
>
> What I'm trying to say here: it seems we will always have strictly
> consistent but rather slow memory (with QPI), and quick but weakly
> consistent memory. The border between them can move -- nowadays servers
> and desktops have a tiny weak-consistent layer, while grids and clusters
> have all their memory "weakly consistent" (only explicitly synchronized).
>
> And if my assumptions are not too far from reality, it seems promising
> (or at least interesting) to investigate algorithms which
> can exploit inconsistency, instead of trying to fight it with
> fences. I see many analogies with distributed systems here, where
> "eventually consistent" design is becoming the de-facto standard today
> (the CAP theorem Ulrich mentioned). Talking about race-based algorithms,
> I have this kind of design in mind.
>
> Do you know of any work in this direction? For now I see only one
> promising example of exploiting the eventually consistent approach -- a
> sync-less cache for atomically published entities, like primitives
> (except long/double, of course) or immutable objects.
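The sync-less cache mentioned here can be sketched along the lines of java.lang.String's hash caching (a minimal illustration; `RacyCache` is a made-up name): a deliberately non-volatile field holds a value that is cheap to recompute, so a data race at worst causes redundant work, never a wrong or torn result.

```java
// Sketch of a "sync-less cache": a racy, fence-free cache of a derived
// value, in the style of java.lang.String.hashCode(). Several threads may
// compute the value redundantly, but since an int is written atomically,
// every reader sees either 0 (uncached) or a fully valid result.
final class RacyCache {
    private final String payload;
    private int hash;                     // deliberately non-volatile

    RacyCache(String payload) { this.payload = payload; }

    int slowHash() {
        int h = hash;                     // plain load, no fence
        if (h == 0) {                     // may race: recompute is harmless
            for (char c : payload.toCharArray()) h = 31 * h + c;
            hash = h;                     // atomic int store publishes it
        }
        return h;                         // (a computed hash of 0 is simply
                                          //  recomputed each call, as in String)
    }
}
```

The same trick works for atomically published references to immutable objects; it does not work for non-atomic long/double fields, as noted above.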
>
> 2012/8/17 Ulrich Grepel <uli at grepel.de>:
> > On 17.08.2012 02:01, concurrency-interest-request at cs.oswego.edu wrote:
> >>
> >> Date: Thu, 16 Aug 2012 20:00:57 -0400
> >> From: Vitaly Davidovich <vitalyd at gmail.com>
> >>
> >>
> >> NUMA and faster interconnects between the nodes (e.g. Intel's QPI)
> >> seem to be hw manufacturers' strategy at the moment.  I have a hard
> >> time imagining that someone like Intel will ditch cache coherence to
> >> support hugely parallel machines - seems like they'll need to continue
> >> finding new/improved ways to scale within it.  Will definitely be
> >> interesting to see ...
> >>
> > Large scale parallel systems can be found in the supercomputing field.
> > The single most important hardware issue there is interconnect speed,
> > and the single most important software issue is to try to minimize
> > communication between all those threads.
> >
> > If you've got hundreds of thousands or even millions of cores, all with
> > local RAM, there's just no way to quickly synchronize caches or even
> > RAM, so you won't have a synchronized memory model there, at least not
> > across all cores. So with massive parallelism, the problem will remain:
> > syncing is expensive, so avoid it.
> >
> > Transferring this into the Java world of the future, we either hit a
> > hardware wall - more cores aren't providing any additional performance -
> > or a software wall - we need some extended NUMA concepts for Java
> > threads. And this in a language which is still somewhat supposed to
> > fulfill the "write once, run everywhere" paradigm.
> >
> > What I could imagine, for example, is something akin to Thread Local
> > Storage, but on a Cache Coherency Group ("CCG") level, something like
> > "get me the CCG's instance of the cache". And some guarantee that, if
> > desired, a newly started thread remains in the CCG of the thread that
> > started it.
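The CCG-local idea above might look something like the following hypothetical sketch. `CoherencyGroupLocal` does not exist in any JDK; the plain int group id stands in for the VM/OS support that would really be needed to map a thread to its coherency domain.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical API sketch: one instance per cache-coherency group, by
// analogy with ThreadLocal's one-instance-per-thread. The groupId
// parameter is a stand-in; a real implementation would derive it from
// the calling thread's placement.
class CoherencyGroupLocal<T> {
    private final ConcurrentHashMap<Integer, T> perGroup = new ConcurrentHashMap<>();
    private final Supplier<T> initial;

    CoherencyGroupLocal(Supplier<T> initial) { this.initial = initial; }

    // In a real API this would be get(), keyed on the current thread's CCG.
    T get(int groupId) {
        return perGroup.computeIfAbsent(groupId, id -> initial.get());
    }
}
```

Threads within one group would share an instance through their coherent caches; only cross-group propagation would need the explicit (CAP-constrained) syncing discussed below.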
> >
> > Syncing the various CCGs is a challenge, however. If you're updating
> > one of the CCG caches and want to propagate this to the other CCGs, you
> > might run into the same kind of problems that distributed databases run
> > into. See the CAP theorem - you can have two out of the following
> > three: Consistency, Availability and Partition Tolerance.
> >
> > Uli
> >
> > _______________________________________________
> > Concurrency-interest mailing list
> > Concurrency-interest at cs.oswego.edu
> > http://cs.oswego.edu/mailman/listinfo/concurrency-interest
>