[concurrency-interest] when is safe publication safe?

Boehm, Hans hans.boehm at hp.com
Tue Apr 27 13:45:04 EDT 2010


Has someone actually measured the cost of volatile across platforms?
Since most of us are presumably on X86 platforms, I wonder whether
there are also implementation problems underlying some of these
perceptions.  On X86, a volatile load should differ from a normal
load only in that the compiler can no longer do some reordering,
and it will normally prevent reuse of previously loaded values.
I'd expect this to be in the barely measurable category for most
applications.

A volatile store on X86 does require the use of either xchg or mfence,
both of which are slow (dozens of cycles?  more on P4).  But if you're
frequently writing to a shared volatile, you will end up taking lots of
cache coherence misses no matter what.  Thus I suspect it's fairly rare
for the fence overhead to really dominate everything else.  Doug has pointed
out some examples for which it's a major issue, so I don't want to
downplay this problem.  But it's clearly not an issue for read-mostly
applications like the one under discussion here.
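
For the read-mostly publication case being discussed, the usual shape is a
one-time fenced store followed by plain-cost volatile reads thereafter.  A
minimal sketch, with hypothetical MetaClass/registry names of my own
invention (this is not Groovy's actual runtime):

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the meta class under discussion: immutable
// once constructed, so it is safe to publish through the map.
final class MetaClass {
    final String name;
    MetaClass(String name) { this.name = name; }
}

final class MetaClassRegistry {
    private final ConcurrentHashMap<Class<?>, MetaClass> cache =
            new ConcurrentHashMap<Class<?>, MetaClass>();

    MetaClass getMetaClassOf(Class<?> c) {
        MetaClass mc = cache.get(c);          // read-mostly fast path
        if (mc == null) {
            mc = new MetaClass(c.getName());  // racy but idempotent init
            MetaClass prior = cache.putIfAbsent(c, mc);
            if (prior != null) mc = prior;    // another thread won the race
        }
        return mc;
    }
}
```

ConcurrentHashMap.get is itself a volatile read on the fast path, which is
exactly the read-side cost at issue in this thread.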

Again the above only applies to X86, and probably SPARC TSO, which I believe
is very similar.  Other architectures do have more overhead on the read side.
(For Itanium, it should still only be on the order of half a dozen cycles.)
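
One fence-free alternative on the read side, which comes up later in this
thread, is dependency-based ordering via final fields.  A hedged sketch with
names of my own: if a reader sees the Holder reference at all, the Java
memory model guarantees it also sees the final field's value, with no
read-side fence needed on most architectures (the ordering rides on the data
dependency).

```java
// Illustration (mine, not from the post): publication via final fields.
final class Holder {
    final int value;
    Holder(int value) { this.value = value; }   // 'this' must not escape here
}

class Publisher {
    static Holder holder;          // deliberately non-volatile

    static void publish() { holder = new Holder(42); }

    static int readOrDefault() {
        Holder h = holder;         // may be null if not yet published
        return (h == null) ? -1 : h.value;  // if non-null, value is complete
    }
}
```

The trade-off is weaker progress: without volatile, a reader may see null for
an arbitrarily long time, but it can never see a partially constructed
Holder.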

Hans

> -----Original Message-----
> From: David Holmes [mailto:davidcholmes at aapt.net.au] 
> Sent: Monday, April 26, 2010 11:17 PM
> To: Boehm, Hans
> Cc: concurrency-interest
> Subject: RE: [concurrency-interest] when is safe publication safe?
> 
> Hans,
> 
> Not speaking for the OP but I assume he is looking for the 
> most efficient way to express this in Java regardless of 
> platform. Right now all there is is volatile. volatile comes 
> at a cost that depends on the platform, but there still seems 
> to be a common impression that volatile is expensive - 
> period. The question is whether there is something weaker 
> than volatile (aka Fences and other related discussions) that 
> will still give the desired semantics while improving 
> performance in at least some cases.
> 
> But it sounds to me as though the semantics will actually require 
> volatile's guarantees.
> 
> Cheers,
> David
> 
> > -----Original Message-----
> > From: concurrency-interest-bounces at cs.oswego.edu
> > [mailto:concurrency-interest-bounces at cs.oswego.edu] On Behalf Of
> > Hans Boehm
> > Sent: Tuesday, 27 April 2010 3:05 PM
> > To: dholmes at ieee.org
> > Cc: Joe Bowbeer; concurrency-interest
> > Subject: Re: [concurrency-interest] when is safe publication safe?
> >
> >
> > I think it's hard to talk about the issues here without being
> > somewhat architecture-specific.
> >
> > On X86 or SPARC TSO or similar, no memory fence/barrier
> > instruction is needed on the read side, even if volatiles are
> > used.  Ever.  So I presume we're concerned about other
> > architectures that do need them.
> >
> > On those architectures, we might be able to reduce the cost of the
> > fence if we had something like acquire/release ordering.  We could
> > theoretically eliminate it if the accesses in question are
> > dependent, and the architecture enforces ordering based on
> > dependencies.  I think that's usually how final field
> > implementations work.  We currently don't really have either of
> > those mechanisms.
> >
> > I'm not sure why versioned references matter here.  It sounded to
> > me like just switching an ordinary volatile reference to a newly
> > allocated "meta class" would do the trick.  The issue is with the
> > cost.
> >
> > Hans
> >
> > On Tue, 27 Apr 2010, David Holmes wrote:
> >
> > > The problem with using any existing j.u.c class, or even
> > > thread-safe j.u class, is that the key issue here seems to be the
> > > desire to avoid the memory-barriers on the read side when they are
> > > not needed. This is to me a somewhat utopian goal - if we could
> > > tell when barriers were and were not needed we would only inject
> > > them when necessary (and of course the cost of determining this
> > > mustn't outweigh the cost of unconditionally using the
> > > memory-barrier in the first place). But all our existing Java
> > > library classes are correctly synchronized and have at a minimum
> > > volatile semantics on reads, which implies the need for
> > > memory-barriers in the general case.
> > >  
> > > If a thread must see an update to the meta-class object then any
> > > read of a version would have to be exact - which implies it has
> > > to be correctly synchronized with respect to the updating thread,
> > > which implies a minimum of volatile semantics on the read.
> > >  
> > > This seems to be a similar problem to the scalable-counter
> > > discussion. If the count is exact you need correct
> > > synchronization; without correct synchronization the count can't
> > > be exact. You have to pick which is more important.
> > >  
> > > David Holmes
> > > -----Original Message-----
> > > From: concurrency-interest-bounces at cs.oswego.edu
> > > [mailto:concurrency-interest-bounces at cs.oswego.edu] On Behalf
> > > Of Joe Bowbeer
> > > Sent: Tuesday, 27 April 2010 11:20 AM
> > > To: concurrency-interest
> > > Subject: Re: [concurrency-interest] when is safe publication safe?
> > >
> > >       I'm thinking of a versioned reference.  Adding a new method
> > >       to an object would rev the version.  A thread could
> > >       optimistically use a cached instance as long as the revision
> > >       matched.  AtomicStampedReference can be used to associate a
> > >       version with a reference?
> > >
> > > I'm just throwing out ideas in an effort to understand the
> > > problem better...  Hope you don't mind.
> > >
> > >
> > > On Mon, Apr 26, 2010 at 2:02 PM, Boehm, Hans wrote:
> > >       To clarify, hopefully:
> > >
> > >       The core problem here is presumably that
> > >       getMetaClassOf() reads a field, call it mc, containing
> > >       a reference to the meta class information, and
> > >       initialization of that field may race with the access.
> > >        The classic solution would be to declare mc
> > >       "volatile", which avoids the data race and (in the
> > >       absence of other data races) restores sequential
> > >       consistency, so you don't have to worry about
> > >       visibility issues.  (If you do want to worry about
> > >       happens-before ordering, it establishes the right
> > >       happens-before ordering.)
> > >
> > >       The reason you may not be happy with this is that it's
> > >       slow on some architectures.  However the performance
> > >       hit on X86 should be negligible, since an ordinary load
> > >       is sufficient to implement the volatile load.  The
> > >       store will be more expensive, but that's rare.  Thus
> > >       you are presumably concerned about non-X86
> > >       architectures?
> > >
> > >       It's a bit confusing to talk about memory barriers
> > >       since, as far as I know, the only Java class with
> > >       "barrier" in the name is CyclicBarrier, which is
> > >       something very different.
> > >
> > >       Hans
> > >
> > > > -----Original Message-----
> > > > From: Jochen Theodorou
> > > > Sent: Sunday, April 25, 2010 8:49 AM
> > > > To: concurrency-interest at cs.oswego.edu
> > > > Subject: Re: [concurrency-interest] when is safe publication
> > > > safe?
> > > >
> > > > Doug Lea wrote:
> > > > > On 04/25/10 05:31, Jochen Theodorou wrote:
> > > > >>> As a first step, consider exactly what effects/semantics
> > > > >>> you want here, and the ways you intend people to be able to
> > > > >>> write conditionally correct Groovy code.
> > > > >>
> > > > >> People wouldn't have to write conditionally correct Groovy
> > > > >> code. They would write normal code as they would in Java
> > > > >> (Groovy and Java are very close).
> > > > >
> > > > > It seems implausible that you could do enough analysis at
> > > > > load/run time to determine whether you need full locking in
> > > > > the presence of multithreaded racy initialization vs much
> > > > > cheaper release fences. This would require at least some
> > > > > high-quality escape analysis. And the code generated would
> > > > > differ both for the writing and reading callers.
> > > >
> > > > Maybe I didn't explain it well. Let us assume I have the Groovy
> > > > code:
> > > >
> > > > 1+1
> > > >
> > > > Then this is really something along the lines of:
> > > >
> > > > SBA.getMetaClassOf(1).invoke("plus",1)
> > > >
> > > > and SBA.getMetaClassOf(1) would return the meta class of
> > > > Integer. Since this is purely a runtime construct, it does not
> > > > exist until the first time this meta class is requested.
> > > > So getMetaClassOf would be the place to initialize the meta
> > > > class, which would register it in a global structure, and on
> > > > subsequent invocations use that cached meta class. If two
> > > > threads execute the code above, then one would do the
> > > > initialization, while the other has to wait. The waiting thread
> > > > would then read the initialized global meta class. On subsequent
> > > > invocations both threads would just read. Since changes of the
> > > > meta class are rare, we would in 99% of all cases simply read
> > > > the existing value. Since we have to be memory aware, these
> > > > meta classes can be unloaded at runtime too. They are
> > > > SoftReferenced so it is done only if really needed. But rather
> > > > than the normal change, a reinitialization might be needed much
> > > > more often.
> > > >
> > > > As you can see, the user code "1+1" contains zero
> > > > synchronization code.
> > > > The memory barriers are all in the runtime. It is not that this
> > > > cannot be solved by using what Java already has; it is that
> > > > this is too expensive.
> > > >
> > > > > As I mentioned, an alternative is to lay down some rules.
> > > > > If people stick to the rules they get consistent (in the sense
> > > > > of data-race-free) executions, else they might not. And of
> > > > > such rules, I think the ones that can apply here amount to
> > > > > saying that other threads performing initializations cannot
> > > > > trust any of their reads of the partially initialized object.
> > > > > And further, they cannot leak refs to that object outside of
> > > > > the group of initializer threads.
> > > > >
> > > > > This is not hugely different than the Swing threading rules
> > > > > (http://java.sun.com/products/jfc/tsc/articles/threads/threads1.html)
> > > > > but applies only during initialization.
> > > >
> > > > But unlike what the above may suggest, there is no single
> > > > initialization phase. The meta classes are created on demand.
> > > > We cannot know beforehand which meta classes are needed, and
> > > > creating them all before starting would increase the startup
> > > > time considerably.
> > > >
> > > > If there were a way to recognize a partially initialized object
> > > > I could maybe think of something... but is there a reliable one?
> > > >
> > > > bye blackdrag
> > > >
> > > > --
> > > > Jochen "blackdrag" Theodorou
> > > > The Groovy Project Tech Lead (http://groovy.codehaus.org) 
> > > > http://blackdragsview.blogspot.com/
> > >
> > >
> > >
> 
> 


More information about the Concurrency-interest mailing list