[concurrency-interest] How bad can volatile long++ be?

Boehm, Hans hans.boehm at hp.com
Mon Dec 10 16:24:47 EST 2007



> -----Original Message-----
> From: Thomas.Hawtin at Sun.COM [mailto:Thomas.Hawtin at Sun.COM]
> Sent: Monday, December 10, 2007 1:07 PM
> To: Boehm, Hans
> Cc: Chris Purcell; Sam Berlin; Concurrency-interest at cs.oswego.edu
> Subject: Re: [concurrency-interest] How bad can volatile long++ be?
>
> Boehm, Hans wrote:
> > In Java, ++ on volatiles is not atomic.  I strongly
> > recommend against maintaining even statistical counters that
> > way.  Use either java.util.concurrent.atomic.AtomicLong or
> > per-thread counters.
> >
> > The following non-Java anecdote from several years ago
> > convinced me of this:
>
> So, before java.util.concurrent?
>
> > Our conservative garbage collector used to optionally
> > maintain approximate counts of live objects this way, by
> > having each GC thread increment a shared counter as it found
> > a live object.  There are as many GC threads as processors.
> > After observing some misbehavior on a dual processor (2
> > sockets, 1 core/socket) machine, and tracking down what was
> > going on, I discovered that:
> >
> > 1) Maintaining the shared counter slowed down the collector
> > by about a factor of two, negating the benefit of running on
> > two processors.
> >
> > 2) The final count was consistently off by about a factor of two.
> >
> > The cause for (1) was no doubt contention on the cache line
> > containing the counter.  As a result of that, both threads
> > spent a lot of their time waiting on that cache line, and
> > presumably the timing worked out such that there was a high
> > probability the line got stolen between the load and the
> > store of the ++, resulting in the loss of about half the updates.
>
> I don't think AtomicLong is going to help there, as it is
> still sharing a volatile. So thread-local (ThreadLocal)
> counters are presumably the way to go, possibly using an
> unshared AtomicLong.

AtomicLong will definitely help with (2).  It might also help with (1), especially on something like x86, since the cache line can only be stolen between increments, not in the middle of one, so the number of coherence misses might be reduced.  You are adding some needless memory-ordering overhead in the process, but I think that overhead is often cheap on recent processors.  I haven't done the measurements, though.
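
To make the difference concrete, here is a rough sketch of the two counters side by side (an untested illustration, class and field names are my own, not anyone's actual code).  The volatile counter can lose updates because ++ is a separate load and store, while incrementAndGet() is a single atomic read-modify-write:

    import java.util.concurrent.atomic.AtomicLong;

    // Illustrative sketch: a couple of threads increment both counters,
    // then we compare the final values.  The volatile counter typically
    // comes up short under contention; the AtomicLong never does.
    public class CounterRace {
        static volatile long volatileCount = 0;                  // ++ is load, add, store
        static final AtomicLong atomicCount = new AtomicLong();  // atomic read-modify-write

        public static void main(String[] args) throws InterruptedException {
            final int threads = 2;
            final int perThread = 1000000;
            Thread[] workers = new Thread[threads];
            for (int i = 0; i < threads; i++) {
                workers[i] = new Thread(new Runnable() {
                    public void run() {
                        for (int j = 0; j < perThread; j++) {
                            volatileCount++;               // another thread can slip in between the load and the store
                            atomicCount.incrementAndGet(); // single hardware atomic (e.g. a locked instruction on x86)
                        }
                    }
                });
                workers[i].start();
            }
            for (int i = 0; i < threads; i++) {
                workers[i].join();
            }
            System.out.println("volatile: " + volatileCount + " (often less than " + (threads * perThread) + ")");
            System.out.println("atomic:   " + atomicCount.get() + " (always " + (threads * perThread) + ")");
        }
    }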

I tend to worry more about (2) than (1), especially if I don't expect the counting operation to be executed that frequently.  A statistical counter that lies systematically in what are probably the most interesting cases is not very useful.  So long as I get the right answer, I can at least tell that the counter got unexpectedly hot and is probably causing a problem.

Other than that, I agree with your conclusions.

I suspect that before java.util.concurrent, you really wanted thread-local counters.  If performance is really not much of an issue, synchronizing the updates seems like a distant second choice.
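
For what it's worth, the per-thread variant I have in mind looks roughly like this (again an untested sketch with my own names): each thread increments its own unshared AtomicLong, so the hot path never fights over a shared cache line, and a reader sums the cells when it wants a total.

    import java.util.concurrent.ConcurrentLinkedQueue;
    import java.util.concurrent.atomic.AtomicLong;

    // Illustrative sketch: one AtomicLong cell per thread, registered so a
    // reader can sum them.  The increment path touches only the calling
    // thread's cell, so there is no cache-line contention on the hot path.
    public class PerThreadCounter {
        private final ConcurrentLinkedQueue<AtomicLong> cells =
            new ConcurrentLinkedQueue<AtomicLong>();

        private final ThreadLocal<AtomicLong> local = new ThreadLocal<AtomicLong>() {
            protected AtomicLong initialValue() {
                AtomicLong cell = new AtomicLong();
                cells.add(cell);    // register this thread's cell for later summing
                return cell;
            }
        };

        public void increment() {
            local.get().incrementAndGet();   // uncontended: only this thread writes this cell
        }

        // Approximate total: other threads may still be incrementing while we
        // sum, but no update is ever lost.
        public long sum() {
            long total = 0;
            for (AtomicLong cell : cells) {
                total += cell.get();
            }
            return total;
        }
    }

Cells are never removed, so counts from threads that have exited are still included, which is what you want for a cumulative statistic.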

Hans

>
> Tom Hawtin
>


