[concurrency-interest] How bad can volatile long++ be?
hans.boehm at hp.com
Mon Dec 10 16:24:47 EST 2007
> -----Original Message-----
> From: Thomas.Hawtin at Sun.COM [mailto:Thomas.Hawtin at Sun.COM]
> Sent: Monday, December 10, 2007 1:07 PM
> To: Boehm, Hans
> Cc: Chris Purcell; Sam Berlin; Concurrency-interest at cs.oswego.edu
> Subject: Re: [concurrency-interest] How bad can volatile long++ be?
> Boehm, Hans wrote:
> > In Java, ++ on volatiles is not atomic. I strongly
> recommend against maintaining even statistical counters that
> way. Use either java.util.concurrent.atomic.AtomicLong or
> per-thread counters.
> > The following non-Java anecdote from several years ago
> convinced me of this:
> So, before java.util.concurrent?
> > Our conservative garbage collector used to optionally
> maintain approximate counts of live objects this way, by
> having each GC thread increment a shared counter as it found
> a live object. There are as many GC threads as processors.
> After observing some misbehavior on a dual processor (2
> sockets, 1 core/socket) machine, and tracking down what was
> going on, I discovered that:
> > 1) Maintaining the shared counter slowed down the collector
> by about a factor of two, negating the benefit of running on
> two processors.
> > 2) The final count was consistently off by about a factor of two.
> > The cause for (1) was no doubt contention on the cache line
> containing the counter. As a result of that, both threads
> spent a lot of their time waiting on that cache line, and
> presumably the timing worked out such that there was a high
> probability the line got stolen between the load and the
> store of the ++, resulting in the loss of about half the updates.
> I don't think AtomicLong is going to help there, as it is
> still sharing a volatile. So thread-local (ThreadLocal)
> counters are presumably the way to go, possibly using an
> unshared AtomicLong.
AtomicLong will definitely help with (2). It might help with (1), especially on something like X86, since the cache line can only be stolen between ++ operations, not in the middle. Thus the number of coherence misses might be reduced. You are adding some needless memory ordering overhead in the process, but I think that's often cheaper on recent processors. But I haven't done the measurements.
I tend to worry more about (2) than (1), especially if I don't expect the counting operation to be that frequently executed. Having a statistical counter that lies systematically in what are probably precisely the most interesting cases is really not very useful. So long as I get the right answer, I at least know that the counter got unexpectedly hot and is probably causing a problem.
Other than that, I agree with your conclusions.
I suspect that before java.util.concurrent, you really wanted thread-local counters. If performance is really not much of an issue, synchronizing the updates seems like a distant second choice.
> Tom Hawtin
More information about the Concurrency-interest