[concurrency-interest] when is safe publication safe?
dl at cs.oswego.edu
Tue Apr 27 18:46:54 EDT 2010
On 04/27/10 13:45, Boehm, Hans wrote:
> Has someone actually measured the cost of volatile across platforms?
I don't know of a systematic survey, but here's a rough guide
across six multi-coreable/multi-processor-able processors that
together probably account for nearly all JVMs on MPs out there.
Each kind of processor differs enough across models that this
is only a very rough guide, not suitable for quoting or
programming against, but maybe useful for perspective.
For volatile reads, three categories:
* Usually Cheap: x86/x64, sparc
-- no hw fences or special instructions, but some compiler
optimizations are lost.
* May be Noticeable: IA64, Azul
-- may need acquire fences/instructions that cost < 10 cycles.
* May be Expensive: POWER, ARM
-- may need general fences that cost > 10 cycles.
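For perspective on what those read-side costs buy, here is a minimal sketch of a reader that picks up an object through a volatile reference. The class and member names (PublicationSketch, Config, timeoutMillis) are made up for illustration and are not from this thread. The volatile read in timeoutOrDefault is the operation that falls into one of the three categories above.

```java
// Hypothetical example: all names here are illustrative assumptions.
final class PublicationSketch {
    static final class Config {
        int timeoutMillis;                 // plain field: not final, not volatile
        Config(int timeoutMillis) { this.timeoutMillis = timeoutMillis; }
    }

    // The volatile reference is the publication point.
    private volatile Config current;

    void publish(int timeoutMillis) {
        Config c = new Config(timeoutMillis); // plain writes inside the constructor
        current = c;                          // volatile write publishes the object
    }

    int timeoutOrDefault(int fallback) {
        Config c = current;                   // volatile read (acquire semantics)
        // If c is non-null, the plain field written before the volatile
        // write is guaranteed to be visible here.
        return (c == null) ? fallback : c.timeoutMillis;
    }
}
```

On x86 or sparc the read above would typically compile to an ordinary load. On the other platforms it may carry the extra fence or instruction, depending on what the JIT can prove.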
For volatile writes:
* All of the above need at least a trailing store-load fence or
use of atomics that normally costs in the dozens of cycles, but
less so on some recent x86 and Azul.
* On Azul, POWER, and ARM, volatile writes may additionally need a
preceding release fence, which is in the < 10 cycle range.
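As a rough illustration of the write side, the sketch below contrasts a full volatile-style write with an ordered write using AtomicLong.lazySet, which is not required to be followed by a store-load fence. The class and method names (WriteCostSketch, publishVolatile, publishOrdered) are assumptions for this example only. Whether the ordered form is actually cheaper depends on the JVM and processor.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical example: names are illustrative, not from the thread.
final class WriteCostSketch {
    private final AtomicLong sequence = new AtomicLong();

    // Same memory effects as writing a volatile field: typically
    // needs a trailing store-load fence (or an atomic instruction).
    void publishVolatile(long value) {
        sequence.set(value);
    }

    // Ordered write: keeps earlier writes from being reordered after it,
    // but does not require the trailing store-load fence.
    void publishOrdered(long value) {
        sequence.lazySet(value);
    }

    long read() {
        return sequence.get();   // volatile read
    }
}
```

The ordered form is only appropriate when the writer does not need its own subsequent reads ordered after the write, so it is not a drop-in replacement for a volatile write.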
All those "may's" stem from availability, in principle, of
compiler techniques that can lead to cheaper mechanisms in
In general, fences and atomics have been getting cheaper over
the past few years, especially on x86. But just about any
processor designed around 2002-2007 is likely to have costs closer
to (or over) 100 cycles than 10 cycles. In other words, on
most platforms, fence (and atomic CAS) costs used to be more
worth avoiding than they have been recently. As Hans points out,
even the more expensive kinds of fences are often cheaper than
cache misses these days. On the other hand, the base cost of
losing many basic compiler optimizations (keeping values in
registers, etc.) associated with volatiles is sometimes higher in
the long run than any fence costs.
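A minimal sketch of that lost-optimization cost, using made-up names (VolatileReadSketch, sumRereading, sumWithLocalCopy): the first method re-reads the volatile field on every iteration, which the compiler generally may not hoist or keep in a register. The second reads it once into a local.

```java
// Hypothetical example: names are illustrative assumptions.
final class VolatileReadSketch {
    private volatile int[] data = {1, 2, 3, 4};

    // Each use of 'data' is a separate volatile read, so the compiler
    // generally cannot cache the reference or the array length.
    int sumRereading() {
        int sum = 0;
        for (int i = 0; i < data.length; i++) {
            sum += data[i];
        }
        return sum;
    }

    // One volatile read; the loop then works on an ordinary local
    // that can be optimized like any other variable.
    int sumWithLocalCopy() {
        int[] local = data;
        int sum = 0;
        for (int i = 0; i < local.length; i++) {
            sum += local[i];
        }
        return sum;
    }
}
```

The two methods are not equivalent if another thread replaces the array mid-loop: the local-copy version keeps working on the snapshot it read, which is often what is wanted anyway.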