[concurrency-interest] ThreadLocal vs ProcessorLocal

Doug Lea dl at cs.oswego.edu
Fri Oct 12 11:57:03 EDT 2012

On 10/12/12 08:53, Antoine Chambille wrote:

> I am certainly too much focused on our own requirements (in-memory database, TB
> heaps, tens of cores) but I feel that Java (once the best language in the world
> for concurrent programming, when JDK5 was released) is not ready for the
> many-cores era.

(JDK5 was back in the days when I could help implement new JVM support
(for atomics etc) and let it slip in under the radar because no one
else much cared about concurrency features.)

> With respect to NUMA: in the medium term we will have to try exotic deployments.

Even if you had NUMA information (for example, a distance metric among cores),
if all of your computations are fork/join-like, you'd also have to account
for the fact that some tasks (for example, sorting the upper half of an array)
are better off located further away (to reduce cache pollution) and some
(for example, reducing a sum) closer (to reduce traffic and exploit
cache sharing). Automating this is not easy. One approach that has
a chance of working reasonably well inside FJ is to associate affinity with
computation tree depth. But we don't yet have JVM support to try things
like this out.
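
To make the "tree depth" idea concrete, here is a minimal sketch of a
fork/join sum in which every task carries its depth in the computation tree.
It only records how many tasks ran at each depth. No affinity is applied,
because there is no standard JVM API for that. The class DepthTaggedSum, the
THRESHOLD value, and the tasksPerDepth counter are all hypothetical names
chosen for illustration. The comment in compute() marks where a depth-keyed
placement hint could go if such support existed.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;
import java.util.concurrent.atomic.AtomicIntegerArray;

/** Illustrative only: a fork/join sum whose tasks know their tree depth. */
final class DepthTaggedSum {
    static final int THRESHOLD = 1024; // assumed sequential cutoff

    static final class Task extends RecursiveTask<Long> {
        final long[] data;
        final int lo, hi, depth;
        final AtomicIntegerArray tasksPerDepth;

        Task(long[] data, int lo, int hi, int depth,
             AtomicIntegerArray tasksPerDepth) {
            this.data = data;
            this.lo = lo;
            this.hi = hi;
            this.depth = depth;
            this.tasksPerDepth = tasksPerDepth;
        }

        @Override
        protected Long compute() {
            // Record that a task ran at this depth (bounded by array size).
            if (depth < tasksPerDepth.length()) {
                tasksPerDepth.incrementAndGet(depth);
            }
            // Hypothetical: a depth-keyed affinity hint would be applied
            // here. The JVM offers no such hook, so nothing is done.
            if (hi - lo <= THRESHOLD) {
                long s = 0;
                for (int i = lo; i < hi; i++) s += data[i];
                return s;
            }
            int mid = (lo + hi) >>> 1;
            Task left = new Task(data, lo, mid, depth + 1, tasksPerDepth);
            left.fork();  // left half may be stolen by another worker
            long right =
                new Task(data, mid, hi, depth + 1, tasksPerDepth).compute();
            return right + left.join();
        }
    }

    /** Sums data in a fresh pool, counting tasks per depth as a side effect. */
    static long sum(long[] data, AtomicIntegerArray tasksPerDepth) {
        ForkJoinPool pool = new ForkJoinPool();
        try {
            return pool.invoke(new Task(data, 0, data.length, 0, tasksPerDepth));
        } finally {
            pool.shutdown();
        }
    }
}
```

The depth is cheap to carry (one int per task), so the bookkeeping side of
such a scheme is not the hard part. The open question is what the runtime
could do with it.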

In the meantime, the worst NUMA FJ effects I see on machines I test
on seem to be around 2X. A factor of two would be hard to make up for
using multiple JVMs, but might (depending on the OS and underlying
scheduling) be improved a bit by creating multiple FJPools,
and submitting affine tasks to each.
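
A minimal sketch of the multiple-pool idea, assuming the work can be split
into independent chunks up front: each chunk goes to its own ForkJoinPool,
so work stealing stays within that pool's workers. The names
PartitionedPools, SumTask, and sumWithPools are hypothetical. Nothing here
pins threads to cores or NUMA nodes. Whether a pool's workers actually stay
close to each other is left entirely to the OS scheduler, which is why any
gain is uncertain.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.ForkJoinTask;
import java.util.concurrent.RecursiveTask;

/** Illustrative only: one ForkJoinPool per data partition. */
final class PartitionedPools {

    /** Plain fork/join sum over data[lo, hi). */
    static final class SumTask extends RecursiveTask<Long> {
        static final int THRESHOLD = 1 << 12; // assumed sequential cutoff
        final long[] data;
        final int lo, hi;

        SumTask(long[] data, int lo, int hi) {
            this.data = data;
            this.lo = lo;
            this.hi = hi;
        }

        @Override
        protected Long compute() {
            if (hi - lo <= THRESHOLD) {
                long s = 0;
                for (int i = lo; i < hi; i++) s += data[i];
                return s;
            }
            int mid = (lo + hi) >>> 1;
            SumTask left = new SumTask(data, lo, mid);
            left.fork();
            long right = new SumTask(data, mid, hi).compute();
            return right + left.join();
        }
    }

    /** Splits data into poolCount chunks and sums each in its own pool. */
    static long sumWithPools(long[] data, int poolCount) {
        int cores = Runtime.getRuntime().availableProcessors();
        int parallelism = Math.max(1, cores / poolCount);
        ForkJoinPool[] pools = new ForkJoinPool[poolCount];
        List<ForkJoinTask<Long>> pending = new ArrayList<ForkJoinTask<Long>>();
        try {
            int chunk = (data.length + poolCount - 1) / poolCount;
            for (int i = 0; i < poolCount; i++) {
                pools[i] = new ForkJoinPool(parallelism);
                int lo = Math.min(data.length, i * chunk);
                int hi = Math.min(data.length, lo + chunk);
                // Tasks forked from this root stay in pools[i].
                pending.add(pools[i].submit(new SumTask(data, lo, hi)));
            }
            long total = 0;
            for (ForkJoinTask<Long> t : pending) total += t.join();
            return total;
        } finally {
            for (ForkJoinPool p : pools) {
                if (p != null) p.shutdown();
            }
        }
    }
}
```

For example, sumWithPools(data, 2) on a two-socket machine would run two
pools with half the cores each. Creating pools per call is only for brevity.
Long-lived pools would be the natural choice in a real deployment.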

