[concurrency-interest] ThreadLocal vs ProcessorLocal

Antoine Chambille ach at quartetfs.com
Mon Oct 15 04:28:33 EDT 2012

Thanks for those explanations.

I agree that manually distributing JVMs on NUMA nodes won't magically
remove the NUMA effect (by the way I confirm that the performance drop we
measure with NUMA and a global fork/join pool is around 2X).

But for data structures that can be partitioned (an in-memory database is a
good candidate) and for which the computation workload can be expressed as
divide and conquer over those partitions, I believe the performance
can be won back. An issue remains: application deployment and
monitoring become much more complex.

Maybe Gregg is right, we should stop being shy and wrap a bit of JNI to
retrieve processor id or even better: set native thread affinity. I have
seen people doing it in relatively portable ways (
https://github.com/peter-lawrey/Java-Thread-Affinity ). That way within one
JVM we should be able to "physically" allocate one fork join pool per NUMA
node, partition the data, and make sure the pool that writes data to a
partition is also the pool that handles partial queries for it.
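
To make the idea concrete, here is a minimal sketch of the routing part only.
The class name NodePartitionedStore, the modulo ownership rule and the
load/sumAll methods are all invented for illustration. The pinning itself is
NOT shown: plain Java has no affinity call, so the sketch assumes it would be
done in a custom worker thread factory that calls into a JNI affinity library.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.ForkJoinTask;
import java.util.concurrent.RecursiveTask;

// Hypothetical sketch: one ForkJoinPool per NUMA node, each partition owned by one pool.
final class NodePartitionedStore {
    private final ForkJoinPool[] pools;   // pools[n] is meant to serve NUMA node n
    private final long[][] partitions;    // partitions[p] is owned by pools[p % nodes]

    NodePartitionedStore(int nodes, int threadsPerNode, int partitionCount) {
        pools = new ForkJoinPool[nodes];
        for (int n = 0; n < nodes; n++) {
            // Assumption: a real deployment would pass a ForkJoinWorkerThreadFactory
            // that pins each worker to the cores of node n via JNI. No pinning here.
            pools[n] = new ForkJoinPool(threadsPerNode);
        }
        partitions = new long[partitionCount][];
    }

    private ForkJoinPool ownerOf(int partition) {
        return pools[partition % pools.length];
    }

    // Allocate and write the partition from inside its owning pool, so the same
    // pool that writes the data is the one that later queries it.
    void load(int partition, int size, long value) {
        Callable<long[]> alloc = () -> {
            long[] data = new long[size];
            Arrays.fill(data, value);
            return data;
        };
        partitions[partition] = ownerOf(partition).submit(alloc).join();
    }

    // Send one partial query to each owning pool, then combine the partial results.
    long sumAll() {
        List<ForkJoinTask<Long>> partials = new ArrayList<>();
        for (int p = 0; p < partitions.length; p++) {
            long[] data = partitions[p];
            partials.add(ownerOf(p).submit(new SumTask(data, 0, data.length)));
        }
        long total = 0;
        for (ForkJoinTask<Long> partial : partials) {
            total += partial.join();
        }
        return total;
    }

    void shutdown() {
        for (ForkJoinPool pool : pools) {
            pool.shutdown();
        }
    }

    // Plain divide-and-conquer sum over one partition.
    static final class SumTask extends RecursiveTask<Long> {
        private static final int THRESHOLD = 1024;
        private final long[] data;
        private final int lo, hi;

        SumTask(long[] data, int lo, int hi) {
            this.data = data;
            this.lo = lo;
            this.hi = hi;
        }

        @Override
        protected Long compute() {
            if (hi - lo <= THRESHOLD) {
                long sum = 0;
                for (int i = lo; i < hi; i++) {
                    sum += data[i];
                }
                return sum;
            }
            int mid = (lo + hi) >>> 1;
            SumTask left = new SumTask(data, lo, mid);
            left.fork();                                   // run left half asynchronously
            long right = new SumTask(data, mid, hi).compute();
            return right + left.join();
        }
    }
}
```

Whether the memory actually ends up local to the node depends on the OS
allocation policy and on the workers really being pinned, neither of which
this sketch controls.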

Oh and while on the subject of leveraging many cores, the newly released
StampedLock looks like a major contribution. Especially the
"Unsafe.getXVolatile" unofficial load fence disclosed by Doug that is an
important building block for software transactional memory and multiversion
data structures. That was a well-guarded secret ;)
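
For readers who have not tried it yet, a minimal sketch of the optimistic read
pattern follows. It assumes the java.util.concurrent.locks.StampedLock class
as it ships in Java 8 (the preview version lives in the jsr166e package); the
OptimisticPoint class is only an illustration. The validate() call is where
the load fence matters: it has to order the plain field reads before the
stamp check.

```java
import java.util.concurrent.locks.StampedLock;

// Illustrative class: two fields that must be read as a consistent pair.
final class OptimisticPoint {
    private final StampedLock lock = new StampedLock();
    private double x, y;

    void move(double dx, double dy) {
        long stamp = lock.writeLock();          // exclusive access for writers
        try {
            x += dx;
            y += dy;
        } finally {
            lock.unlockWrite(stamp);
        }
    }

    double distanceFromOrigin() {
        long stamp = lock.tryOptimisticRead();  // no blocking, no write to shared state
        double cx = x, cy = y;                  // possibly inconsistent reads
        if (!lock.validate(stamp)) {            // a writer intervened: fall back
            stamp = lock.readLock();
            try {
                cx = x;
                cy = y;
            } finally {
                lock.unlockRead(stamp);
            }
        }
        return Math.hypot(cx, cy);
    }
}
```

Readers that validate successfully never write to the lock word, which is why
this pattern should scale better than a read lock when many cores read the
same structure.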

Quartet FS

On 12 October 2012 17:57, Doug Lea <dl at cs.oswego.edu> wrote:

> On 10/12/12 08:53, Antoine Chambille wrote:
>> I am certainly too much focused on our own requirements (in-memory
>> database, TB heaps, tens of cores) but I feel that Java (once the best
>> language in the world for concurrent programming, when JDK5 was released)
>> is not ready for the many-cores era.
>
> (JDK5 was back in the days where I could help implement new JVM support
> (for atomics etc) and let it slip in under the radar because no one
> else much cared about concurrency features.)
>
>> With respect to NUMA: in the medium term we will have to try exotic
>> deployments.
>
> Even if you had NUMA information (for example, a distance metric among
> cores), if all of your computations are fork/join-like, you'd also have to
> account for the fact that some tasks (for example, sorting the upper half
> of an array) are better off located further away (to reduce cache
> pollution) and some (for example, reducing a sum) closer (to reduce
> traffic and exploit cache sharing). Automating this is not easy. One
> approach that has a chance of working reasonably well inside FJ is to
> associate affinity with computation tree depth. But we don't yet have JVM
> support to try things like this out.
>
> In the meantime, the worst NUMA FJ effects I see on machines I test
> on seem to be around 2X. A factor of two would be hard to make up for
> using multiple JVMs, but might (depending on the OS and underlying
> scheduling) be improved a bit by creating multiple FJPools,
> and submitting affine tasks to each.
> -Doug

R&D Director
Quartet FS