[concurrency-interest] ThreadLocal vs ProcessorLocal

Kirk Pepperdine kirk at kodewerk.com
Wed Oct 17 03:15:16 EDT 2012


Hi David,

I wish that NUMA would handle this so that one wouldn't need to explicitly code in thread affinity, but I fear that working with a strong hint from the developer is a reasonable compromise. Indeed, Peter Lawrey has a project on GitHub that makes some attempt at setting native thread affinity. That said, your comment about the dangers of making local decisions in the absence of global knowledge, reinforced in your CPUID blog posting, is spot on. After all, the CPU has that global view.....

SOT, right now we are using CPUID for a number of reasons. It is very easy to get to on Linux, as we can do it in pure Java. However, other platforms require C++/assembler, so having <gulp> an unsafe operation in the JVM would be a win from my POV.
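[Kirk doesn't spell out the pure-Java mechanism here. One way it can be done on Linux, sketched under the assumption that the `/proc` filesystem is the source (this is an illustration, not necessarily what Kirk's team actually uses): field 39 of `/proc/thread-self/stat` is the CPU the thread last ran on.]

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class CurrentCpu {
    /**
     * Returns the CPU this thread last ran on, or -1 if unavailable.
     * Parses /proc/thread-self/stat (Linux 3.17+); field 39 (1-based)
     * is the processor number. The comm field may contain spaces and
     * parentheses, so we parse only the text after the last ')'.
     */
    static int currentCpu() {
        try {
            String stat = new String(
                    Files.readAllBytes(Paths.get("/proc/thread-self/stat")));
            // Skip "pid (comm) "; the remainder starts at field 3 (state),
            // so field 39 (processor) lands at index 36 after splitting.
            String afterComm = stat.substring(stat.lastIndexOf(')') + 2);
            return Integer.parseInt(afterComm.split(" ")[36]);
        } catch (IOException | RuntimeException e) {
            return -1; // not Linux, or /proc not available
        }
    }

    public static void main(String[] args) {
        System.out.println("running on cpu " + currentCpu());
    }
}
```

Note this is advisory only: the scheduler may migrate the thread the instant after the read, which is exactly why a first-class ProcessorLocal (or the JVM intrinsic Kirk wishes for) would be preferable.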

Regards,
Kirk

On 2012-10-15, at 9:03 PM, David Dice <david.dice at gmail.com> wrote:

> 
> Message: 2
> Date: Mon, 15 Oct 2012 10:28:33 +0200
> From: Antoine Chambille <ach at quartetfs.com>
> To: concurrency-interest at cs.oswego.edu
> Cc: Doug Lea <dl at cs.oswego.edu>
> Subject: Re: [concurrency-interest] ThreadLocal vs ProcessorLocal
> Message-ID:
>         <CAJGQDwn2Sr=DDJ5Bbp14JB4uq0W0vfAK=AFfCibZ=kKNTOm+PA at mail.gmail.com>
> Content-Type: text/plain; charset="iso-8859-1"
> 
> Thanks for those explanations.
> 
> I agree that manually distributing JVMs on NUMA nodes won't magically
> remove the NUMA effect (by the way I confirm that the performance drop we
> measure with NUMA and a global fork/join pool is around 2X).
> 
> But for data structures that can be partitioned (an in-memory database is a
> good candidate) and for which the computation workload can be expressed as
> divide and conquer over those partitions, then I believe the performance
> can be won back. An issue remains: the application deployment and
> monitoring becomes much more complex.
> 
> 
> Maybe Gregg is right, we should stop being shy and wrap a bit of JNI to
> retrieve processor id or even better: set native thread affinity. I have
> seen people doing it in relatively portable ways (
> https://github.com/peter-lawrey/Java-Thread-Affinity ). That way within one
> JVM we should be able to "physically" allocate one fork join pool per NUMA
> node, partition the data, and make sure the pool that writes data to a
> partition is also the pool that handles partial queries for it.
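[The per-node pool scheme Antoine describes above can be sketched with plain JDK classes. The affinity step is omitted here (that is what the JNI/Java-Thread-Affinity discussion is about); partition count, key routing, and the map-backed shards are illustrative assumptions. The point is the routing discipline: the pool that writes a partition is the pool that reads it.]

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ForkJoinPool;

/**
 * Hash-partitioned store where each partition is owned by exactly one
 * ForkJoinPool, so all work on a partition runs on the same set of
 * threads. The real NUMA benefit requires additionally pinning each
 * pool's threads to one node, which this sketch does not do.
 */
class PartitionedStore {
    private final int partitions;
    private final ForkJoinPool[] pools;
    private final List<ConcurrentHashMap<String, String>> shards = new ArrayList<>();

    PartitionedStore(int partitions, int threadsPerPool) {
        this.partitions = partitions;
        this.pools = new ForkJoinPool[partitions];
        for (int i = 0; i < partitions; i++) {
            pools[i] = new ForkJoinPool(threadsPerPool);
            shards.add(new ConcurrentHashMap<>());
        }
    }

    /** Every key maps to one partition, hence to one owning pool. */
    private int owner(String key) {
        return Math.floorMod(key.hashCode(), partitions);
    }

    void put(String key, String value) {
        final int p = owner(key);
        pools[p].submit(() -> shards.get(p).put(key, value)).join();
    }

    String get(String key) {
        final int p = owner(key);
        return pools[p].submit(() -> shards.get(p).get(key)).join();
    }

    void shutdown() {
        for (ForkJoinPool pool : pools) pool.shutdown();
    }
}
```

A partial query over the whole store would fan out one subtask per partition, each submitted to that partition's own pool, then merge the results.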
> 
> 
> Oh and while on the subject of leveraging many cores, the newly released
> StampedLock looks like a major contribution. Especially the
> "Unsafe.getXVolatile" unofficial load fence disclosed by Doug that is an
> important building block for software transactional memory and multiversion
> data structures. That was a well-guarded secret ;)
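[For readers who have not met StampedLock yet, the idiom Antoine is praising is the optimistic read: read shared fields with no CAS and no lock-word write, then validate that no writer intervened. The Point class below is a minimal sketch of that pattern, close to the one in the class's own documentation.]

```java
import java.util.concurrent.locks.StampedLock;

/**
 * A 2-D point guarded by a StampedLock, showing the optimistic-read
 * idiom: the common read path performs no writes to the lock at all,
 * only a version check, which is what makes StampedLock attractive
 * for read-mostly and multiversion-style structures.
 */
class Point {
    private final StampedLock lock = new StampedLock();
    private double x, y;

    void move(double dx, double dy) {
        long stamp = lock.writeLock();
        try {
            x += dx;
            y += dy;
        } finally {
            lock.unlockWrite(stamp);
        }
    }

    double distanceFromOrigin() {
        long stamp = lock.tryOptimisticRead(); // just reads a version stamp
        double cx = x, cy = y;                 // racy reads of the fields
        if (!lock.validate(stamp)) {
            // A writer intervened; retry under a real (pessimistic) read lock.
            stamp = lock.readLock();
            try {
                cx = x;
                cy = y;
            } finally {
                lock.unlockRead(stamp);
            }
        }
        return Math.hypot(cx, cy);
    }
}
```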
> 
> 
> -Antoine
> Quartet FS
> 
> 
> An alternative to binding or otherwise forcing thread placement is to make NUMA-aware data structures that tolerate ambient thread placement. https://blogs.oracle.com/dave/resource/NUMA-aware-JUCExchanger.pdf gives a brief sketch of how this was applied to the JUC Exchanger, but it is applicable to other constructs as well.
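[The "tolerate rather than force" idea Dave describes can be illustrated with a striped counter in the spirit of LongAdder: each thread deterministically picks a stripe from its identity, so threads that happen to be co-located keep hitting the same cells. The thread-id hash used below is an illustrative stand-in for a per-node or per-CPU index, and real implementations also pad cells against false sharing, which this sketch omits.]

```java
import java.util.concurrent.atomic.AtomicLongArray;

/**
 * A striped counter: updates are spread across cells chosen from the
 * updating thread's identity, and reads sum all cells. Contention on
 * any single cell drops as the stripe count grows, without the code
 * ever asking where a thread is running.
 */
class StripedCounter {
    private final AtomicLongArray cells;
    private final int mask;

    StripedCounter(int stripes) { // stripes must be a power of two
        cells = new AtomicLongArray(stripes);
        mask = stripes - 1;
    }

    void increment() {
        // Illustrative stripe choice; a NUMA-aware version would hash
        // a node or CPU id here instead of the thread id.
        int i = (int) Thread.currentThread().getId() & mask;
        cells.incrementAndGet(i);
    }

    long sum() { // weakly consistent under concurrent updates, like LongAdder
        long s = 0;
        for (int i = 0; i < cells.length(); i++) s += cells.get(i);
        return s;
    }
}
```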
> 
> Regards
> Dave
> 
> p.s., it's also easy to create NUMA-friendly locks : https://blogs.oracle.com/dave/entry/lock_cohorting_to_appear_in
> 
> 
>  
> _______________________________________________
> Concurrency-interest mailing list
> Concurrency-interest at cs.oswego.edu
> http://cs.oswego.edu/mailman/listinfo/concurrency-interest


