[concurrency-interest] ThreadLocal vs ProcessorLocal

David Dice david.dice at gmail.com
Mon Oct 15 16:03:07 EDT 2012


> Message: 2
> Date: Mon, 15 Oct 2012 10:28:33 +0200
> From: Antoine Chambille <ach at quartetfs.com>
> To: concurrency-interest at cs.oswego.edu
> Cc: Doug Lea <dl at cs.oswego.edu>
> Subject: Re: [concurrency-interest] ThreadLocal vs ProcessorLocal
>
> Thanks for those explanations.
>
> I agree that manually distributing JVMs on NUMA nodes won't magically
> remove the NUMA effect (by the way I confirm that the performance drop we
> measure with NUMA and a global fork/join pool is around 2X).
>
> But for data structures that can be partitioned (an in-memory database is a
> good candidate) and for which the computation workload can be expressed as
> divide and conquer over those partitions, then I believe the performance
> can be won back. An issue remains: the application deployment and
> monitoring becomes much more complex.
>
>
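A minimal sketch of divide and conquer over partitions with fork/join, assuming (hypothetically) that each partition is a plain `long[]` and the query is a sum. The task halves the range of partition indices until a single partition remains, scans it sequentially, and combines the partial results. This is an illustration under those assumptions, not code from the thread.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical query: sum every value across all partitions.
// Each partition is assumed to be a long[] (illustration only).
final class PartitionedSum extends RecursiveTask<Long> {
    private final long[][] partitions;
    private final int lo, hi; // half-open range of partition indices [lo, hi)

    PartitionedSum(long[][] partitions, int lo, int hi) {
        this.partitions = partitions;
        this.lo = lo;
        this.hi = hi;
    }

    @Override
    protected Long compute() {
        if (hi - lo == 1) {
            // Leaf: scan a single partition sequentially.
            long s = 0;
            for (long v : partitions[lo]) s += v;
            return s;
        }
        int mid = (lo + hi) >>> 1;
        PartitionedSum left = new PartitionedSum(partitions, lo, mid);
        PartitionedSum right = new PartitionedSum(partitions, mid, hi);
        left.fork();              // schedule the left half asynchronously
        long r = right.compute(); // compute the right half in this worker
        return left.join() + r;   // combine the partial results
    }

    static long query(ForkJoinPool pool, long[][] partitions) {
        if (partitions.length == 0) return 0;
        return pool.invoke(new PartitionedSum(partitions, 0, partitions.length));
    }
}
```

Splitting on partition boundaries keeps each leaf's scan confined to one partition, which is what makes the approach compatible with placing partitions near the threads that scan them.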
> Maybe Gregg is right: we should stop being shy and wrap a bit of JNI to
> retrieve the processor id, or even better, to set native thread affinity. I
> have seen people do it in relatively portable ways
> (https://github.com/peter-lawrey/Java-Thread-Affinity). That way, within
> one JVM, we should be able to "physically" allocate one fork/join pool per
> NUMA node, partition the data, and make sure the pool that writes data to a
> partition is also the pool that handles partial queries for it.
>
>
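The routing half of the one-pool-per-node idea can be sketched in plain Java. In this hypothetical `NodeRouter`, partition `i` is owned by node `i % nodes`, and both writes and partial queries for a partition are submitted to the owning node's pool. Actually pinning each pool's worker threads to its node is assumed to happen elsewhere (for example through a native affinity library such as the one linked above) and is not shown; without that step these are just ordinary pools.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ForkJoinPool;
import java.util.function.BinaryOperator;
import java.util.function.Function;

// Hypothetical router: one ForkJoinPool per NUMA node. Thread pinning is
// NOT done here; it would need native affinity support (e.g. via JNI).
final class NodeRouter<P> implements AutoCloseable {
    private final ForkJoinPool[] pools; // pools[n] is meant to serve node n
    private final List<P> partitions;   // partition i is owned by node i % pools.length

    NodeRouter(int nodes, int threadsPerNode, List<P> partitions) {
        this.pools = new ForkJoinPool[nodes];
        for (int n = 0; n < nodes; n++) pools[n] = new ForkJoinPool(threadsPerNode);
        this.partitions = partitions;
    }

    private ForkJoinPool owner(int partition) {
        return pools[partition % pools.length];
    }

    /** Runs a write or a partial query on the pool that owns the partition. */
    <R> CompletableFuture<R> onPartition(int partition, Function<P, R> op) {
        P p = partitions.get(partition);
        return CompletableFuture.supplyAsync(() -> op.apply(p), owner(partition));
    }

    /** Fans a query out to every partition's owner and merges the partial results. */
    <R> R query(Function<P, R> partial, BinaryOperator<R> merge, R identity) {
        List<CompletableFuture<R>> parts = new ArrayList<>();
        for (int i = 0; i < partitions.size(); i++) parts.add(onPartition(i, partial));
        R acc = identity;
        for (CompletableFuture<R> f : parts) acc = merge.apply(acc, f.join());
        return acc;
    }

    @Override
    public void close() {
        for (ForkJoinPool pool : pools) pool.shutdown();
    }
}
```

When `threadsPerNode > 1`, several workers of the owning pool may touch the same partition, so the partitions themselves must tolerate concurrent access.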
> Oh, and while on the subject of leveraging many cores, the newly released
> StampedLock looks like a major contribution, especially the
> "Unsafe.getXVolatile" unofficial load fence disclosed by Doug, which is an
> important building block for software transactional memory and
> multiversion data structures. That was a well-guarded secret ;)
>
>
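For readers who have not tried it yet, the optimistic-read idiom of `StampedLock` (now `java.util.concurrent.locks.StampedLock` in JDK 8 and later) looks roughly like the sketch below, modeled on the `Point` example in its Javadoc. The class itself is only an illustration.

```java
import java.util.concurrent.locks.StampedLock;

// A 2-D point guarded by a StampedLock, using the optimistic-read idiom.
final class Point {
    private final StampedLock sl = new StampedLock();
    private double x, y;

    void move(double dx, double dy) {
        long stamp = sl.writeLock();     // exclusive write lock
        try {
            x += dx;
            y += dy;
        } finally {
            sl.unlockWrite(stamp);
        }
    }

    double distanceFromOrigin() {
        long stamp = sl.tryOptimisticRead(); // no blocking, just a version stamp
        double cx = x, cy = y;               // copy fields into locals
        if (!sl.validate(stamp)) {           // a write intervened: retry under a read lock
            stamp = sl.readLock();
            try {
                cx = x;
                cy = y;
            } finally {
                sl.unlockRead(stamp);
            }
        }
        return Math.hypot(cx, cy);
    }
}
```

The values read in the optimistic section may be inconsistent; they are only used after `validate` confirms that no write happened in between.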
> -Antoine
> Quartet FS
>
>
An alternative to binding or otherwise forcing thread placement is to make
NUMA-aware data structures that tolerate ambient thread placement.
https://blogs.oracle.com/dave/resource/NUMA-aware-JUCExchanger.pdf gives a
brief sketch of how this was applied to the JUC Exchanger, but the approach is applicable
to other constructs as well.
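As a generic illustration of a structure that tolerates ambient thread placement (not the Exchanger design described in the PDF), here is a hypothetical node-sharded object pool. A thread uses a best-effort node hint, an assumed `IntSupplier` that might be backed by a JNI call or simply be stale, to pick its home shard; it takes from that shard first and steals from remote shards otherwise. A wrong hint only costs locality, never correctness.

```java
import java.util.concurrent.ConcurrentLinkedDeque;
import java.util.function.IntSupplier;

// Hypothetical NUMA-sharded object pool. Correctness does not depend on where
// a thread runs; placement only affects how often it hits its local shard.
final class NodeLocalPool<T> {
    private final ConcurrentLinkedDeque<T>[] shards;
    private final IntSupplier currentNode; // ASSUMPTION: best-effort node hint

    @SuppressWarnings("unchecked")
    NodeLocalPool(int nodes, IntSupplier currentNode) {
        shards = new ConcurrentLinkedDeque[nodes];
        for (int i = 0; i < nodes; i++) shards[i] = new ConcurrentLinkedDeque<>();
        this.currentNode = currentNode;
    }

    private int home() {
        // floorMod tolerates out-of-range or negative hints
        return Math.floorMod(currentNode.getAsInt(), shards.length);
    }

    /** Returns an item to the caller's (probable) local shard. */
    void release(T item) {
        shards[home()].addFirst(item);
    }

    /** Local shard first, else steal from the tail of a remote shard; null if all are empty. */
    T acquire() {
        int h = home();
        T item = shards[h].pollFirst();
        if (item != null) return item;
        for (int i = 1; i < shards.length; i++) {
            item = shards[(h + i) % shards.length].pollLast();
            if (item != null) return item;
        }
        return null;
    }
}
```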

Regards
Dave

P.S. It's also easy to create NUMA-friendly locks:
https://blogs.oracle.com/dave/entry/lock_cohorting_to_appear_in
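A simplified sketch of the lock-cohorting idea, not the implementation from the linked paper: each node has a local lock, and one global lock is held on behalf of a whole cohort. A thread releasing the lock while another thread on the same node is waiting passes ownership locally and keeps the global lock, up to a hypothetical `MAX_LOCAL_PASSES` bound so other nodes are not starved. The global lock must be releasable by a thread other than the one that acquired it, hence the `Semaphore`. The caller-supplied `node` argument is an assumption.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.locks.ReentrantLock;

// Simplified cohort lock: per-node local locks plus one global lock.
// Not reentrant: callers must not nest lock() calls.
final class CohortLock {
    private static final int MAX_LOCAL_PASSES = 64; // hypothetical fairness bound

    private final Semaphore global = new Semaphore(1); // thread-oblivious global lock
    private final ReentrantLock[] local;
    private final boolean[] ownsGlobal; // per node; guarded by local[node]
    private final int[] passes;         // per node; guarded by local[node]

    CohortLock(int nodes) {
        local = new ReentrantLock[nodes];
        for (int i = 0; i < nodes; i++) local[i] = new ReentrantLock();
        ownsGlobal = new boolean[nodes];
        passes = new int[nodes];
    }

    /** ASSUMPTION: the caller supplies the NUMA node it is running on. */
    void lock(int node) {
        local[node].lock();
        if (!ownsGlobal[node]) {          // this cohort does not hold the global lock yet
            global.acquireUninterruptibly();
            ownsGlobal[node] = true;
            passes[node] = 0;
        }
    }

    void unlock(int node) {
        ReentrantLock l = local[node];
        if (l.hasQueuedThreads() && passes[node] < MAX_LOCAL_PASSES) {
            passes[node]++;               // keep the global lock; hand off within the node
        } else {
            ownsGlobal[node] = false;
            global.release();             // let other nodes compete
        }
        l.unlock();
    }
}
```

Handing the lock to a same-node waiter keeps the protected data in that node's caches for consecutive critical sections, which is where the NUMA benefit comes from.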