[concurrency-interest] NUMA-Aware Java Heaps for in-memory databases

Michał Warecki michal.warecki at gmail.com
Sat Feb 16 12:51:44 EST 2013


Yes, of course. I didn't mean just simple CPU binding. After evacuation
(promotion from young to old) and compaction, threads have to be rebound.
As you wrote, the problem is with the old gen heap, and therefore JVM
modifications are needed. I don't know whether you can ship a custom OpenJDK
HotSpot build.
You would probably have to experiment with a NUMA-aware old gen and
NUMA-aware object reordering during compaction (you also have to take care
of CPU cache and TLB effects). Which GC are you using? With CMS there are
other issues with fragmentation after a few collections. I believe
"bump the pointer" allocation is better in this case.
If you have time for such issues at work, you're lucky :-)
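To make the "bump the pointer" point concrete, here is a minimal sketch of the idea, assuming a single contiguous region addressed by offsets. BumpArena is a hypothetical class for illustration, not HotSpot code: allocation is one addition plus a bounds check, and after a compacting collection the live data is contiguous again, so the top pointer can simply be reset. A free-list allocator, which is roughly what CMS uses for its non-compacted old generation, instead has to search for a hole that fits, and those holes are what fragment over time.

```java
/**
 * Hypothetical bump-the-pointer arena over offsets in one contiguous region.
 * Illustration only; a real collector works on raw memory, not a counter.
 */
final class BumpArena {
    private final long capacity;
    private long top; // offset of the next free byte

    BumpArena(long capacity) {
        this.capacity = capacity;
    }

    /** Returns the start offset of a new block, or -1 if the region is full. */
    long allocate(long size) {
        long aligned = (size + 7) & ~7L; // round up to 8-byte alignment
        if (top + aligned > capacity) {
            return -1; // a real JVM would trigger a collection here
        }
        long start = top;
        top += aligned; // the whole "allocation" is this addition
        return start;
    }

    /** After a compacting GC the live bytes are packed at the start of the region. */
    void resetTo(long liveBytes) {
        top = liveBytes;
    }

    long used() {
        return top;
    }
}
```

Because every allocation advances the same pointer, objects allocated one after another end up adjacent, which is also what helps the CPU cache and TLB behaviour mentioned above.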

These are just my thoughts, but I'm no expert.

Michał

2013/2/15 Antoine Chambille <ach at quartetfs.com>

> Michal
>
> That was my initial hope: binding threads to a NUMA node and crossing my
> fingers that the objects instantiated by a thread would be allocated on its
> home NUMA node, and stay there, so that if the same thread later reads the
> same data, it reads memory from its home NUMA node.
>
> But that is not how it works. Even with NUMA options activated, the
> HotSpot JVM moves objects when they get promoted into the old
> generation, removing them from their home NUMA node and copying them into
> some big shared NUMA-oblivious memory area...
>
>
> I agree that binding threads to NUMA nodes is actually an easy trick when
> you use the JNA library. There is even an open source project by Peter
> Lawrey that does it quite elegantly and cross-platform:
> https://github.com/peter-lawrey/Java-Thread-Affinity/
>
> -Antoine
>
> On 15 February 2013 16:24, Michał Warecki <michal.warecki at gmail.com> wrote:
>
>> Hi!
>>
>> I may be wrong, but:
>> If you know on which NUMA node the data is stored and which thread will
>> read it, you can use the pthread_setaffinity_np() function.
>> This pins a particular thread to a particular CPU with faster access
>> to a particular NUMA node.
>> That's not a Pandora's box, but you do have to use JNI.
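Before pinning a thread with pthread_setaffinity_np(), you need to know which CPUs belong to which NUMA node. On Linux that mapping is exposed in sysfs as /sys/devices/system/node/node<N>/cpulist, in a range format such as "0-7,16-23". The sketch below, a hypothetical NumaTopology helper, reads it from plain Java; it assumes Linux sysfs and returns an empty map elsewhere. The affinity call itself still has to go through JNI or JNA.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper: maps NUMA node ids to CPU ids using Linux sysfs. */
final class NumaTopology {

    /** Parses a Linux cpulist string such as "0-3,8,10-11" into CPU ids. */
    static List<Integer> parseCpuList(String s) {
        List<Integer> cpus = new ArrayList<>();
        for (String part : s.trim().split(",")) {
            if (part.isEmpty()) {
                continue; // an empty cpulist (e.g. a memory-only node)
            }
            int dash = part.indexOf('-');
            if (dash < 0) {
                cpus.add(Integer.parseInt(part));
            } else {
                int lo = Integer.parseInt(part.substring(0, dash));
                int hi = Integer.parseInt(part.substring(dash + 1));
                for (int cpu = lo; cpu <= hi; cpu++) {
                    cpus.add(cpu);
                }
            }
        }
        return cpus;
    }

    /** Reads node -> CPUs from sysfs; Linux only, empty map when the directory is absent. */
    static Map<Integer, List<Integer>> readNodes() throws IOException {
        Map<Integer, List<Integer>> nodes = new TreeMap<>();
        Path root = Paths.get("/sys/devices/system/node");
        if (!Files.isDirectory(root)) {
            return nodes;
        }
        try (DirectoryStream<Path> dirs = Files.newDirectoryStream(root, "node[0-9]*")) {
            for (Path dir : dirs) {
                int node = Integer.parseInt(dir.getFileName().toString().substring(4));
                String list = new String(Files.readAllBytes(dir.resolve("cpulist")));
                nodes.put(node, parseCpuList(list));
            }
        }
        return nodes;
    }

    public static void main(String[] args) throws IOException {
        readNodes().forEach((node, cpus) -> System.out.println("node " + node + ": " + cpus));
    }
}
```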
>>
>> Cheers,
>> Michał
>>
>>
>> 2013/2/15 Antoine Chambille <ach at quartetfs.com>
>>
>>> Data is stored in columns, to maximize the performance of analytical
>>> queries that commonly scan billions of rows but only for a subset of the
>>> columns. We support a mix of primitive data and object-oriented data (some
>>> columns look like double[], others look like Object[]).
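A minimal sketch of the column layout described here, with hypothetical names (ColumnStore, sumWhere): each column is a single array indexed by row number, and a query walks only the arrays it needs, so a scan over many rows touches just a subset of the columns.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical column store: one array per column, rows are array indexes. */
final class ColumnStore {
    private final Map<String, double[]> doubleColumns = new HashMap<>();
    private final Map<String, Object[]> objectColumns = new HashMap<>();

    void addDoubleColumn(String name, double[] values) {
        doubleColumns.put(name, values);
    }

    void addObjectColumn(String name, Object[] values) {
        objectColumns.put(name, values);
    }

    /** Sums a measure over rows whose dimension equals key; only these two columns are read. */
    double sumWhere(String measure, String dimension, Object key) {
        double[] m = doubleColumns.get(measure);
        Object[] d = objectColumns.get(dimension);
        if (m == null || d == null) {
            throw new IllegalArgumentException("unknown column");
        }
        double sum = 0;
        for (int row = 0; row < m.length; row++) {
            if (key.equals(d[row])) {
                sum += m[row];
            }
        }
        return sum;
    }
}
```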
>>>
>>> Using direct buffers would open a door to NUMA-Aware memory placement
>>> (provided that the direct allocation itself can be made on the right node).
>>> That's probably more a Pandora's box than a door, though ;) Anyway, it implies
>>> serializing data into byte arrays and deserializing it at each query. That's
>>> a serious performance penalty for primitive data, and it's absolutely
>>> prohibitive when you do that with plain objects, even Externalizable ones.
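To illustrate the object-column case, here is a minimal sketch, assuming a hypothetical Trade row type and standard Java serialization: the column is serialized once into a direct buffer, and every read has to copy the bytes back onto the heap and rebuild brand-new objects. That per-query reconstruction is the cost described above; names and layout are illustrative only.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;

final class OffHeapObjectColumn {

    /** Hypothetical row object held in an Object[] column. */
    static final class Trade implements Externalizable {
        String desk;
        double amount;

        public Trade() {} // Externalizable requires a public no-arg constructor

        Trade(String desk, double amount) {
            this.desk = desk;
            this.amount = amount;
        }

        @Override
        public void writeExternal(ObjectOutput out) throws IOException {
            out.writeUTF(desk);
            out.writeDouble(amount);
        }

        @Override
        public void readExternal(ObjectInput in) throws IOException {
            desk = in.readUTF();
            amount = in.readDouble();
        }
    }

    /** Serializes the whole column and copies the bytes into a direct (off-heap) buffer. */
    static ByteBuffer store(Trade[] column) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(column);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        byte[] data = bytes.toByteArray();
        ByteBuffer buffer = ByteBuffer.allocateDirect(data.length);
        buffer.put(data);
        buffer.flip();
        return buffer;
    }

    /** What every query would pay: copy the bytes back and rebuild fresh heap objects. */
    static Trade[] load(ByteBuffer buffer) {
        ByteBuffer view = buffer.duplicate(); // keep the stored buffer's position untouched
        byte[] data = new byte[view.remaining()];
        view.get(data);
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (Trade[]) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Each call to load returns new Trade instances, so the garbage produced per query grows with the number of rows scanned.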
>>>
>>> -Antoine
>>>
>>>
>>> On 15 February 2013 11:35, Stanimir Simeonoff <stanimir at riflexo.com> wrote:
>>>
>>>> Just out of curiosity: wouldn't using DirectBuffers and managing the data
>>>> yourself be both easier and more efficient?
>>>> Technically you could even ship the data straight to the sockets (or
>>>> disks) without copying it.
>>>> I don't know how you store the data itself, but I can only think of
>>>> tuples, i.e. Object[].
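For primitive columns the DirectBuffer route is much cheaper than for objects. A minimal sketch, assuming a hypothetical OffHeapDoubleColumn class: the values live in a direct ByteBuffer viewed as a DoubleBuffer, outside the Java heap, so the garbage collector never moves them. Assumption, not a guarantee: on Linux with the default memory policy, pages usually land on the node of the thread that first touches them, and current OpenJDK implementations zero a new direct buffer in the allocating thread, so allocating each partition's buffer from a thread bound to that node should tend to keep the data local.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.DoubleBuffer;

/** Hypothetical off-heap double column backed by a direct buffer. */
final class OffHeapDoubleColumn {
    private final DoubleBuffer values;

    OffHeapDoubleColumn(double[] source) {
        // Direct memory is outside the Java heap: the GC does not copy or promote it.
        ByteBuffer raw = ByteBuffer.allocateDirect(source.length * Double.BYTES)
                                   .order(ByteOrder.nativeOrder());
        values = raw.asDoubleBuffer();
        values.put(source);
        values.flip(); // position 0, limit = number of rows
    }

    double get(int row) {
        return values.get(row); // absolute read, no per-row deserialization
    }

    int size() {
        return values.limit();
    }

    double sum() {
        double s = 0;
        for (int i = 0; i < values.limit(); i++) {
            s += values.get(i);
        }
        return s;
    }
}
```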
>>>>
>>>> Stanimir
>>>>
>>>> On Fri, Feb 15, 2013 at 11:48 AM, Antoine Chambille <ach at quartetfs.com> wrote:
>>>>
>>>>> I think this community is the right place to start a conversation
>>>>> about NUMA (aren't NUMA nodes to memory what multiprocessors are to
>>>>> processing? ;). I apologize if this is considered off-topic.
>>>>>
>>>>>
>>>>> We are developing a Java in-memory analytical database (it's called
>>>>> "ActivePivot") that our customers deploy on ever larger datasets. Some
>>>>> ActivePivot instances are deployed on Java heaps close to 1TB, on NUMA
>>>>> servers (typically 4 Xeon processors and 4 NUMA nodes). This is becoming a
>>>>> trend, and we are researching solutions to improve our performance on NUMA
>>>>> configurations.
>>>>>
>>>>>
>>>>> We understand that in the current state of things (including JDK8),
>>>>> HotSpot supports NUMA as follows:
>>>>> * The young generation heap layout can be NUMA-Aware (partitioned per
>>>>> NUMA node, with objects allocated on the same node as the running thread)
>>>>> * The old generation heap layout is not optimized for NUMA (at best
>>>>> the old generation is interleaved among nodes, which at least makes memory
>>>>> accesses somewhat uniform)
>>>>> * The parallel garbage collector is NUMA-optimized, with the GC threads
>>>>> focusing on objects in their own node.
>>>>>
>>>>>
>>>>> Yet activating the -XX:+UseNUMA option has almost no impact on the
>>>>> performance of our in-memory database. This is not surprising: the pattern
>>>>> for a database is to load the data into memory and then run queries on
>>>>> it. The data goes to the old generation and stays there, and queries read
>>>>> it from there. Most memory accesses are in the old gen, and most of those
>>>>> are not local.
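When measuring the effect of -XX:+UseNUMA (as far as I know, in the JDK 7/8 timeframe it is only honoured by the Parallel collector, e.g. java -XX:+UseParallelGC -XX:+UseNUMA ...), it helps to confirm from inside the process that the flag actually took effect. A minimal sketch using the HotSpot-specific HotSpotDiagnosticMXBean; the NumaFlagCheck class is hypothetical, and on a non-HotSpot JVM the lookup simply returns null.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

/** Hypothetical check of the JVM flags that were actually in effect. */
final class NumaFlagCheck {

    /** Returns the current value of a HotSpot flag such as "UseNUMA", or null if unavailable. */
    static String hotspotFlag(String name) {
        try {
            HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            return bean == null ? null : bean.getVMOption(name).getValue();
        } catch (IllegalArgumentException e) {
            return null; // flag unknown to this JVM, or not a HotSpot JVM
        }
    }

    public static void main(String[] args) {
        System.out.println("UseNUMA        = " + hotspotFlag("UseNUMA"));
        System.out.println("UseParallelGC  = " + hotspotFlag("UseParallelGC"));
        System.out.println("JVM arguments  = "
            + ManagementFactory.getRuntimeMXBean().getInputArguments());
    }
}
```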
>>>>>
>>>>> I guess there is a reason HotSpot does not yet optimize the old
>>>>> generation for NUMA. It must be very difficult to do in the general
>>>>> case, when you have no idea which thread on which node will read the
>>>>> data, and interleaving is the safe default. But for an in-memory database
>>>>> this is frustrating, because we know very well which threads will access
>>>>> which piece of data. At least in ActivePivot, data structures are
>>>>> partitioned, and each partition is assigned a thread pool, so the threads
>>>>> that allocated the data in a partition are also the threads that perform
>>>>> sub-queries on that partition. We are a few lines of code away from
>>>>> binding thread pools to NUMA nodes, and if the garbage collector left
>>>>> objects promoted to the old generation on their original NUMA node,
>>>>> memory accesses would be close to optimal.
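The partitioning scheme described here can be sketched as follows, under stated assumptions: PartitionedStore, Partition and bindCurrentThreadToNode are hypothetical names, the node binding is a no-op placeholder for a JNI/JNA affinity call, and each partition gets a single-thread pool so the thread that allocates a partition's data is the same thread that later scans it. As discussed in this thread, HotSpot would still copy these arrays when promoting them, so the sketch only shows the ownership structure, not a fix for old-gen placement.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical store where each partition is owned by its own thread pool. */
final class PartitionedStore {

    /** One partition: its data is only ever touched by its own pool thread. */
    static final class Partition {
        final ExecutorService pool;
        final List<double[]> chunks = new ArrayList<>();

        Partition(int id, int numaNode) {
            this.pool = Executors.newSingleThreadExecutor(r -> {
                Thread t = new Thread(() -> {
                    bindCurrentThreadToNode(numaNode); // hypothetical affinity hook
                    r.run();
                }, "partition-" + id + "-node-" + numaNode);
                t.setDaemon(true);
                return t;
            });
        }
    }

    /** Hypothetical: a real version would call pthread_setaffinity_np via JNI/JNA. */
    static void bindCurrentThreadToNode(int node) {
        // no-op in this sketch
    }

    final List<Partition> partitions = new ArrayList<>();

    PartitionedStore(int partitionCount, int numaNodes) {
        for (int i = 0; i < partitionCount; i++) {
            partitions.add(new Partition(i, i % numaNodes)); // round-robin over nodes
        }
    }

    /** Loading: a partition's data is allocated by that partition's own thread. */
    void load(int partition, int rows, double value) {
        Partition p = partitions.get(partition);
        await(p.pool.submit(() -> {
            double[] chunk = new double[rows];
            Arrays.fill(chunk, value);
            p.chunks.add(chunk);
        }));
    }

    /** Querying: each partition is scanned by the thread that allocated it. */
    double sum() {
        List<Future<Double>> parts = new ArrayList<>();
        for (Partition p : partitions) {
            Callable<Double> scan = () -> {
                double s = 0;
                for (double[] chunk : p.chunks) {
                    for (double v : chunk) {
                        s += v;
                    }
                }
                return s;
            };
            parts.add(p.pool.submit(scan));
        }
        double total = 0;
        for (Future<Double> f : parts) {
            total += await(f);
        }
        return total;
    }

    void shutdown() {
        for (Partition p : partitions) {
            p.pool.shutdown();
        }
    }

    private static <T> T await(Future<T> f) {
        try {
            return f.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }
}
```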
>>>>>
>>>>> We have not been able to do that. That being said, I read an
>>>>> inspiring 2005 article by Mustafa M. Tikir and Jeffrey K. Hollingsworth
>>>>> that experimented with NUMA layouts for the old generation ("NUMA-aware
>>>>> Java heaps for server applications",
>>>>> http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.92.6587&rep=rep1&type=pdf). That motivated me to ask the following questions:
>>>>>
>>>>>
>>>>> * Are there hidden or experimental hotspot options that allow
>>>>> NUMA-Aware partitioning of the old generation?
>>>>> * Do you know why there isn't much (visible, generally available)
>>>>> research on NUMA optimizations for the old gen? Is the Java in-memory
>>>>> database use case considered a rare one?
>>>>> * Maybe we should experiment and even contribute new heap layouts to
>>>>> the open-jdk project. Can some of you guys comment on the difficulty of
>>>>> that?
>>>>>
>>>>>
>>>>> Thanks for reading,
>>>>>
>>>>> --
>>>>> Antoine CHAMBILLE
>>>>> Director Research & Development
>>>>> Quartet FS
>>>>>
>>>>> _______________________________________________
>>>>> Concurrency-interest mailing list
>>>>> Concurrency-interest at cs.oswego.edu
>>>>> http://cs.oswego.edu/mailman/listinfo/concurrency-interest
>>>>>
>>>>>
>>>>
>>>
>>>
>>
>
>