[concurrency-interest] NUMA-Aware Java Heaps for in-memory databases

Michał Warecki michal.warecki at gmail.com
Fri Feb 15 10:24:51 EST 2013


Hi!

I may be wrong, but:
If you know which NUMA node the data is stored on and which thread will
read it, you can use the pthread_setaffinity_np() function.
This pins a particular thread to a particular CPU with faster access to
that NUMA node.
That's not a Pandora's box, but you have to use JNI.
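
For illustration, here is a minimal sketch of the native side of that idea, assuming Linux with glibc. The helper names (first_allowed_cpu, pin_current_thread_to_cpu, pin_to_first_allowed_cpu) are made up for this example. The JNI wrapper that a Java thread would call, and the mapping from CPUs to NUMA nodes (for instance via libnuma or /sys/devices/system/node), are left out.

```cpp
// Sketch only: Linux/glibc specific. All helper names are hypothetical.
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // needed for the *_np affinity calls when built as C
#endif
#include <pthread.h>
#include <sched.h>
#include <cassert>
#include <cerrno>

// Returns the lowest CPU index the calling thread is allowed to run on,
// or -1 if the affinity mask cannot be read.
int first_allowed_cpu() {
    cpu_set_t mask;
    CPU_ZERO(&mask);
    if (pthread_getaffinity_np(pthread_self(), sizeof(mask), &mask) != 0) {
        return -1;
    }
    for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu) {
        if (CPU_ISSET(cpu, &mask)) {
            return cpu;
        }
    }
    return -1;
}

// Restricts the calling thread to a single CPU.
// Returns 0 on success, or an errno-style error code on failure.
int pin_current_thread_to_cpu(int cpu) {
    if (cpu < 0 || cpu >= CPU_SETSIZE) {
        return EINVAL;
    }
    cpu_set_t mask;
    CPU_ZERO(&mask);
    CPU_SET(cpu, &mask);
    return pthread_setaffinity_np(pthread_self(), sizeof(mask), &mask);
}

// Example use: pin the calling thread to the first CPU it may run on.
// Returns that CPU index, or -1 on failure.
int pin_to_first_allowed_cpu() {
    int cpu = first_allowed_cpu();
    if (cpu < 0 || pin_current_thread_to_cpu(cpu) != 0) {
        return -1;
    }
    assert(first_allowed_cpu() == cpu);  // the mask now holds only `cpu`
    return cpu;
}
```

A JNI entry point would call pin_current_thread_to_cpu() from the Java worker thread itself, before that thread starts touching the data of its partition.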

Cheers,
Michał

2013/2/15 Antoine Chambille <ach at quartetfs.com>

> Data is stored in columns, to maximize the performance of analytical
> queries that commonly scan billions of rows but only for a subset of the
> columns. We support a mix of primitive data and object-oriented data (some
> columns look like double[], others look like Object[]).
>
> Using direct buffers would open a door to NUMA-aware memory placement
> (provided that the direct allocation itself can be made on the right node).
> That's probably more a Pandora's box than a door, though ;) Anyway, it implies
> serializing data into byte arrays, and deserializing at each query. That's
> a serious performance penalty for primitive data, and it is absolutely
> prohibitive when you do that with plain objects, even Externalizable ones.
>
> -Antoine
>
>
> On 15 February 2013 11:35, Stanimir Simeonoff <stanimir at riflexo.com> wrote:
>
>> Just out of curiosity: wouldn't using DirectBuffers and managing the data
>> yourself be both easier and more efficient?
>> Technically you can ship the data w/o even copying it straight to the
>> sockets (or disks).
>> I don't know how you store the data itself but I can think only of tuples
>> i.e. Object[].
>>
>> Stanimir
>>
>> On Fri, Feb 15, 2013 at 11:48 AM, Antoine Chambille <ach at quartetfs.com> wrote:
>>
>>> I think this community is the right place to start a conversation about
>>> NUMA (aren't NUMA nodes to memory what multiprocessors are to processing?
>>> ;). I apologize if this is considered off-topic.
>>>
>>>
>>> We are developing a Java in-memory analytical database (it's called
>>> "ActivePivot") that our customers deploy on ever larger datasets. Some
>>> ActivePivot instances are deployed on java heaps close to 1TB, on NUMA
>>> servers (typically 4 Xeon processors and 4 NUMA nodes). This is becoming a
>>> trend, and we are researching solutions to improve our performance on NUMA
>>> configurations.
>>>
>>>
>>> We understand that in the current state of things (and including JDK8)
>>> the support for NUMA in hotspot is the following:
>>> * The young generation heap layout can be NUMA-Aware (partitioned per
>>> NUMA node, objects allocated in the same node as the running thread)
>>> * The old generation heap layout is not optimized for NUMA (at best the
>>> old generation is interleaved among nodes which at least makes memory
>>> accesses somewhat uniform)
>>> * The parallel garbage collector is NUMA optimized, the GC threads
>>> focusing on objects in their node.
>>>
>>>
>>> Yet activating the -XX:+UseNUMA option has almost no impact on the
>>> performance of our in-memory database. This is not surprising: the pattern
>>> for a database is to load the data into memory and then run queries on
>>> it. The data goes to and stays in the old generation, and it is read from
>>> there by queries. Most memory accesses are in the old gen and most of those
>>> are not local.
>>>
>>> I guess there is a reason hotspot does not yet optimize the old
>>> generation for NUMA. It must be very difficult to do it in the general
>>> case, when you have no idea what thread from what node will read data and
>>> interleaving is. But for an in-memory database this is frustrating because
>>> we know very well which threads will access which piece of data. In
>>> ActivePivot at least, data structures are partitioned and each partition is
>>> assigned a thread pool, so the threads that allocated the data in a
>>> partition are also the threads that perform sub-queries on that partition.
>>> We are a few lines of code away from binding thread pools to NUMA nodes,
>>> and if the garbage collector left objects promoted to the old
>>> generation on their original NUMA node, memory accesses would be close to
>>> optimal.
>>>
>>> We have not been able to do that. That being said, I read an
>>> inspiring 2005 article by Mustafa M. Tikir and Jeffrey K. Hollingsworth
>>> that experimented with NUMA layouts for the old generation ("NUMA-aware
>>> Java heaps for server applications",
>>> http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.92.6587&rep=rep1&type=pdf). That motivated me to ask the following questions:
>>>
>>>
>>> * Are there hidden or experimental hotspot options that allow NUMA-Aware
>>> partitioning of the old generation?
>>> * Do you know why there isn't much (visible, generally available)
>>> research on NUMA optimizations for the old gen? Is the Java in-memory
>>> database use case considered a rare one?
>>> * Maybe we should experiment and even contribute new heap layouts to the
>>> OpenJDK project. Can some of you comment on the difficulty of that?
>>>
>>>
>>> Thanks for reading,
>>>
>>> --
>>> Antoine CHAMBILLE
>>> Director Research & Development
>>> Quartet FS
>>>
>>> _______________________________________________
>>> Concurrency-interest mailing list
>>> Concurrency-interest at cs.oswego.edu
>>> http://cs.oswego.edu/mailman/listinfo/concurrency-interest
>>>
>>>
>>
>
>
> --
> Antoine CHAMBILLE
> Director Research & Development
> Quartet FS