[concurrency-interest] Array allocation and access on the JVM

Vitaly Davidovich vitalyd at gmail.com
Wed Jan 25 19:22:48 EST 2012


Slightly off topic but does the VM even need to mark cards when the
references are both in young gen? I thought they were needed for cases
where a tenured object references a young gen object; otherwise it seems
like scanning from the roots is sufficient? I assume the cost of figuring
out whether it's a cross generation write is too prohibitive to do for
every pointer store and it just always marks.
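For readers unfamiliar with card marking, an unconditional write barrier of the kind Vitaly describes can be sketched roughly like this (illustrative Java, not HotSpot's actual implementation; all names and constants here are assumptions, except that 512-byte cards match HotSpot's default):

```java
import java.util.Arrays;

// Illustrative sketch of an unconditional card-marking write barrier.
final class CardMarkSketch {
    static final int CARD_SHIFT = 9;          // log2(512): one card byte per 512 heap bytes
    static final byte DIRTY = 0;
    static final byte CLEAN = 1;
    static final byte[] cardTable = new byte[1 << 16];
    static { Arrays.fill(cardTable, CLEAN); }

    // Unconditional barrier: every reference store dirties the card for the
    // written field's address, regardless of which generation either object
    // lives in -- checking generations on every store would cost more than
    // blindly marking, which is exactly the trade-off described above.
    static void onReferenceStore(long fieldAddress) {
        cardTable[(int) (fieldAddress >>> CARD_SHIFT)] = DIRTY;
    }

    public static void main(String[] args) {
        onReferenceStore(0x4200L);            // a store at heap offset 0x4200
        System.out.println(cardTable[0x4200 >>> CARD_SHIFT]); // prints 0 (dirty)
    }
}
```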

Sent from my phone
On Jan 25, 2012 7:13 PM, "Vitaly Davidovich" <vitalyd at gmail.com> wrote:

> I must say that's a really good hypothesis and would perfectly explain
> Aleksandar's observations. Nice one, Doug! :)
>
> TLAB sizes are adjusted ergonomically (if not set via flag) as I
> understand it, so even though it doesn't really make sense to have them
> smaller than 32k, they're masking away this issue.
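The 32k figure falls out of simple arithmetic, assuming HotSpot's default 512-byte cards and a typical 64-byte cache line:

```java
// Why ~32 KiB matters: with one card byte per 512 bytes of heap and
// 64-byte cache lines, a single cache line of the card table covers
// 64 * 512 bytes of heap. Two threads storing references anywhere within
// the same such region dirty bytes on the same card-table cache line.
public class CardTableCoverage {
    public static void main(String[] args) {
        final int cardSize = 512;       // heap bytes per card (HotSpot default)
        final int cacheLine = 64;       // card bytes per cache line (typical x86)
        System.out.println(cardSize * cacheLine); // prints 32768, i.e. 32 KiB
    }
}
```

So TLABs of at least 32 KiB tend to mask the problem: each thread's freshly allocated objects land in a heap region whose cards map to cache lines no other thread is writing.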
>
> Sent from my phone
> On Jan 25, 2012 6:57 PM, "Doug Lea" <dl at cs.oswego.edu> wrote:
>
>> On 01/25/12 14:48, Aleksandar Prokopec wrote:
>>
>>> Hello,
>>>
>>> I've recently posted the following question about memory contention on
>>> StackOverflow:
>>>
>>
>>
>> I asked Aleksandar offlist to check the effects of -XX:+UseCondCardMark,
>> which, as is so often the case these days, was the main problem. Really,
>> everyone should always use this switch on hotspot on multiprocessors
>> (or -XX:+UseG1GC, which is often a little worse on throughput
>> but better-behaved for large heaps).
>>
>> As I mentioned in a post a few weeks ago, see Dave Dice's blog:
>> http://blogs.oracle.com/dave/entry/false_sharing_induced_by_card
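What -XX:+UseCondCardMark changes can be sketched as follows (hypothetical code; the real barrier is emitted by the JIT, and the constants are the same assumptions as above):

```java
// Sketch of a conditional card mark: load the card byte first and store
// only if it is not already dirty. The extra load is cheap; skipping the
// redundant store keeps the card-table cache line in shared state instead
// of bouncing it between cores (the false sharing Dice describes).
final class CondCardMarkSketch {
    static final int CARD_SHIFT = 9;
    static final byte DIRTY = 0;
    static final byte CLEAN = 1;
    static final byte[] cardTable = new byte[1 << 16];
    static { java.util.Arrays.fill(cardTable, CLEAN); }
    static long cardStores = 0;         // instrumentation for this sketch only

    static void onReferenceStore(long fieldAddress) {
        int card = (int) (fieldAddress >>> CARD_SHIFT);
        if (cardTable[card] != DIRTY) { // the "Cond" in UseCondCardMark
            cardTable[card] = DIRTY;
            cardStores++;
        }
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1000; i++) {
            onReferenceStore(0x4200L);  // 1000 stores hitting the same card
        }
        System.out.println(cardStores); // prints 1: only the first store wrote
    }
}
```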
>>
>>
>> -Doug
>>
>>
>>
>>
>>
>>
>>> http://stackoverflow.com/questions/8942396/array-allocation-and-access-on-the-java-virtual-machine-and-memory-contention
>>>
>>> Basically, I have several threads defined like this:
>>>
>>> final class Worker extends Thread {
>>>     Foo[] array = new Foo[1024];
>>>     int sz;
>>>
>>>     public Worker(int _sz) {
>>>         sz = _sz;
>>>     }
>>>
>>>     public void run() {
>>>         // Foo[] arr = new Foo[1024];
>>>         Foo[] arr = array;
>>>         loop(arr);
>>>     }
>>>
>>>     public void loop(Foo[] arr) {
>>>         int i = 0;
>>>         int pos = 512;
>>>         Foo v = new Foo();
>>>         while (i < sz) {
>>>             if (i % 2 == 0) {
>>>                 arr[pos] = v;
>>>                 pos += 1;
>>>             } else {
>>>                 pos -= 1;
>>>                 v = arr[pos];
>>>             }
>>>             i++;
>>>         }
>>>     }
>>> }
>>>
>>> The program takes 2 parameters - `size` and `par`. `size` is the size of the
>>> workload, and `par` is the number of threads to distribute the workload to.
>>> Each thread repeatedly writes to and reads from array location 512, in
>>> total `size` / `par` times. Each thread uses its own array (so the locations
>>> being written to should be at least 1024*4 bytes away from each other).
>>>
>>> I've measured the scaling on a 2x 4-core Intel Xeon server and observed the
>>> following running times for 1, 2, 4 and 8 threads (7 repetitions each):
>>>
>>>  >>> All running times: [2149, 2227, 1974, 1948, 1803, 2283, 1878]
>>>  >>> All running times: [1140, 1124, 2022, 1141, 2028, 2004, 2136]
>>>  >>> All running times: [867, 1022, 1457, 1342, 1436, 966, 1531]
>>>  >>> All running times: [915, 864, 1245, 1243, 948, 790, 1007]
>>>
>>> Apparently, this is due to memory contention.
>>> What I've observed is that if I comment the line `Foo[] arr = array` and
>>> uncomment the line above it to allocate the array within the thread, I
>>> get:
>>>
>>>  >>> All running times: [2053, 1966, 2089, 1937, 2046, 1909, 2011]
>>>  >>> All running times: [1048, 1178, 1100, 1194, 1367, 1271, 1207]
>>>  >>> All running times: [578, 508, 589, 571, 617, 643, 645]
>>>  >>> All running times: [330, 299, 300, 322, 331, 324, 575]
>>>
>>> My guess is that in the second case the arrays get allocated in thread
>>> local
>>> allocation buffers, so the array positions are much further from each
>>> other.
>>> Still, even in the first case processors should be writing to separate
>>> cache
>>> lines, so why does the first example scale so badly?
>>>
>>> Can anyone explain these numbers? Why does this memory contention happen?
>>>
>>> Thank you,
>>> Aleksandar Prokopec
>>>
>>>
>>>
>>> --------------------------------------------------------------------------------
>>>
>>> // The complete runnable program:
>>>
>>>
>>> import java.util.ArrayList;
>>>
class MultiStackJavaExperiment {
>>>
>>>     final class Foo {
>>>         int x = 0;
>>>     }
>>>
>>>     final class Worker extends Thread {
>>>         Foo[] array = new Foo[1024];
>>>         int sz;
>>>
>>>         public Worker(int _sz) {
>>>             sz = _sz;
>>>         }
>>>
>>>         public void run() {
>>>             Foo[] arr = new Foo[1024];
>>>             // Foo[] arr = array;
>>>             loop(arr);
>>>         }
>>>
>>>         public void loop(Foo[] arr) {
>>>             int i = 0;
>>>             int pos = 512;
>>>             Foo v = new Foo();
>>>             while (i < sz) {
>>>                 if (i % 2 == 0) {
>>>                     arr[pos] = v;
>>>                     pos += 1;
>>>                 } else {
>>>                     pos -= 1;
>>>                     v = arr[pos];
>>>                 }
>>>                 i++;
>>>             }
>>>         }
>>>     }
>>>
>>>     public static void main(String[] args) {
>>>         (new MultiStackJavaExperiment()).mainMethod(args);
>>>     }
>>>
>>>     int size = Integer.parseInt(System.getProperty("size"));
>>>     int par = Integer.parseInt(System.getProperty("par"));
>>>
>>>     public void mainMethod(String[] args) {
>>>         int times = 0;
>>>         if (args.length == 0) times = 1;
>>>         else times = Integer.parseInt(args[0]);
>>>         ArrayList<Long> measurements = new ArrayList<Long>();
>>>
>>>         for (int i = 0; i < times; i++) {
>>>             long start = System.currentTimeMillis();
>>>             run();
>>>             long end = System.currentTimeMillis();
>>>
>>>             long time = (end - start);
>>>             System.out.println(i + ") Running time: " + time + " ms");
>>>             measurements.add(time);
>>>         }
>>>
>>>         System.out.println(">>>");
>>>         System.out.println(">>> All running times: " + measurements);
>>>         System.out.println(">>>");
>>>     }
>>>
>>>     public void run() {
>>>         int sz = size / par;
>>>         ArrayList<Thread> threads = new ArrayList<Thread>();
>>>
>>>         for (int i = 0; i < par; i++) {
>>>             threads.add(new Worker(sz));
>>>             threads.get(i).start();
>>>         }
>>>         for (int i = 0; i < par; i++) {
>>>             try {
>>>                 threads.get(i).join();
>>>             } catch (Exception e) {}
>>>         }
>>>     }
>>>
>>> }
>>>
>>>
>>>
>>> _______________________________________________
>>> Concurrency-interest mailing list
>>> Concurrency-interest at cs.oswego.edu
>>> http://cs.oswego.edu/mailman/listinfo/concurrency-interest
>>>
>>
>>
>

