[concurrency-interest] Array allocation and access on the JVM

Dmitry Zaslavsky dmitry.zaslavsky at gmail.com
Wed Jan 25 18:46:58 EST 2012


This is a complete guess, but in C++ on NUMA system in Linux with default allocation strategy, you get into classical pattern of allocating memory local to the the node starting the threads and the other threads will pay perf penalty. 

While in c++ its a know "feature", I don't know if it is your case in java.

Do you have vtune to check?

Sent from mobile device

On Jan 25, 2012, at 2:48 PM, Aleksandar Prokopec <aleksandar.prokopec at gmail.com> wrote:

> Hello,
> 
> I've recently posted the following question about memory contention on StackOverflow:
> 
> http://stackoverflow.com/questions/8942396/array-allocation-and-access-on-the-java-virtual-machine-and-memory-contention
> 
> Basically, I have several threads defined like this:
> 
> final class Worker extends Thread {
>     Foo[] array = new Foo[1024];
>     int sz;
> 
>     public Worker(int _sz) {
>         sz = _sz;
>     }
> 
>     public void run() {
>         //Foo[] arr = new Foo[1024];
>         Foo[] arr = array;
>         loop(arr);
>     }
> 
>     public void loop(Foo[] arr) {
>         int i = 0;
>         int pos = 512;
>         Foo v = new Foo();
>         while (i < sz) {
>             if (i % 2 == 0) {
>                 arr[pos] = v;
>                 pos += 1;
>             } else {
>                 pos -= 1;
>                 v = arr[pos];
>             }
>             i++;
>         }
>     }
> }
> 
> The program takes 2 parameters - `size` and `par`. `size` is the size of the workload, and `par` is the number of threads to distribute the workload to.
> What each thread does is repetitively write and read an array location 512, in total `size`/ `par` times. Each thread reads its own array (locations being written to should be at least 1024*4 bytes away from each other).
> 
> I've measured the speedup on an 2x 4-core Intel Xeon  server and observed the following speedups for 1, 2, 4 and 8 processors (7 repetitions each):
> 
> >>> All running times: [2149, 2227, 1974, 1948, 1803, 2283, 1878]
> >>> All running times: [1140, 1124, 2022, 1141, 2028, 2004, 2136]
> >>> All running times: [867, 1022, 1457, 1342, 1436, 966, 1531]
> >>> All running times: [915, 864, 1245, 1243, 948, 790, 1007]
> 
> Apparently, this is due to memory contention.
> What I've observed is that if I comment the line `Foo[] arr = array` and uncomment the line above it to allocate the array within the thread, I get:
> 
> >>> All running times: [2053, 1966, 2089, 1937, 2046, 1909, 2011]
> >>> All running times: [1048, 1178, 1100, 1194, 1367, 1271, 1207]
> >>> All running times: [578, 508, 589, 571, 617, 643, 645]
> >>> All running times: [330, 299, 300, 322, 331, 324, 575]
> 
> My guess is that in the second case the arrays get allocated in thread local allocation buffers, so the array positions are much further from each other.
> Still, even in the first case processors should be writing to separate cache lines, so why does the first example scale so badly?
> 
> Can anyone explain these numbers? Why does this memory contention happen?
> 
> Thank you,
> Aleksandar Prokopec
> 
> 
> 
> ------------------------------------------------------------------------------
> 
> // The complete runnable program:
> 
> 
> import java.util.ArrayList;
> 
> class MultiStackJavaExperiment {
> 
>     final class Foo {
>         int x = 0;
>     }
> 
>     final class Worker extends Thread {
>         Foo[] array = new Foo[1024];
>         int sz;
> 
>         public Worker(int _sz) {
>             sz = _sz;
>         }
> 
>         public void run() {
>             Foo[] arr = new Foo[1024];
>             //Foo[] arr = array;
>             loop(arr);
>         }
> 
>         public void loop(Foo[] arr) {
>             int i = 0;
>             int pos = 512;
>             Foo v = new Foo();
>             while (i < sz) {
>                 if (i % 2 == 0) {
>                     arr[pos] = v;
>                     pos += 1;
>                 } else {
>                     pos -= 1;
>                     v = arr[pos];
>                 }
>                 i++;
>             }
>         }
>     }
> 
>     public static void main(String[] args) {
>         (new MultiStackJavaExperiment()).mainMethod(args);
>     }
> 
>     int size = Integer.parseInt(System.getProperty("size"));
>     int par = Integer.parseInt(System.getProperty("par"));
> 
>     public void mainMethod(String[] args) {
>         int times = 0;
>         if (args.length == 0) times = 1;
>         else times = Integer.parseInt(args[0]);
>         ArrayList < Long > measurements = new ArrayList < Long > ();
> 
>         for (int i = 0; i < times; i++) {
>             long start = System.currentTimeMillis();
>             run();
>             long end = System.currentTimeMillis();
> 
>             long time = (end - start);
>             System.out.println(i + ") Running time: " + time + " ms");
>             measurements.add(time);
>         }
> 
>         System.out.println(">>>");
>         System.out.println(">>> All running times: " + measurements);
>         System.out.println(">>>");
>     }
> 
>     public void run() {
>         int sz = size / par;
>         ArrayList < Thread > threads = new ArrayList < Thread > ();
> 
>         for (int i = 0; i < par; i++) {
>             threads.add(new Worker(sz));
>             threads.get(i).start();
>         }
>         for (int i = 0; i < par; i++) {
>             try {
>                 threads.get(i).join();
>             } catch (Exception e) {}
>         }
>     }
> 
> }
> _______________________________________________
> Concurrency-interest mailing list
> Concurrency-interest at cs.oswego.edu
> http://cs.oswego.edu/mailman/listinfo/concurrency-interest
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://cs.oswego.edu/pipermail/concurrency-interest/attachments/20120125/09baa3fc/attachment.html>


More information about the Concurrency-interest mailing list