[concurrency-interest] Array allocation and access on the JVM

Doug Lea dl at cs.oswego.edu
Wed Jan 25 18:55:38 EST 2012


On 01/25/12 14:48, Aleksandar Prokopec wrote:
> Hello,
>
> I've recently posted the following question about memory contention on
> StackOverflow:


I asked Aleksandar offlist to check the effects of -XX:+UseCondCardMark
which as always these days was the main problem. Really, everyone
should always use this switch on hotspot on multiprocessors.
(or -XX:+UseG1GC, although it is often a little worse on throughput
but better-behaved for large heaps).

As I mentioned in a post from a few weeks ago, see Dive Dice's blog
http://blogs.oracle.com/dave/entry/false_sharing_induced_by_card


-Doug





>
> http://stackoverflow.com/questions/8942396/array-allocation-and-access-on-the-java-virtual-machine-and-memory-contention
>
> Basically, I have several threads defined like this:
>
> final class Worker extends Thread {
> Foo[] array = new Foo[1024];
> int sz;
>
> public Worker(int _sz) {
> sz = _sz;
> }
>
> public void run() {
> //Foo[] arr = new Foo[1024];
> Foo[] arr = array;
> loop(arr);
> }
>
> public void loop(Foo[] arr) {
> int i = 0;
> int pos = 512;
> Foo v = new Foo();
> while (i < sz) {
> if (i % 2 == 0) {
> arr[pos] = v;
> pos += 1;
> } else {
> pos -= 1;
> v = arr[pos];
> }
> i++;
> }
> }
> }
>
> The program takes 2 parameters - `size` and `par`. `size` is the size of the
> workload, and `par` is the number of threads to distribute the workload to.
> What each thread does is repetitively write and read an array location 512, in
> total `size`/ `par` times. Each thread reads its own array (locations being
> written to should be at least 1024*4 bytes away from each other).
>
> I've measured the speedup on an 2x 4-core Intel Xeon server and observed the
> following speedups for 1, 2, 4 and 8 processors (7 repetitions each):
>
>  >>> All running times: [2149, 2227, 1974, 1948, 1803, 2283, 1878]
>  >>> All running times: [1140, 1124, 2022, 1141, 2028, 2004, 2136]
>  >>> All running times: [867, 1022, 1457, 1342, 1436, 966, 1531]
>  >>> All running times: [915, 864, 1245, 1243, 948, 790, 1007]
>
> Apparently, this is due to memory contention.
> What I've observed is that if I comment the line `Foo[] arr = array` and
> uncomment the line above it to allocate the array within the thread, I get:
>
>  >>> All running times: [2053, 1966, 2089, 1937, 2046, 1909, 2011]
>  >>> All running times: [1048, 1178, 1100, 1194, 1367, 1271, 1207]
>  >>> All running times: [578, 508, 589, 571, 617, 643, 645]
>  >>> All running times: [330, 299, 300, 322, 331, 324, 575]
>
> My guess is that in the second case the arrays get allocated in thread local
> allocation buffers, so the array positions are much further from each other.
> Still, even in the first case processors should be writing to separate cache
> lines, so why does the first example scale so badly?
>
> Can anyone explain these numbers? Why does this memory contention happen?
>
> Thank you,
> Aleksandar Prokopec
>
>
>
> ------------------------------------------------------------------------------
>
> // The complete runnable program:
>
>
> import java.util.ArrayList;
>
> class MultiStackJavaExperiment {
>
> final class Foo {
> int x = 0;
> }
>
> final class Worker extends Thread {
> Foo[] array = new Foo[1024];
> int sz;
>
> public Worker(int _sz) {
> sz = _sz;
> }
>
> public void run() {
> Foo[] arr = new Foo[1024];
> //Foo[] arr = array;
> loop(arr);
> }
>
> public void loop(Foo[] arr) {
> int i = 0;
> int pos = 512;
> Foo v = new Foo();
> while (i < sz) {
> if (i % 2 == 0) {
> arr[pos] = v;
> pos += 1;
> } else {
> pos -= 1;
> v = arr[pos];
> }
> i++;
> }
> }
> }
>
> public static void main(String[] args) {
> (new MultiStackJavaExperiment()).mainMethod(args);
> }
>
> int size = Integer.parseInt(System.getProperty("size"));
> int par = Integer.parseInt(System.getProperty("par"));
>
> public void mainMethod(String[] args) {
> int times = 0;
> if (args.length == 0) times = 1;
> else times = Integer.parseInt(args[0]);
> ArrayList < Long > measurements = new ArrayList < Long > ();
>
> for (int i = 0; i < times; i++) {
> long start = System.currentTimeMillis();
> run();
> long end = System.currentTimeMillis();
>
> long time = (end - start);
> System.out.println(i + ") Running time: " + time + " ms");
> measurements.add(time);
> }
>
> System.out.println(">>>");
> System.out.println(">>> All running times: " + measurements);
> System.out.println(">>>");
> }
>
> public void run() {
> int sz = size / par;
> ArrayList < Thread > threads = new ArrayList < Thread > ();
>
> for (int i = 0; i < par; i++) {
> threads.add(new Worker(sz));
> threads.get(i).start();
> }
> for (int i = 0; i < par; i++) {
> try {
> threads.get(i).join();
> } catch (Exception e) {}
> }
> }
>
> }
>
>
>
> _______________________________________________
> Concurrency-interest mailing list
> Concurrency-interest at cs.oswego.edu
> http://cs.oswego.edu/mailman/listinfo/concurrency-interest



More information about the Concurrency-interest mailing list