[concurrency-interest] Array allocation and access on the JVM

Vitaly Davidovich vitalyd at gmail.com
Wed Jan 25 17:41:48 EST 2012


Perhaps the JVM prefetches additional cache lines which then need to be
invalidated by coherence once other cores start writing to them.  In the
example where one thread allocates all of the arrays upfront, they'll be
contiguous.  Once that memory is written by other cores it may be that even
though a given core is writing to its own cache line, it may have other
cache lines in its cache due to prefetch, which will be invalidated by some
other core that's writing to that line.  Multiply this by all your cores
and perhaps you get inadvertent sharing.

With TLAB allocation memory is spread out more, as you suggest, so it
avoids this type of problem.   If you increase the array size as you did
then perhaps the distance between the memory is beyond prefetch boundary
and you get scaling (though with memory waste); the larger size may even
bypass TLAB altogether, but it doesn't matter due to distance.

Just a guess though ... you can also try asking the GC group for insight.

Sent from my phone
On Jan 25, 2012 2:51 PM, "Aleksandar Prokopec" <
aleksandar.prokopec at gmail.com> wrote:

>  Hello,
>
> I've recently posted the following question about memory contention on
> StackOverflow:
>
>
> http://stackoverflow.com/questions/8942396/array-allocation-and-access-on-the-java-virtual-machine-and-memory-contention
>
> Basically, I have several threads defined like this:
>
> final class Worker extends Thread {
>     Foo[] array = new Foo[1024];
>     int sz;
>
>     public Worker(int _sz) {
>         sz = _sz;
>     }
>
>     public void run() {
>         //Foo[] arr = new Foo[1024];
>         Foo[] arr = array;
>         loop(arr);
>     }
>
>     public void loop(Foo[] arr) {
>         int i = 0;
>         int pos = 512;
>         Foo v = new Foo();
>         while (i < sz) {
>             if (i % 2 == 0) {
>                 arr[pos] = v;
>                 pos += 1;
>             } else {
>                 pos -= 1;
>                 v = arr[pos];
>             }
>             i++;
>         }
>     }
> }
>
> The program takes 2 parameters - `size` and `par`. `size` is the size of
> the workload, and `par` is the number of threads to distribute the workload
> to.
> What each thread does is repetitively write and read an array location
> 512, in total `size`/ `par` times. Each thread reads its own array
> (locations being written to should be at least 1024*4 bytes away from each
> other).
>
> I've measured the speedup on an 2x 4-core Intel Xeon  server and observed
> the following speedups for 1, 2, 4 and 8 processors (7 repetitions each):
>
> >>> All running times: [2149, 2227, 1974, 1948, 1803, 2283, 1878]
> >>> All running times: [1140, 1124, 2022, 1141, 2028, 2004, 2136]
> >>> All running times: [867, 1022, 1457, 1342, 1436, 966, 1531]
> >>> All running times: [915, 864, 1245, 1243, 948, 790, 1007]
>
> Apparently, this is due to memory contention.
> What I've observed is that if I comment the line `Foo[] arr = array` and
> uncomment the line above it to allocate the array within the thread, I get:
>
> >>> All running times: [2053, 1966, 2089, 1937, 2046, 1909, 2011]
> >>> All running times: [1048, 1178, 1100, 1194, 1367, 1271, 1207]
> >>> All running times: [578, 508, 589, 571, 617, 643, 645]
> >>> All running times: [330, 299, 300, 322, 331, 324, 575]
>
> My guess is that in the second case the arrays get allocated in thread
> local allocation buffers, so the array positions are much further from each
> other.
> Still, even in the first case processors should be writing to separate
> cache lines, so why does the first example scale so badly?
>
> Can anyone explain these numbers? Why does this memory contention happen?
>
> Thank you,
> Aleksandar Prokopec
>
>
>
>
> ------------------------------------------------------------------------------
>
> // The complete runnable program:
>
>
> import java.util.ArrayList;
>
> class MultiStackJavaExperiment {
>
>     final class Foo {
>         int x = 0;
>     }
>
>     final class Worker extends Thread {
>         Foo[] array = new Foo[1024];
>         int sz;
>
>         public Worker(int _sz) {
>             sz = _sz;
>         }
>
>         public void run() {
>             Foo[] arr = new Foo[1024];
>             //Foo[] arr = array;
>             loop(arr);
>         }
>
>         public void loop(Foo[] arr) {
>             int i = 0;
>             int pos = 512;
>             Foo v = new Foo();
>             while (i < sz) {
>                 if (i % 2 == 0) {
>                     arr[pos] = v;
>                     pos += 1;
>                 } else {
>                     pos -= 1;
>                     v = arr[pos];
>                 }
>                 i++;
>             }
>         }
>     }
>
>     public static void main(String[] args) {
>         (new MultiStackJavaExperiment()).mainMethod(args);
>     }
>
>     int size = Integer.parseInt(System.getProperty("size"));
>     int par = Integer.parseInt(System.getProperty("par"));
>
>     public void mainMethod(String[] args) {
>         int times = 0;
>         if (args.length == 0) times = 1;
>         else times = Integer.parseInt(args[0]);
>         ArrayList < Long > measurements = new ArrayList < Long > ();
>
>         for (int i = 0; i < times; i++) {
>             long start = System.currentTimeMillis();
>             run();
>             long end = System.currentTimeMillis();
>
>             long time = (end - start);
>             System.out.println(i + ") Running time: " + time + " ms");
>             measurements.add(time);
>         }
>
>         System.out.println(">>>");
>         System.out.println(">>> All running times: " + measurements);
>         System.out.println(">>>");
>     }
>
>     public void run() {
>         int sz = size / par;
>         ArrayList < Thread > threads = new ArrayList < Thread > ();
>
>         for (int i = 0; i < par; i++) {
>             threads.add(new Worker(sz));
>             threads.get(i).start();
>         }
>         for (int i = 0; i < par; i++) {
>             try {
>                 threads.get(i).join();
>             } catch (Exception e) {}
>         }
>     }
>
> }
>
> _______________________________________________
> Concurrency-interest mailing list
> Concurrency-interest at cs.oswego.edu
> http://cs.oswego.edu/mailman/listinfo/concurrency-interest
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://cs.oswego.edu/pipermail/concurrency-interest/attachments/20120125/b0a4cdae/attachment-0001.html>


More information about the Concurrency-interest mailing list