[concurrency-interest] Array allocation and access on the JVM

Nathan Reynolds nathan.reynolds at oracle.com
Fri Jan 27 11:29:17 EST 2012


Defaulting -XX:+UseCondCardMark would require a lot of testing to create 
a heuristic to guess when it should default on or off.  Not all 
workloads hit contention on the card table.  Making the card table 
concurrent for these workloads would slow them down.  If the heuristic 
is sub-optimal, then we would be in a worse situation than we are today.

In order for the JVM to detect contention, it would need to profile each 
access or have a hardware counter to declare contention.  Timing each 
access and tracking statistics would be expensive.  Hardware counters 
exist, but they are too expensive to access because of the kernel round 
trips involved.  So, the current solutions are more expensive than the 
actual problem.
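
As a rough illustration of why per-access timing is a non-starter, here is a 
small sketch (mine, not part of any JVM code) that estimates the cost of a 
single System.nanoTime() call, the cheapest timing primitive available to 
Java code.  Even this cost, paid on every reference store, would dwarf the 
card-mark write it was meant to profile.

```java
public class NanoTimeCost {
    // Returns the average cost of one System.nanoTime() call, in nanoseconds.
    static double avgNanoTimeCostNs(int iters) {
        long start = System.nanoTime();
        long sink = 0;
        for (int i = 0; i < iters; i++) {
            sink += System.nanoTime();   // the "profiling" we would pay per access
        }
        long end = System.nanoTime();
        if (sink == 42) System.out.println(); // keep the loop from being dead code
        return (end - start) / (double) iters;
    }

    public static void main(String[] args) {
        System.out.printf("~%.1f ns per timed access%n",
                          avgNanoTimeCostNs(10_000_000));
    }
}
```

On typical hardware this lands in the tens of nanoseconds per call, which is 
orders of magnitude more than an unconditional card-table store.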

We have asked hardware vendors to supply user-mode access to hardware 
counters for true and false sharing.  The same hardware counters could be 
used for this problem.  If we get the user-mode hardware counters, then 
this might be automated.

Another difficulty would be in making this flag dynamically changeable 
_without_ incurring a performance penalty.
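
For what it's worth, HotSpot already exposes a diagnostic MXBean that can 
read any VM option at runtime, and write the ones marked "manageable".  A 
small sketch (my own illustration, not from this thread) that probes the 
current setting; note that setVMOption() would fail for UseCondCardMark 
precisely because it is not manageable today, which is the 
dynamic-changeability problem described above:

```java
import java.lang.management.ManagementFactory;
import com.sun.management.HotSpotDiagnosticMXBean;

public class CondCardMarkProbe {
    // Returns "true", "false", or "unknown" if the flag does not exist
    // on this JVM build.
    static String condCardMarkValue() {
        try {
            HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            return bean.getVMOption("UseCondCardMark").getValue();
        } catch (IllegalArgumentException e) {
            return "unknown";
        }
    }

    public static void main(String[] args) {
        System.out.println("UseCondCardMark = " + condCardMarkValue());
    }
}
```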

Nathan Reynolds 
<http://psr.us.oracle.com/wiki/index.php/User:Nathan_Reynolds> | 
Consulting Member of Technical Staff | 602.333.9091
Oracle PSR Engineering <http://psr.us.oracle.com/> | Server Technology

On 1/26/2012 10:36 PM, Hanson Char wrote:
> Is there a plan to have -XX:+UseCondCardMark included as the default 
> configuration in Java 7?  Or can a VM auto-detect the need for such a 
> configuration and configure itself at startup?
>
> Thanks,
> Hanson
>
> On Wed, Jan 25, 2012 at 3:55 PM, Doug Lea <dl at cs.oswego.edu 
> <mailto:dl at cs.oswego.edu>> wrote:
>
>     On 01/25/12 14:48, Aleksandar Prokopec wrote:
>
>         Hello,
>
>         I've recently posted the following question about memory
>         contention on
>         StackOverflow:
>
>
>
>     I asked Aleksandar offlist to check the effects of
>     -XX:+UseCondCardMark
>     which, as always these days, was the main problem.  Really, everyone
>     should always use this switch on HotSpot on multiprocessors
>     (or -XX:+UseG1GC, although it is often a little worse on throughput
>     but better-behaved for large heaps).
>
>     As I mentioned in a post from a few weeks ago, see Dave Dice's blog:
>     http://blogs.oracle.com/dave/entry/false_sharing_induced_by_card
>
>
>     -Doug
>
>         http://stackoverflow.com/questions/8942396/array-allocation-and-access-on-the-java-virtual-machine-and-memory-contention
>
>         Basically, I have several threads defined like this:
>
>         final class Worker extends Thread {
>             Foo[] array = new Foo[1024];
>             int sz;
>
>             public Worker(int _sz) {
>                 sz = _sz;
>             }
>
>             public void run() {
>                 // Foo[] arr = new Foo[1024];
>                 Foo[] arr = array;
>                 loop(arr);
>             }
>
>             public void loop(Foo[] arr) {
>                 int i = 0;
>                 int pos = 512;
>                 Foo v = new Foo();
>                 while (i < sz) {
>                     if (i % 2 == 0) {
>                         arr[pos] = v;
>                         pos += 1;
>                     } else {
>                         pos -= 1;
>                         v = arr[pos];
>                     }
>                     i++;
>                 }
>             }
>         }
>
>         The program takes two parameters, `size` and `par`.  `size` is
>         the size of the workload, and `par` is the number of threads the
>         workload is distributed over.  Each thread repeatedly writes and
>         reads its array at location 512, in total `size`/`par` times.
>         Each thread uses its own array (so the locations being written
>         to should be at least 1024*4 bytes away from each other).
>
>         I've measured the speedup on a 2x 4-core Intel Xeon server
>         and observed the following running times for 1, 2, 4 and 8
>         threads (7 repetitions each):
>
>         >>> All running times: [2149, 2227, 1974, 1948, 1803, 2283, 1878]
>         >>> All running times: [1140, 1124, 2022, 1141, 2028, 2004, 2136]
>         >>> All running times: [867, 1022, 1457, 1342, 1436, 966, 1531]
>         >>> All running times: [915, 864, 1245, 1243, 948, 790, 1007]
>
>         Apparently, this is due to memory contention.
>         What I've observed is that if I comment out the line `Foo[] arr =
>         array` and uncomment the line above it, so that the array is
>         allocated within the thread, I get:
>
>         >>> All running times: [2053, 1966, 2089, 1937, 2046, 1909, 2011]
>         >>> All running times: [1048, 1178, 1100, 1194, 1367, 1271, 1207]
>         >>> All running times: [578, 508, 589, 571, 617, 643, 645]
>         >>> All running times: [330, 299, 300, 322, 331, 324, 575]
>
>         My guess is that in the second case the arrays get allocated
>         in thread-local allocation buffers, so the array positions are
>         much further from each other.  Still, even in the first case the
>         processors should be writing to separate cache lines, so why
>         does the first example scale so badly?
>
>         Can anyone explain these numbers? Why does this memory
>         contention happen?
>
>         Thank you,
>         Aleksandar Prokopec
>
>
>
>         ------------------------------------------------------------------------------
>
>         // The complete runnable program:
>
>
>         import java.util.ArrayList;
>
>         class MultiStackJavaExperiment {
>
>             final class Foo {
>                 int x = 0;
>             }
>
>             final class Worker extends Thread {
>                 Foo[] array = new Foo[1024];
>                 int sz;
>
>                 public Worker(int _sz) {
>                     sz = _sz;
>                 }
>
>                 public void run() {
>                     Foo[] arr = new Foo[1024];
>                     // Foo[] arr = array;
>                     loop(arr);
>                 }
>
>                 public void loop(Foo[] arr) {
>                     int i = 0;
>                     int pos = 512;
>                     Foo v = new Foo();
>                     while (i < sz) {
>                         if (i % 2 == 0) {
>                             arr[pos] = v;
>                             pos += 1;
>                         } else {
>                             pos -= 1;
>                             v = arr[pos];
>                         }
>                         i++;
>                     }
>                 }
>             }
>
>             public static void main(String[] args) {
>                 (new MultiStackJavaExperiment()).mainMethod(args);
>             }
>
>             int size = Integer.parseInt(System.getProperty("size"));
>             int par = Integer.parseInt(System.getProperty("par"));
>
>             public void mainMethod(String[] args) {
>                 int times = 0;
>                 if (args.length == 0) times = 1;
>                 else times = Integer.parseInt(args[0]);
>                 ArrayList<Long> measurements = new ArrayList<Long>();
>
>                 for (int i = 0; i < times; i++) {
>                     long start = System.currentTimeMillis();
>                     run();
>                     long end = System.currentTimeMillis();
>
>                     long time = (end - start);
>                     System.out.println(i + ") Running time: " + time + " ms");
>                     measurements.add(time);
>                 }
>
>                 System.out.println(">>>");
>                 System.out.println(">>> All running times: " + measurements);
>                 System.out.println(">>>");
>             }
>
>             public void run() {
>                 int sz = size / par;
>                 ArrayList<Thread> threads = new ArrayList<Thread>();
>
>                 for (int i = 0; i < par; i++) {
>                     threads.add(new Worker(sz));
>                     threads.get(i).start();
>                 }
>                 for (int i = 0; i < par; i++) {
>                     try {
>                         threads.get(i).join();
>                     } catch (Exception e) {}
>                 }
>             }
>
>         }
>
>
>
>         _______________________________________________
>         Concurrency-interest mailing list
>         Concurrency-interest at cs.oswego.edu
>         <mailto:Concurrency-interest at cs.oswego.edu>
>         http://cs.oswego.edu/mailman/listinfo/concurrency-interest
>
>