[concurrency-interest] Array allocation and access on the JVM
david.dice at gmail.com
Fri Jan 27 12:10:20 EST 2012
> Date: Fri, 27 Jan 2012 09:29:17 -0700
> From: Nathan Reynolds <nathan.reynolds at oracle.com>
> To: Hanson Char <hanson.char at gmail.com>
> Cc: Doug Lea <dl at cs.oswego.edu>, concurrency-interest at cs.oswego.edu
> Subject: Re: [concurrency-interest] Array allocation and access on the
> Message-ID: <4F22D0DD.6020501 at oracle.com>
> Content-Type: text/plain; charset="iso-8859-1"; Format="flowed"
> Defaulting -XX:+UseCondCardMark would require a lot of testing to create
> a heuristic to guess when it should default on or off. Not all
> workloads hit contention on the card table. Making the card table
> concurrent for these workloads would slow them down. If the heuristic
> is sub-optimal, then we would be in a worse situation than we are today.
> In order for the JVM to detect contention, it would need to profile each
> access or have a hardware counter to declare contention. Timing each
> access and tracking statistics would be expensive. Hardware counters
> exist but they are too expensive to access because of kernel round
> trips. So, the current solutions are more expensive than the actual
> We have asked hardware vendors to supply user-mode access to hardware
> counters for true and false sharing. The same hardware counters can be
> used for this problem. If we get the user-mode hardware counters, then
> this might be automated.
> Another difficulty would be in making this flag dynamically changeable
> _without_ incurring a performance penalty.
The decision is platform-specific as well as load-specific. Using CCM
won't be profitable on a single-core T2+, for instance, where false sharing
is particularly cheap. Similarly, if you have the system partitioned into
small-diameter domains then it's more apt to be unprofitable.
Currently, the flag exists as a diagnostic tool to help us identify how
widespread the issue is, as well as the magnitude of the performance drop.
Also, the current CCM idiom could use improvement before we try to make
decisions. It could certainly be made faster. Some branch-free forms
have a slightly longer path but appear to perform better.
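
For readers unfamiliar with the idiom, here is a minimal sketch of the
difference between an unconditional card mark and a conditional one (CCM).
It is a toy model, not HotSpot's generated code: the class and method names
are hypothetical, and the 512-byte card size and the 0 = dirty / -1 = clean
encoding are assumptions made for illustration. The point is only that the
conditional form reads the card first and skips the store when the card is
already dirty, so a hot, already-dirty card-table line is not written
repeatedly by every core.

```java
import java.util.Arrays;

// Toy model of a card table; all names here are hypothetical.
final class CardMarkSketch {
    // Assumed for illustration: 512-byte cards, 0 = dirty, -1 = clean.
    static final int CARD_SHIFT = 9;
    static final byte DIRTY = 0;
    static final byte CLEAN = -1;

    private final byte[] cardTable;
    private long stores; // number of writes issued to the card table

    CardMarkSketch(int heapBytes) {
        cardTable = new byte[(heapBytes >>> CARD_SHIFT) + 1];
        Arrays.fill(cardTable, CLEAN);
    }

    // Unconditional mark: always stores, even if the card is already dirty.
    void markUnconditional(int heapOffset) {
        cardTable[heapOffset >>> CARD_SHIFT] = DIRTY;
        stores++;
    }

    // Conditional mark: load and test first, store only on a clean card.
    void markConditional(int heapOffset) {
        int card = heapOffset >>> CARD_SHIFT;
        if (cardTable[card] != DIRTY) {
            cardTable[card] = DIRTY;
            stores++;
        }
    }

    boolean isDirty(int heapOffset) {
        return cardTable[heapOffset >>> CARD_SHIFT] == DIRTY;
    }

    long stores() {
        return stores;
    }

    public static void main(String[] args) {
        CardMarkSketch unconditional = new CardMarkSketch(1 << 20);
        CardMarkSketch conditional = new CardMarkSketch(1 << 20);
        // Repeated reference stores that all fall within one 512-byte card.
        for (int i = 0; i < 1000; i++) {
            int offset = 4096 + (i % 64) * 8;
            unconditional.markUnconditional(offset);
            conditional.markConditional(offset);
        }
        System.out.println("unconditional stores: " + unconditional.stores());
        System.out.println("conditional stores:   " + conditional.stores());
    }
}
```

The conditional form trades an extra load and branch on every reference
store for fewer writes to shared card-table lines, which is why its
profitability depends on the platform and the load.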
p.s., if we were to have good hardware profiling (say, like AMD's LWP that
disintermediated the OS) then we might be able to sample and adapt by
individual card marking site, instead of trying to turn the feature on/off