[concurrency-interest] A beginner question (on fork-and-join)
nathan.reynolds at oracle.com
Mon Nov 21 18:29:00 EST 2011
The JVM does profile-guided optimization. If you reduce the warm-up to
only 100 invocations, then the JVM only looks at those 100 samples to
determine how to optimize the method. I would guess that for some
methods, 100, 1,000, or 10,000 invocations won't make any difference to
the optimized code. However, other methods need the full 10,000
invocations for the JVM to fully understand how the method is used and
the best way to optimize it.
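To illustrate why a short profile can mislead the JIT, here is a minimal sketch. The class and method names (`ProfileShiftDemo`, `totalArea`, and so on) are hypothetical. During the first phase the call site `s.area()` only ever sees `Square`, so the profile says it is monomorphic. In the second phase a `Circle` shows up at the same call site. Running it with the real HotSpot flag `-XX:+PrintCompilation` may show `totalArea` being compiled, then marked "made not entrant" when the new type appears, then recompiled. The exact output depends on the JVM version.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo: a hot call site whose receiver type changes after warm-up.
public class ProfileShiftDemo {
    interface Shape { double area(); }

    static final class Square implements Shape {
        private final double side;
        Square(double side) { this.side = side; }
        public double area() { return side * side; }
    }

    static final class Circle implements Shape {
        private final double radius;
        Circle(double radius) { this.radius = radius; }
        public double area() { return Math.PI * radius * radius; }
    }

    // Hot method. In phase 1 the s.area() call site only sees Square,
    // so the type profile HotSpot gathers for it is monomorphic.
    static double totalArea(List<Shape> shapes) {
        double sum = 0.0;
        for (Shape s : shapes) {
            sum += s.area();
        }
        return sum;
    }

    static List<Shape> squares(int n) {
        List<Shape> list = new ArrayList<Shape>(n);
        for (int i = 0; i < n; i++) list.add(new Square(1.0));
        return list;
    }

    public static void main(String[] args) {
        double checksum = 0.0;

        // Phase 1: only Square reaches the call site.
        List<Shape> warmup = squares(1_000);
        for (int i = 0; i < 20_000; i++) checksum += totalArea(warmup);

        // Phase 2: a new receiver type reaches the same call site.
        List<Shape> mixed = squares(999);
        mixed.add(new Circle(1.0));
        for (int i = 0; i < 20_000; i++) checksum += totalArea(mixed);

        System.out.println("checksum = " + checksum);
    }
}
```

The point is not the checksum but the compilation log: code optimized for the first profile may have to be thrown away once the workload changes.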
In production, you could start one JVM with a threshold of 100
invocations and the other with the default. If both JVMs have the same
CPU usage, response times and throughput after warm-up and compilation,
then 100 invocations is sufficient for your workload. I would guess that
the one with 100 invocations will suffer.
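For an A/B comparison like this, it may help for each JVM to log which compilation settings it is actually running with. The sketch below is a hypothetical helper (`JitFlags` and `vmOption` are made-up names) that reads flags through `com.sun.management.HotSpotDiagnosticMXBean`. `CompileThreshold` and `TieredCompilation` are real HotSpot flag names; my understanding is that when tiered compilation is enabled, `CompileThreshold` is not the threshold that governs compilation, so logging both shows which regime each JVM is in.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

// Prints the JIT-related flags this JVM is running with, so the two JVMs
// in a warm-up comparison can be told apart in their logs.
public class JitFlags {
    // Returns the flag's current value, or "unavailable" if the diagnostic
    // MXBean is missing or this VM does not recognize the flag name
    // (getVMOption throws IllegalArgumentException in that case).
    static String vmOption(String name) {
        try {
            HotSpotDiagnosticMXBean hotspot =
                    ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            if (hotspot == null) {
                return "unavailable";
            }
            return hotspot.getVMOption(name).getValue();
        } catch (IllegalArgumentException e) {
            return "unavailable";
        }
    }

    public static void main(String[] args) {
        System.out.println("JVM arguments: "
                + ManagementFactory.getRuntimeMXBean().getInputArguments());
        System.out.println("CompileThreshold  = " + vmOption("CompileThreshold"));
        System.out.println("TieredCompilation = " + vmOption("TieredCompilation"));
    }
}
```

For example, run `java -XX:CompileThreshold=100 JitFlags` on one JVM and plain `java JitFlags` on the other, and keep the output next to each JVM's CPU, response-time, and throughput numbers.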
I'm not sure, but I believe HotSpot 7 includes tiered compilation.
After 1,000 invocations, the method is deemed hot enough that the JVM
optimizes it without any profiling data to guide the optimizations, and
at the same time adds profiling code to the method. After 10,000
invocations, the JVM does the profile-guided optimization of the method.
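One way to watch these stages on a given JVM is to time successive batches of calls to a hot method. This is a hypothetical sketch (`WarmupBatches` and `sumOfSquares` are made-up names); run it with the real HotSpot flag `-XX:+PrintCompilation` to line up the per-batch times with the moments the method is compiled. The thresholds you observe will depend on the JVM version and flags, and loop back-edges also count toward compilation (on-stack replacement), which can blur the picture.

```java
// Times successive batches of calls to one hot method so the per-batch cost
// can be compared before and after HotSpot compiles it.
public class WarmupBatches {
    static volatile long blackhole; // consumes results so the JIT cannot drop the calls

    // Hypothetical "hot" method: sum of i*i for 0 <= i < n.
    static long sumOfSquares(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += (long) i * i;
        }
        return sum;
    }

    // Runs `batches` batches of `callsPerBatch` calls each and returns the
    // elapsed nanoseconds of every batch.
    static long[] timeBatches(int batches, int callsPerBatch, int n) {
        long[] elapsed = new long[batches];
        long sink = 0;
        for (int b = 0; b < batches; b++) {
            long start = System.nanoTime();
            for (int c = 0; c < callsPerBatch; c++) {
                sink += sumOfSquares(n);
            }
            elapsed[b] = System.nanoTime() - start;
        }
        blackhole = sink;
        return elapsed;
    }

    public static void main(String[] args) {
        int callsPerBatch = 1_000;
        long[] elapsed = timeBatches(20, callsPerBatch, 1_000);
        for (int b = 0; b < elapsed.length; b++) {
            System.out.printf("calls %6d-%6d: %8.1f us%n",
                    b * callsPerBatch + 1, (b + 1) * callsPerBatch, elapsed[b] / 1_000.0);
        }
    }
}
```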
On a heavily used server, 10,000 invocations should happen very
quickly. Some servers process that many requests per second, or even in
a fraction of a second. So the question becomes: do the first few
minutes of execution really matter, considering the lifespan of the
server? In the overall picture, the startup time is much less than 1%
of the total time the server is running.
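To put rough numbers on this, assume (hypothetically) one invocation of the method per request, a server handling 1,000 requests per second, a 5-minute warm-up, and 30 days of uptime:

```latex
\begin{aligned}
t_{10\,000} &= \frac{10\,000 \text{ invocations}}{1\,000 \text{ requests/s}} = 10 \text{ s} \\[4pt]
\frac{t_{\text{warm-up}}}{t_{\text{uptime}}} &= \frac{5 \times 60 \text{ s}}{30 \times 86\,400 \text{ s}}
  = \frac{300}{2\,592\,000} \approx 1.2 \times 10^{-4} \approx 0.012\%
\end{aligned}
```

Even a generous warm-up budget stays far below the 1% figure under these assumptions.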
For client applications, this is a very different story. The
10,000-invocation threshold won't be reached until the user presses a
button 10,000 times. However, _some_ of the time the response time of
the program isn't critical. The response time for fully optimized code
might be 1 ms; with unoptimized code, it might be 10 ms. The user may not
be able to notice. For example, older flat-panel monitors refresh at
60 Hz (one frame every 1/60 s, about 16.7 ms). So if the program responds
within about 16.7 ms, the user may not even be able to see that it took
a bit longer.
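The refresh-interval arithmetic behind this, treating the 1 ms and 10 ms figures above as the whole response time (ignoring input and rendering latency, which this rough comparison assumes away):

```latex
T_{\text{refresh}} = \frac{1}{60 \text{ Hz}} = \frac{1000 \text{ ms}}{60} \approx 16.67 \text{ ms},
\qquad 1 \text{ ms} < 10 \text{ ms} < 16.67 \text{ ms}
```

Both the optimized and the unoptimized response fit within a single refresh interval.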
However, I feel your pain. I wish there were a good way to get instant
warm-up. Several others and I have given this problem a lot of
thought. All of the schemes we have come up with have a lot of issues
and were either rejected flat-out or tried and then rejected due to
Consulting Member of Technical Staff | 602.333.9091
Oracle PSR Engineering <http://psr.us.oracle.com/> | Server Technology
On 11/21/2011 3:36 PM, Gregg Wonderly wrote:
> So I have to ask, why don't you use the command-line property to
> change this to something like 100 for a faster warm-up? For some of
> my applications, doing this reduces startup time by orders of
> magnitude because of the number of times some things are invoked. In
> particular, server applications using a security manager seem to start
> much faster.
> Gregg Wonderly
> On 11/21/2011 3:40 PM, Nathan Reynolds wrote:
>> Microbenchmarks are incredibly hard to get right. For example, the
>> HotSpot 7 JVM won't do a full optimization of a method until 10,000
>> invocations. You need to bump up the priority of the test thread so
>> that other things on the system don't add noise. These probably
>> aren't applicable to your case, but you may want to force a full GC
>> right before running the test.
>> You probably want to use http://code.google.com/p/caliper/ which
>> deals with all of these gotchas.
>> Nathan Reynolds
>> <http://psr.us.oracle.com/wiki/index.php/User:Nathan_Reynolds>
>> On 11/21/2011 2:15 PM, David Harrigan wrote:
>>> Hi Everyone,
>>> I'm learning about the fork/join framework in JDK 7, and to test it
>>> out I wrote a little program that tries to find a number at the end
>>> of a list with 50,000 elements.
>>> What puzzles me is that when I run the "find" sequentially, it
>>> returns faster than when I use a fork-and-join implementation. I'm
>>> running each "find" 5,000 times so as to "warm up" the JVM. My
>>> timings are listed below:
>>> Generating some data...done!
>>> Simon Stopwatch: total 1015 s, counter 5000, max 292 ms, min 195 ms,
>>> mean 203 ms [sequential INHERIT]
>>> Simon Stopwatch: total 1352 s, counter 5000, max 4.70 s, min 243 ms,
>>> mean 270 ms [parallel INHERIT]
>>> (some runtime information)
>>> openjdk version "1.7.0-ea"
>>> OpenJDK Runtime Environment (build 1.7.0-ea-b215)
>>> OpenJDK 64-Bit Server VM (build 21.0-b17, mixed mode)
>>> 2.66 GHz Intel Core i7 with 8 GB RAM (256 KB L2 cache per core, 4
>>> cores, and 4 MB L3 cache) running on a MacBook Pro (OS X Lion 10.7.2)
>>> Forgive my ignorance, but this type of programming is still quite new
>>> to me. I'm obviously doing something wrong, but I don't know what. My
>>> suspicion is that it has something to do with spinning threads up and
>>> down and the overhead that entails. I've posted the source here:
>>> http://pastebin.com/p96R24R0
>>> My sincere apologies if this list is not appropriate for this posting;
>>> if so, I would welcome a pointer to where I can find more information
>>> to help me better understand the behaviour of my program when using
>>> F&J.
>>> I thought that by using F&J I would be able to find the answer more
>>> quickly than by searching sequentially. Perhaps I've chosen the wrong
>>> initial problem to test this out (something that is suited to a
>>> sequential search and not a parallel search?)
>>> Thank you all in advance.
>> Concurrency-interest mailing list
>> Concurrency-interest at cs.oswego.edu
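The microbenchmark advice quoted above (warm up past the compilation threshold, raise the test thread's priority, force a full GC right before timing) can be sketched as a tiny harness. This is a hypothetical helper (`MiniBench` and `meanNanos` are made-up names) that assumes Java 8 or later for `LongSupplier` and the lambda. It is not a substitute for a tool such as Caliper, which handles many more pitfalls. Note that `System.gc()` is only a request and thread priority is only a hint to the OS.

```java
import java.util.function.LongSupplier;

// Minimal timing harness: warm up, raise priority, request a GC, then time.
public class MiniBench {
    static volatile long blackhole; // keeps results alive so work is not eliminated

    // Runs `task` warmupIters times untimed, then measuredIters times timed,
    // and returns the mean nanoseconds per measured call.
    static double meanNanos(LongSupplier task, int warmupIters, int measuredIters) {
        Thread self = Thread.currentThread();
        int oldPriority = self.getPriority();
        self.setPriority(Thread.MAX_PRIORITY); // a hint; the OS may ignore it
        try {
            long sink = 0;
            for (int i = 0; i < warmupIters; i++) {
                sink += task.getAsLong();
            }
            System.gc(); // requests (does not guarantee) a full GC before timing
            long start = System.nanoTime();
            for (int i = 0; i < measuredIters; i++) {
                sink += task.getAsLong();
            }
            long elapsed = System.nanoTime() - start;
            blackhole = sink;
            return (double) elapsed / measuredIters;
        } finally {
            self.setPriority(oldPriority); // restore the caller's priority
        }
    }

    public static void main(String[] args) {
        // Hypothetical workload: linear scan for the last element of an array.
        int[] data = new int[50_000];
        for (int i = 0; i < data.length; i++) data[i] = i;
        int target = data.length - 1;
        double ns = meanNanos(() -> {
            for (int i = 0; i < data.length; i++) {
                if (data[i] == target) return i;
            }
            return -1;
        }, 12_000, 5_000);
        System.out.printf("mean %.1f us per call%n", ns / 1_000.0);
    }
}
```

The warm-up count is set above the 10,000 invocations mentioned in the thread so the measured calls are more likely to run compiled code.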
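One common reason a fork/join search loses to a plain loop is splitting the work into tasks too small to pay for the cost of forking and joining them. The sketch below is hypothetical: it is not David's pastebin code, and the class name `ParallelFind` and the `THRESHOLD` value are assumptions to be tuned by measurement. It uses `RecursiveTask` and `ForkJoinPool` from `java.util.concurrent` (both in JDK 7, and the code avoids lambdas so it should compile there) and only splits ranges larger than a sequential cut-off.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Fork/join search for the first index of `target` in an int array. Ranges of
// at most THRESHOLD elements are scanned sequentially instead of being split,
// so each task does enough work to cover its forking overhead.
public class ParallelFind extends RecursiveTask<Integer> {
    static final int THRESHOLD = 10_000; // hypothetical cut-off; tune by measuring

    private final int[] data;
    private final int target;
    private final int from; // inclusive
    private final int to;   // exclusive

    ParallelFind(int[] data, int target, int from, int to) {
        this.data = data;
        this.target = target;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Integer compute() {
        if (to - from <= THRESHOLD) {
            for (int i = from; i < to; i++) {
                if (data[i] == target) return i;
            }
            return -1;
        }
        int mid = (from + to) >>> 1;
        ParallelFind left = new ParallelFind(data, target, from, mid);
        ParallelFind right = new ParallelFind(data, target, mid, to);
        left.fork();                       // schedule the left half asynchronously
        int rightResult = right.compute(); // do the right half on this thread
        int leftResult = left.join();
        // Prefer the left half's match so the result is the first occurrence.
        return leftResult >= 0 ? leftResult : rightResult;
    }

    static int find(ForkJoinPool pool, int[] data, int target) {
        return pool.invoke(new ParallelFind(data, target, 0, data.length));
    }

    public static void main(String[] args) {
        int[] data = new int[50_000];
        for (int i = 0; i < data.length; i++) data[i] = i;
        ForkJoinPool pool = new ForkJoinPool(); // parallelism = available processors
        System.out.println(find(pool, data, data.length - 1)); // 49999
        pool.shutdown();
    }
}
```

Two design caveats: this version keeps scanning the other half even after a match is found (it does not cancel sibling tasks), and for a single 50,000-element scan with a cheap comparison there is very little work to divide, so a sequential loop may still win. Timing both with a warmed-up harness is the only way to know for a given machine.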