[concurrency-interest] Thread priority

Nitsan Wakart nitsanw at yahoo.com
Thu Feb 7 15:20:41 EST 2013

This is a problem some people in the low latency space have had to find ways around by introducing all sorts of JVM warmup mechanisms to their systems. In the case of a static codebase/hardware I would think there is plenty to be won by utilizing the observations that went into JIT compiling the system before.
You may not be able to step in the same river twice, but you can learn something about rivers from your first visit :)
To put this in context consider a trading system where trades may be infrequent, but their latency is of high importance. Waiting for 10K trades to fly by before you get the performance you want is not so good. If I could combine the JIT profile with an AOT compiler to start from a better point the next time I start up I'd be very happy. This is similar to using table statistics to optimize a database table index.
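
A minimal sketch of the kind of warm-up mechanism mentioned above, assuming a hypothetical `handleTrade` method stands in for the latency-critical path. The class name, the method, and the iteration count are illustrative only; the idea is simply to drive the hot path with synthetic input past the compile threshold before real trades arrive.

```java
import java.util.concurrent.ThreadLocalRandom;

public class WarmupSketch {
    // Hypothetical stand-in for the latency-critical path (e.g. handling one trade).
    static long handleTrade(long price, long quantity) {
        return price * quantity + (price ^ quantity);
    }

    // Drives the hot path with synthetic inputs so the JIT can compile it
    // before live traffic arrives.
    static long warmUp(int iterations) {
        long sink = 0;
        ThreadLocalRandom rnd = ThreadLocalRandom.current();
        for (int i = 0; i < iterations; i++) {
            sink += handleTrade(rnd.nextLong(1, 1_000), rnd.nextLong(1, 100));
        }
        // Returned so the loop's result is used and cannot be discarded as dead code.
        return sink;
    }

    public static void main(String[] args) {
        // 20,000 is an assumed count, chosen to exceed the 10K invocations discussed in this thread.
        long sink = warmUp(20_000);
        System.out.println("warm-up done, sink=" + sink);
    }
}
```

One caveat with this approach: if the synthetic inputs exercise different branches or receiver types than real traffic, the profile gathered during warm-up may not match production, and the compiled code may be deoptimized later.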

 From: Nathan Reynolds <nathan.reynolds at oracle.com>
To: concurrency-interest at cs.oswego.edu 
Sent: Thursday, February 7, 2013 5:35 PM
Subject: Re: [concurrency-interest] Thread priority

> Why is there no way to serialize the performance statistics collected, or even the JIT compiled code and reload those on startup?

I've wondered the same thing.  Before I brought it up a couple of years ago, it had already been considered.  The problem is that it takes more time to verify that the classes are the same than it does to just compile it anew.  Because of inlining, the number of classes that have to be considered is very large for just one optimized method.

There are Ahead of Time compilers out there.  (See http://en.wikipedia.org/wiki/AOT_compiler)  AOT has its drawbacks... "AOT can't usually perform some optimizations possible in JIT, like runtime profile-guided optimizations, pseudo-constant propagation or indirect/virtual function inlining."  HotSpot depends upon these optimizations heavily to improve performance.

On the bright side, Class Data Sharing does help reduce start up time.  I think this feature is being enhanced, but I am not sure.

Nathan Reynolds | Architect | 602.333.9091
Oracle PSR Engineering | Server Technology

On 2/7/2013 10:17 AM, Nitsan Wakart wrote:

This is probably mostly relevant for server applications, where the usage and hardware are static and the same software is run for months on end. Why is there no way to serialize the performance statistics collected, or even the JIT compiled code and reload those on startup?
> From: Stanimir Simeonoff <stanimir at riflexo.com>
>To: gregg.wonderly at pobox.com 
>Cc: concurrency-interest at cs.oswego.edu 
>Sent: Thursday, February 7, 2013 4:56 PM
>Subject: Re: [concurrency-interest] Thread priority
>Besides the iteration counts the JVM (may) add profiling data. For instance: call sites. If the method is compiled w/ a single class only and then another is loaded the method has to be deoptimized and optimized again - obviously unwanted behavior. The same holds true for inline caches.
>To put it simply: w/o the profiling data the resulting machine code may not be of high quality. Moreover, if the compiling threshold is too low, that would result in a lot of "code cache" used and it increases memory footprint + it may reach the memory limit for the code cache as many invocations do not reach c2 (10k). There was such a bug introduced w/ the tiered compilation in 6.0.25 - basically it reached the 32mb of code cache super fast - and a lot of code didn't reach C2 at all (or even c1 and remained interpreted). I had to increase the code cache limits to ~256mb (starting from 32mb) to reach to quota and decided to turn off tiered compilation altogether. There was a rare bug as well, replacing c1 w/ c2 code resulted in crashing the process.
>So, having low threshold is probably good for some microbenchmarks but not so much for real applications.
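
To see whether an application is approaching the code cache limit described above, one option is to read the JVM's memory pools through the standard `java.lang.management` API. This is a sketch only: pool names are JVM-specific, and the filter below assumes HotSpot-style names containing "code" (for example "Code Cache" or "CodeHeap ..."). The class name `CodeCacheReport` is illustrative.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class CodeCacheReport {
    // Builds one line per memory pool whose name mentions "code".
    // Assumption: on HotSpot these pools hold JIT-compiled code.
    static List<String> report() {
        List<String> lines = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (!pool.getName().toLowerCase(Locale.ROOT).contains("code")) {
                continue;
            }
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // the pool is no longer valid
            }
            long max = usage.getMax(); // -1 means the maximum is undefined
            lines.add(String.format("%s: used=%dKB max=%s",
                    pool.getName(),
                    usage.getUsed() / 1024,
                    max < 0 ? "undefined" : (max / 1024) + "KB"));
        }
        return lines;
    }

    public static void main(String[] args) {
        report().forEach(System.out::println);
    }
}
```

On HotSpot, the limit itself is set with `-XX:ReservedCodeCacheSize=<size>` and tiered compilation is disabled with `-XX:-TieredCompilation`; whether those are the exact settings used in the message above is an assumption.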
>On Thu, Feb 7, 2013 at 6:08 PM, Gregg Wonderly <gregg at cytetech.com> wrote:
>I've always wondered why these kinds of "large" invocation counts are used.  In many methods, there will be a single entry in most applications, and "loops" inside of that method could be optimized much sooner.  In many of my desktop applications, I set the invocation count (on the command line) to 100 or even 25, and get faster startups, and better performance for the small amount of time that I use the apps.  For the client VM, it really seems strange to wait so long (1000 invocations), to compile with instrumentation.  Then waiting for 10 times that many invocations to decide on the final optimizations seems a bit of a stretch.
>>Are there real data values from lots of different users which indicate that these "counts" are when people are ready to be more productive?  I know that there are probably lots of degenerate cases where optimizations will be missed without enough data.  But it would seem better to go to native code early, and adapt occasionally, rather than wait until you can be sure to be perfect.
>>Gregg Wonderly 
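
The invocation count set on the command line is presumably HotSpot's `-XX:CompileThreshold=<n>` flag (an assumption; the message does not name it). A small sketch for checking which values a running JVM actually uses, via the HotSpot-specific `com.sun.management.HotSpotDiagnosticMXBean`; the class name `JitThresholds` and the "unavailable" fallback are illustrative choices:

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

public class JitThresholds {
    // Returns the current value of a HotSpot flag, or "unavailable" when the
    // flag or the diagnostic bean does not exist on this JVM.
    static String flag(String name) {
        try {
            HotSpotDiagnosticMXBean bean =
                    ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            return bean == null ? "unavailable" : bean.getVMOption(name).getValue();
        } catch (RuntimeException e) {
            // getVMOption throws IllegalArgumentException for unknown flags
            return "unavailable";
        }
    }

    public static void main(String[] args) {
        for (String name : new String[] {"CompileThreshold", "TieredCompilation"}) {
            System.out.println(name + " = " + flag(name));
        }
    }
}
```

Running it once with default settings and once with, say, `java -XX:CompileThreshold=100 JitThresholds` shows whether the override took effect. Note that on some HotSpot versions `CompileThreshold` may be ignored when tiered compilation is enabled.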
>>On 2/7/2013 9:49 AM, Nathan Reynolds wrote:
>>>With tiered compilation, once a method reaches 1,000 invocations (configurable?), it is compiled with instrumentation.  Then when it reaches 10,000 invocations (configurable), it is fully optimized using the instrumentation profiling data.  For these operations, the JIT threads should run at a higher priority.  However, there are some optimizations which are too heavy to do at a high priority.  These optimizations should be done at a low priority.  Also, methods, which haven't quite reached the 1,000 invocations but are being executed, could be compiled with instrumentation at a low priority.
>>>The low priority work will only be done if the CPU isn't maxed out.  If any other thread needs the CPU, then the low priority compiler thread will be immediately context switched off the core.  So, the low priority compilation will never significantly hurt the performance of the high priority threads.  For some work loads, the low priority threads may never get a chance to run. That's okay because the work isn't that important.
>>>Nathan Reynolds <http://psr.us.oracle.com/wiki/index.php/User:Nathan_Reynolds> | Architect | 602.333.9091
>>>Oracle PSR Engineering <http://psr.us.oracle.com/> | Server Technology
>>>On 2/7/2013 12:21 AM, Stanimir Simeonoff wrote:
>>>Thread priorities are usually NOT applied at all.
>>>>For instance:
>>>>     intx DefaultThreadPriority         = -1    {product}
>>>>     intx JavaPriority10_To_OSPriority  = -1    {product}
>>>>     intx JavaPriority1_To_OSPriority   = -1    {product}
>>>>     intx JavaPriority2_To_OSPriority   = -1    {product}
>>>>     intx JavaPriority3_To_OSPriority   = -1    {product}
>>>>     intx JavaPriority4_To_OSPriority   = -1    {product}
>>>>     intx JavaPriority5_To_OSPriority   = -1    {product}
>>>>     intx JavaPriority6_To_OSPriority   = -1    {product}
>>>>     intx JavaPriority7_To_OSPriority   = -1    {product}
>>>>     intx JavaPriority8_To_OSPriority   = -1    {product}
>>>>     intx JavaPriority9_To_OSPriority   = -1    {product}
>>>>in other words unless specified :
>>>>it won't be mapped.
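
From the API side, this means `Thread.setPriority` is best treated as a request rather than a guarantee. A minimal sketch, with an illustrative class name and thread name, showing that the Java-level priority is recorded on the thread regardless of whether the JVM maps it to an OS priority:

```java
public class PriorityHint {
    // Creates a daemon worker and requests the lowest Java priority for it.
    static Thread lowPriorityWorker(Runnable task) {
        Thread t = new Thread(task, "background-worker");
        t.setDaemon(true);
        // A request only: whether this reaches the OS scheduler depends on
        // the JVM's priority mapping, as the flags above suggest.
        t.setPriority(Thread.MIN_PRIORITY);
        return t;
    }

    public static void main(String[] args) throws InterruptedException {
        Thread t = lowPriorityWorker(() ->
                System.out.println("running at Java priority "
                        + Thread.currentThread().getPriority()));
        t.start();
        t.join();
    }
}
```

`getPriority()` reports the value that was requested, so it cannot be used to confirm that the operating system is actually scheduling the thread differently.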
>>>>If applied, the JVM compiler/GC threads may become starved, which you don't want, so they have to run above normal priority (which requires root privileges).
>>>>Alternatively the normal java threads have to run w/ lower priority, which means other processes will have higher priority - also
>>>>On Thu, Feb 7, 2013 at 5:20 AM, Mohan
>>>><radhakrishnan.mohan at gmail.com> wrote:
>>>>    Hi,
>>>>    Can the Thread priority setting in the API still be reliably used uniformly across processors? There are other concurrency patterns in the API but this setting is still there.
>>>>    Thanks,
>>>>    Mohan
>>>>    _______________________________________________
>>>>    Concurrency-interest mailing list
>>>>    Concurrency-interest at cs.oswego.edu
>>>>    http://cs.oswego.edu/mailman/listinfo/concurrency-interest