[concurrency-interest] spinLoopHint() JEP draft discussion
boehm at acm.org
Tue Oct 13 21:38:56 EDT 2015
It seems to me that the trick here is to be explicit as to what is
intended. Presumably this is intended to discourage speculative execution
across a spinLoopHint(). It is not intended to, for example, put the
processor into some sort of sleep state for a while, though that might also
make sense under slightly different circumstances.
I would emphasize that this is expected not to increase latency. It might
happen to reduce power consumption, but a power-reducing,
latency-increasing implementation is not expected.
On Sat, Oct 10, 2015 at 8:41 AM, Gil Tene <gil at azul.com> wrote:
> On Oct 8, 2015, at 10:50 AM, Hans Boehm <boehm at acm.org> wrote:
> My question about spinLoopHint() would be whether it can be defined in a
> way that it makes it useful across architectures. I vaguely remember
> seeing claims that even the x86 instructions are not implemented
> consistently enough to be easily usable in portable code.
> The PAUSE instruction on x86 has been around and used consistently since
> Pentium 4s. And pretty much anything spinning (including the JVM's own C++
> spinning code) uses it across all x86 architectures. (It encodes in a way
> that makes it a NOP for pre-Pentium 4 x86, so it's harmless at worst).
> I have no idea (though I probably should) about ARM equivalents or the
> It does not seem to be common practice to use a pure spin loop hinting
> instruction on ARM in spin loops. On ARMv8 (64 bit) spinning uses WFE/SEVL
> instructions, which do more than hint. They actually watch a specific
> memory location for change. See discussion in several e-mails on the thread
> with the same subject on OpenJDK core-libs-dev archives about that.
> It also seems to me that unbounded spin loops are almost always a bad idea.
> The hidden OS guy in me always feels that way. But in today's many-core
> world it is hard to argue with the many practical uses of dedicated and
> unbounded user-mode spinning. From kernel bypass networking stacks to
> messaging stacks to trading applications, it is VERY common to find a
> server continually spinning on a handful of cores these days. And it
> provides metric benefits to the applications that do so. These include many
> applications written in (and doing their spinning logic) in Java.
> (If you've been spinning for 10 seconds, you should be sleeping instead.
> Not if what you care about is the reaction time to the next message. Many
> applications care about latency (sometimes down to the sub-usec levels)
> even when messages only come in at 100/sec. And unbounded spinning improves
> latency across the board (not just the long tails, but even the median) for
> such use cases.
> You might even be inadvertently scheduled against the thread you're
> waiting for.
> That's what is always dangerous about user-mode spinning (even the bounded
> kind). But there are many practical ways to prevent this from happening (or
> prevent it "enough") on modern many-core machines. Just keeping your active
> thread counts well below your vcore count is a pretty simple way to start
> for this, and with a modern 2 socket x86 server having anywhere from 24 to
> 72 vcores these days, that's a pretty practical thing to do. The true
> latency sensitive folks out there will do a lot to control which cores they
> spin on, and who might interfere with those cores (e.g. see this detailed
> Strange Loop presentation by Mark Price from LMAX:
> https://www.youtube.com/watch?v=-6nrhSdu--s (discussion of core-affinity
> controls starts around 16:00 in the video). LMAX do a lot of spinning in
> Since you're waiting anyway, you might as well keep track of how long
> you've been spinning.) But the idea here would be that this is the
> low-level primitive you use if you haven't been spinning for very long?
> spinLoopHint() is useful for both short spinning (spinning for a while
> before giving up and blocking) and indefinite spinning, and both cases
> will benefit from it.
> The alternative is to pass in some indication of how long you've been
> spinning, and have this yield, or sleep, after a sufficiently long time.
> I don't see much urgency for adding convenience wrappers, as this logic is
> doable without adding any Java SE APIs. In fact, it is common to see this in
> code that performs some sort of indefinite spinning logic.
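
The escalation described above (spin for a while, then yield, then sleep) could be sketched roughly as follows. This is only an illustration and not code from the thread: BackoffSpinner, awaitCondition and the three thresholds are hypothetical names and arbitrary values, and Thread.onSpinWait() is the name under which the hint proposed here as spinLoopHint() later shipped in JDK 9, so the sketch assumes JDK 9 or newer.

```java
import java.util.concurrent.locks.LockSupport;
import java.util.function.BooleanSupplier;

// Hypothetical helper, for illustration only.
final class BackoffSpinner {
    // Arbitrary thresholds; real code would tune these per workload.
    private static final int SPIN_LIMIT = 1_000;    // passes that only hint
    private static final int YIELD_LIMIT = 1_100;   // passes before sleeping starts
    private static final long PARK_NANOS = 50_000L; // sleep length once backed off

    // Waits until the condition holds and returns how many times it was
    // found false, which is the caller's measure of how long it spun.
    static long awaitCondition(BooleanSupplier condition) {
        long iterations = 0;
        while (!condition.getAsBoolean()) {
            if (iterations < SPIN_LIMIT) {
                Thread.onSpinWait();               // cheap CPU hint, no system call
            } else if (iterations < YIELD_LIMIT) {
                Thread.yield();                    // offer the core to other threads
            } else {
                LockSupport.parkNanos(PARK_NANOS); // give up the core for a while
            }
            iterations++;
        }
        return iterations;
    }
}
```

A caller that cares only about reaction time could raise SPIN_LIMIT or drop the later stages, which gives the indefinite spinning discussed in this thread.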
> spinLoopHint() is needed because it provides a currently missing feature.
> Without it there is (currently) no way for Java spinning logic to make use
> of important hardware capabilities that improve execution metrics (latency,
> power consumption, and overall program throughput). Those capabilities are
> in near-universal use outside of Java for good reason, and Java just lacks
> a way to indicate the need or intent in a practical way (a JNI call or a
> yield() is not practical due to the dramatic relative cost difference)...
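
For readers who want to see the shape of such a loop, here is a minimal sketch of a spin-wait that carries the hint. It assumes JDK 9 or newer, where the hint proposed in this thread as spinLoopHint() became available as Thread.onSpinWait(); the SpinFlag class and its method names are hypothetical and exist only for this example.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical example class; not part of any JDK or of the draft JEP.
final class SpinFlag {
    private final AtomicBoolean ready = new AtomicBoolean(false);

    // Called by the producing thread once the awaited event has happened.
    void signal() {
        ready.set(true);
    }

    // Reports whether signal() has been observed.
    boolean isSignalled() {
        return ready.get();
    }

    // Spins until signal() has been called, issuing the hint on every pass.
    void awaitSignal() {
        while (!ready.get()) {
            // JDK 9+ name for the hint discussed here as spinLoopHint().
            // A JVM is free to treat it as a no-op; the loop stays correct.
            Thread.onSpinWait();
        }
    }
}
```

The hint sits inside the loop body and changes nothing about the loop's logic, so the same code remains correct on a platform where the JVM does nothing for it.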
> On Tue, Oct 6, 2015 at 6:41 PM, Gil Tene <gil at azulsystems.com> wrote:
>> When comparing spinLoopHint() to Thread.yield(), we're talking about
>> different orders of magnitude, and different motivations.
>> On the motivation side: A major reason for using spinLoopHint() is to
>> improve the reaction time of a spinning thread (from the time the event it
>> is spinning for actually occurs until it actually reacts to it). Power
>> savings is another benefit. Thread.yield() doesn't help with either.
>> On the orders of magnitude side: Thread.yield involves making a system
>> call. This makes it literally 10x+ longer to react than spinning without
>> it, and certainly pulls in the opposite direction of spinLoopHint().
>> On Oct 6, 2015, at 1:15 PM, Nathan Reynolds <nathan.reynolds at oracle.com> wrote:
>> I am not fully up to speed on this topic. However, why not call
>> Thread.yield()? If there are no other threads waiting to get on the
>> processor, then Thread.yield() does nothing. The current thread keeps
>> executing. If there are threads waiting to get on the processor, then
>> current thread goes to the end of the run queue and another thread gets on
>> the processor (i.e. a context switch). The thread will run again after the
>> other threads ahead of it either block, call yield() or use up their time
>> slice. The only time Thread.yield() will do anything is if *all* of the
>> processors are busy (i.e. 100% CPU utilization for the machine). You could
>> run 1000s of threads in tight Thread.yield() loops and all of the threads
>> will take a turn to go around the loop one time and then go to the end of
>> the run queue.
>> I've tested this on Windows and Linux (Intel 64-bit processors).
>> Some people are very afraid of context switches. They think that context
>> switches are expensive. This was true of very old Linux kernels. Nowadays,
>> it costs 100s of nanoseconds to do a context switch. Of course, the
>> cache may need to be reloaded with the data relevant for the running thread.
>> On 10/6/2015 11:56 AM, Gil Tene wrote:
>> A variant of synchronic for j.u.c would certainly be cool to have.
>> Especially if it supports a hint that makes it actually spin forever rather
>> than block (this may be what expect_urgent means, or maybe a dedicated spin
>> level is needed). An implementation could use spinLoopHint() under the
>> hood, or other things where appropriate (e.g. if MWAIT was usefully
>> available in user mode in some future, and had a way to limit the wait
>> However, an abstraction like synchronic is a bit higher level than
>> spinLoopHint(). One of the main drivers for spinLoopHint() is direct-use
>> cases by programs and libraries outside of the core JDK. E.g. spinning
>> indefinitely (or for limited periods) on dedicated vcores is a common
>> practice in high performance messaging and communications stacks, and is not
>> unreasonable on today's many-core systems. E.g. seeing 4-8 threads "pinned"
>> with spinning loops is commonplace in trading applications, in kernel
>> bypass network stacks, and in low latency messaging. And the conditions for
>> spins are often more complicated than those expressible by synchronic (e.g.
>> watching multiple addresses in a mux'ed spin). I'm sure a higher level
>> abstraction for a spin wait can be enriched enough to come close, but there
>> are many current use cases that aren't covered by any currently proposed
>> So, I like the idea of an abstraction that would allow uncomplicated
>> spin-wait use, but I also think that direct access to spinLoopHint() is
>> very much needed. They don't contradict each other.
>> — Gil.
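
A mux'ed spin of the kind mentioned above, where one thread watches several locations at once, might look something like the sketch below. It is an assumption-laden illustration, not code from the thread: MuxSpin and awaitAnyChange are hypothetical names, the watched locations are modelled as slots of an AtomicLongArray, and Thread.onSpinWait() (JDK 9+) stands in for the hint proposed here as spinLoopHint().

```java
import java.util.concurrent.atomic.AtomicLongArray;

// Hypothetical helper, for illustration only.
final class MuxSpin {
    // Spins until some slot differs from the value last seen for it, records
    // the new value in lastSeen, and returns the index of that slot.
    // lastSeen must have at least slots.length() elements.
    static int awaitAnyChange(AtomicLongArray slots, long[] lastSeen) {
        while (true) {
            for (int i = 0; i < slots.length(); i++) {
                long current = slots.get(i); // volatile read of one watched location
                if (current != lastSeen[i]) {
                    lastSeen[i] = current;
                    return i;
                }
            }
            // One hint per full scan of the watched locations.
            Thread.onSpinWait();
        }
    }
}
```

The design choice here is to issue the hint once per scan rather than once per slot; whether that is the better placement would need measuring on the target hardware.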
>> On Oct 6, 2015, at 9:49 AM, Hans Boehm <boehm at acm.org> wrote:
>> If you haven't seen it, you may also be interested in
>> which seems to be a very different perspective on roughly the same space.
>> On Tue, Oct 6, 2015 at 8:11 AM, Gil Tene <gil at azulsystems.com> wrote:
>>> I posted a draft JEP about adding spinLoopHint() for discussion on
>>> core-libs-dev and hotspot-dev. May be of interest to this group. The main
>>> focus is supporting outside-of-the-JDK spinning needs (for which there are
>>> multiple eager users), but it could/may be useful under the hood in j.u.c.
>>> See draft JEP, tests, and links to prototype JDKs to play with here:
>>> — Gil.
>>> Concurrency-interest mailing list
>>> Concurrency-interest at cs.oswego.edu
>>> http://cs.oswego.edu/mailman/listinfo/concurrency-interest