[concurrency-interest] The very best CAS loop
vitalyd at gmail.com
Thu Sep 29 10:28:28 EDT 2016
On Thursday, September 29, 2016, Andrew Haley <aph at redhat.com> wrote:
> On 29/09/16 14:53, Vitaly Davidovich wrote:
> > On Thursday, September 29, 2016, Andrew Haley <aph at redhat.com> wrote:
> >> On 29/09/16 02:26, Vitaly Davidovich wrote:
> >>> Alas, it’s tricky to do better while retaining safe access, there
> >>> are:
> >>> - bounds checks;
> >>> - read only checks;
> >>> - alignment checks; and
> >>> - that the buffer is effectively a box of the address (base &
> >>> offset, where base == null for off-heap).
> >>> When looping the checks can be hoisted, and unrolling should
> >>> result in efficient addressing. So, e.g. for plain access, the
> >>> generated hot-loop is similar to that as if Unsafe was directly
> >>> used.
> >>> So maybe we do need Unsafe after all? :) I mean it's fairly clear
> >>> that safety checks can only be amortized when compiler is dealing
> >>> with them in bulk - should work well for loops (assuming inlining
> >>> doesn't fail), although OSR wouldn't I think.
> >>> But what about non-loop cases or when compiler's compilation horizon
> >>> doesn't see the loop? Other languages have unsafe escape hatches for
> >>> when you really want to subvert the system because "you know
> >>> better".
> >> Well, some do. I'd argue that you need high-speed access to raw
> >> memory in the cases when you have a lot of bulk data. And in those
> >> cases a skilled programmer can work with JVM to make sure that the
> >> compiler gets what it needs to do a good job.
> > This argument leans heavily on the Sufficiently Smart Compiler fallacy.
> > In particular, "bulk data" or looping might be hidden from compiler's
> > optimization horizon.
> Perhaps, but if so I would argue that getting good performance out of
> HotSpot for anything relies on the Sufficiently Smart Compiler.
That's exactly right. But that's why escape hatches are needed to ensure
you get the codegen you expect, always. Let's also not forget that due to
PGO and the associated heuristics in the compiler, codegen can differ run
to run. This is typically not seen in microbenchmarks because they run
with a "clean" profile, have a fairly narrow set of CFGs, methods don't
typically fail to inline because they were inlined elsewhere and native
code is too big (i.e. InlineSmallCode cutoffs), etc.
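
To make the loop vs. non-loop distinction concrete, here is a minimal sketch, assuming Java 9+ and a VarHandle view over a ByteBuffer. The class and method names (BufferAccessSketch, sum, incrementAndGet) are hypothetical, made up for illustration, and not taken from anyone's code in this thread. The first method is the bulk case, where the bounds/read-only/alignment checks are at least candidates for hoisting; the second is a one-off CAS loop, where every call pays for them unless the caller happens to get inlined into something the compiler can see through. Whether any hoisting actually happens depends on the JIT and the profile on that run.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class BufferAccessSketch {
    // Views a ByteBuffer as longs; the coordinate passed to the handle is a BYTE offset.
    static final VarHandle LONGS =
        MethodHandles.byteBufferViewVarHandle(long[].class, ByteOrder.nativeOrder());

    // Bulk case: plain reads inside one loop, so the safety checks are
    // candidates for being hoisted out of the loop body by the JIT.
    static long sum(ByteBuffer buf) {
        long total = 0;
        for (int off = 0; off + Long.BYTES <= buf.limit(); off += Long.BYTES) {
            total += (long) LONGS.get(buf, off);
        }
        return total;
    }

    // Non-loop case: a single atomic increment via a CAS retry loop.
    // Each access goes through the bounds, read-only and alignment checks.
    static long incrementAndGet(ByteBuffer buf, int byteOffset) {
        long prev, next;
        do {
            prev = (long) LONGS.getVolatile(buf, byteOffset);
            next = prev + 1;
        } while (!LONGS.compareAndSet(buf, byteOffset, prev, next));
        return next;
    }

    public static void main(String[] args) {
        // A direct buffer is used so that 8-byte-aligned offsets are valid for atomic access.
        ByteBuffer buf = ByteBuffer.allocateDirect(4 * Long.BYTES);
        for (int i = 0; i < 4; i++) {
            LONGS.set(buf, i * Long.BYTES, (long) (i + 1));
        }
        System.out.println("sum = " + sum(buf));
        System.out.println("after increment = " + incrementAndGet(buf, Long.BYTES));
    }
}
```

The sketch uses a direct buffer deliberately: atomic access modes through a buffer view require aligned addresses, and a misaligned offset throws IllegalStateException rather than silently tearing.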
Don't get me wrong - I think HotSpot is a tremendous piece of engineering,
and it does a great job optimizing an otherwise performance-anemic
execution model. But inevitably you need ways to take matters into your
own hands.