[concurrency-interest] Java Memory Model and ParallelStream

Luke Hutchison luke.hutch at gmail.com
Fri Mar 6 06:11:58 EST 2020

On Fri, Mar 6, 2020 at 3:51 AM Aleksey Shipilev <shade at redhat.com> wrote:

> On 3/6/20 11:40 AM, Luke Hutchison via Concurrency-interest wrote:
> > Thanks. That's pretty interesting, but I can't think of an optimization
> that would have that effect.
> > Can you give an example?
> Method gets inlined, and boom: optimizer does not even see the method
> boundary.

...which is why I specifically excluded inlining in my original question
(or rather, said to consider the state after all inlining has taken place).
I realize that inlining doesn't just happen at compile time, and the JIT
could decide at any point to inline a method, but I want to set aside that
(very real) possibility to understand whether reordering can take place
across method boundaries _if inlining never happens_. Brian Goetz commented
that the "JIT regularly makes optimizations that have the effect of
reordering operations across method boundaries" -- so I think the answer is
yes. I just don't understand how that would happen.

> There's no "element-wise volatile" array unless you resort to using an
> AtomicReferenceArray,
> > which creates a wrapper object per array element, which is wasteful on
> computation and space.
> Not really related to this question, but: VarHandles provide "use-site"
> volatility without
> "def-site" volatility. In other words, you can access any non-volatile
> element as if it is volatile.

Thanks for the pointer, although if you need to create one VarHandle per
array element to guarantee this behavior, then that's logically no
different than wrapping each array element in a wrapper object with a
volatile field.
(Maybe Java could provide something like a "volatile volatile" type that
could be used with array-typed fields to make "volatile" apply to elements
of an array-typed field, not just to the field itself?)
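For what it's worth, my reading of the VarHandle API is that no per-element
handle is needed: a single VarHandle obtained from
MethodHandles.arrayElementVarHandle covers every element of every array of
that type. A minimal sketch (my own example, not from the thread):

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

public class VolatileArrayElement {
    // One VarHandle per array *type*, not per element: this single handle
    // can address any element of any int[] with volatile semantics.
    private static final VarHandle INT_ARRAY =
            MethodHandles.arrayElementVarHandle(int[].class);

    public static void main(String[] args) {
        int[] a = new int[4];                        // plain, non-volatile array
        INT_ARRAY.setVolatile(a, 2, 42);             // volatile write to a[2]
        int v = (int) INT_ARRAY.getVolatile(a, 2);   // volatile read of a[2]
        System.out.println(v);
    }
}
```

This is the "use-site volatility" Aleksey describes: the array field itself
stays plain, and only the accesses made through the VarHandle get volatile
semantics.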

> I have to assume this is not the case, because the worker threads should
> all go quiescent at the end
> > of the stream, so should have flushed their values out to at least L1
> cache, and the CPU should
> > ensure cache coherency between all cores beyond that point. But I want
> to make sure that can be
> > guaranteed.
> Stop thinking in low level? That would only confuse you.
> Before trying to wrap your head around Streams, consider the plain thread
> pool:
>     ExecutorService e = Executors.newFixedThreadPool(1);
>     int[] a = new int[1];
>     Future<?> f = e.submit(() -> a[0]++);
>     f.get();
>     System.out.println(a[0]); // guaranteed to print "1".
> This happens because all actions in the worker thread (so all writes in
> lambda body) happen-before
> all actions after result acquisition (so all reads after Future.get).
> Parallel streams carry a
> similar property.

Good example, and I guess the "guaranteed" here answers my question.
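Translating that example to the parallel-stream case (my own sketch, under
the assumption stated above that the terminal operation gives the same
happens-before edge as Future.get): plain, non-volatile array writes made
inside the stream body are guaranteed visible to the caller once the
terminal operation returns.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

public class ParallelStreamVisibility {
    public static void main(String[] args) {
        int[] a = new int[1000];   // plain array, no volatile, no atomics
        // Worker threads write disjoint elements in parallel...
        IntStream.range(0, a.length).parallel().forEach(i -> a[i] = i * i);
        // ...and completion of the terminal operation happens-before these
        // reads, so the caller is guaranteed to see every element's write.
        System.out.println(Arrays.stream(a).sum());
    }
}
```

No per-element synchronization is needed here precisely because the reads
occur only after the terminal operation has completed.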

I guess fundamentally I was asking whether any memory reordering (or cache
staleness) can be observed across synchronization points. It sounds like
the answer is no: each synchronizing action establishes a "happens-before"
edge, and every write before the edge is ordered before, and visible to,
every read after it. (Happens-before is a partial order over all actions,
not a total one, but a partial order is enough to rule out both reordering
and staleness across the edge.)
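The same reasoning applies to any synchronizing action pair, not just
executors and streams. A sketch of my own using a volatile write/read pair
as the happens-before edge: the plain write to `data` is ordered before the
volatile write to `ready`, which in turn is ordered before any volatile
read that observes `true`, so the reader cannot see a stale `data`.

```java
public class VolatileFlagPublication {
    static int data;                  // plain, non-volatile field
    static volatile boolean ready;    // the synchronizing flag

    public static void main(String[] args) throws InterruptedException {
        Thread writer = new Thread(() -> {
            data = 42;     // plain write, published by the volatile write below
            ready = true;  // volatile write: establishes the happens-before edge
        });
        writer.start();
        while (!ready) {              // volatile read loop
            Thread.onSpinWait();
        }
        // The volatile read that observed true happens-after the volatile
        // write, so the earlier plain write to data is guaranteed visible.
        System.out.println(data);
        writer.join();
    }
}
```

Without the volatile modifier on `ready`, both the visibility of `data` and
even termination of the spin loop would be unguaranteed.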