[concurrency-interest] AtomicReference.updateAndGet() mandatory updating

Gregg Wonderly gergg at cox.net
Tue May 30 12:31:53 EDT 2017

> On May 29, 2017, at 3:04 AM, Andrew Haley <aph at redhat.com> wrote:
> On 28/05/17 02:27, Gregg Wonderly wrote:
>>> On May 26, 2017, at 10:05 AM, Andrew Haley <aph at redhat.com> wrote:
>>> On 26/05/17 14:56, Doug Lea wrote:
>>>> On 05/26/2017 09:35 AM, Andrew Haley wrote:
>>>>> On 26/05/17 13:56, Andrew Dinn wrote:
>>>>>>> Initially (in Java5) requiring it has led to some questionable reliance.
>>>>>>> So we cannot change it. But there's not much motivation to do so anyway:
>>>>>>> As implied by Nathan Reynolds, encountering some (local) fence overhead
>>>>>>> on CAS failure typically reduces contention and may improve throughput.
>>>>>> It would be useful to know if that reduction in contention is specific
>>>>>> to, say, x86 hardware or also occurs on weak memory architectures like
>>>>>> AArch64 or ppc. Perhaps Nathan could clarify that?
>>>> The main issues are not tightly bound to architecture.
>>>> In the vast majority of cases, the response to CAS failure is
>>>> some sort of retry (although perhaps with some intermediate
>>>> processing). The fence here plays a similar role to
>>>> Thread.onSpinWait. And in fact, on ARM, is likely to be
>>>> exactly the same implementation as onSpinWait.
>>> onSpinWait is null, and unless ARM does something to the architecture
>>> that's probably what it'll remain.
>>>> As Alex mentioned, in the uncommon cases where this
>>>> is a performance issue, people can use one of the weak CAS
>>>> variants.
>>>>> Just thinking about AArch64, and how to implement such a thing as well
>>>>> as possible. 
>>>> "As well as possible" may be just to unconditionally issue fence,
>>>> at least for plain CAS; maybe differently for the variants.
>>> I doubt that: I've done some measurements, and it always pays to branch
>>> conditionally around a fence if it's not needed.
>> Since the fence is part of the happens before controls that
>> developers encounter, how can a library routine know what the
>> developer needs, to know how to “randomly” optimize with a branch
>> around the fence?  Are you aware of no software that exists where
>> developers are actively counting MM interactions trying to minimize
>> them?  Here you are trying to do it yourself because you “See” an
>> optimization that is so localized, away from any explicit code
>> intent, that you can’t tell ahead of time (during development of
>> your optimization), what other developers have actually done around
>> the fact that this fence was unconditional before right?
>> Help me understand how you know that no software that works
>> correctly now, will start working randomly, incorrectly, because
>> sometimes the fence never happens.
> It's in the specification.  If a fence is required by the
> specification, we must execute one. If not, the question is whether
> it's faster to execute a fence unconditionally or to branch around it.

But that’s not my point. My point is that once there is a fence, and since developers now have to program according to “fences” that are explicit or implicit in the API implementation, you are going to find developers counting fences and demanding specific fences in specific places, because they create the happens-before edges that developers must manage. And, just as you are adamant that performance can be improved by not always providing this fence, developers and engineers are trying to do exactly the same thing by looking at the complete picture of their application (which you have no view into from the point of this optimization). They are saying to themselves: hey, there’s a write fence in this API, so if we use it to, for example, assign a value via CAS as a work item counter, then we don’t have to worry about all the other state written before that; it will be visible.
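
To make the work item counter pattern concrete, here is a minimal sketch of it. The class and field names (WorkCounterPublish, payload, workCount) are made up for illustration and come from no real code base. It only shows the case nobody disputes: the CAS succeeds, so it acts as a volatile write, and a reader that observes the new counter value also sees the plain write made before it.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical example class; the names are illustrative only.
class WorkCounterPublish {
    // Plain (non-volatile) state that the writer wants to hand over.
    static int payload;
    // The "work item counter" that is bumped via CAS.
    static final AtomicInteger workCount = new AtomicInteger(0);

    static int run() {
        payload = 0;
        workCount.set(0);
        int[] observed = new int[1];

        Thread reader = new Thread(() -> {
            // get() is a volatile read; spin until the counter changes.
            while (workCount.get() == 0) {
                Thread.yield();
            }
            // Ordered after the writer's successful CAS, so the earlier
            // plain write to payload is visible here.
            observed[0] = payload;
        });
        reader.start();

        payload = 42;                                       // plain write
        boolean published = workCount.compareAndSet(0, 1);  // sole writer: succeeds

        try {
            reader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        if (!published) {
            throw new IllegalStateException("unexpected CAS failure");
        }
        return observed[0];
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints 42
    }
}
```

The sketch says nothing about what a failed compareAndSet guarantees, which is the part under discussion in this thread.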

As soon as you take out the fence, they have to put in synchronization themselves. And if a fence is then always issued by their code and mostly also by the CAS implementation, there are too many write fences; by spec they can’t eliminate either one, so they always have to have two.

It’s this kind of optimization, where behaviors are unpredictable for essential programming paradigms, that makes it impossible to find optimal solutions without writing everything yourself with unsafe and/or JNI. Everything about happens-before rides the front side of the wave for https://en.wikipedia.org/wiki/Principle_of_least_astonishment. Many developers who aren’t working in a single-threaded platform environment constantly have to manage this issue. AWT/Swing and background work tasks are plagued with making sure that the state of the AWT thread(s) is shared with worker threads, in both directions. In this environment, making everything volatile is the primary way to keep every assignment from turning into a whole block of code.

If happens-before is really a necessary part of “Java the Language”, then JDK libraries need to stop making it a hidden feature with random behaviors around edges that the developer is not in explicit control of. It has become a bigger and bigger rash that is swollen and ugly, because there is no complete solution across the language implementation that provides an explicit view (like checked exceptions do for unrecoverable errors) into the required behavior. Instead, we have subsets of the runtime that provide many details of control seamlessly.

ConcurrentHashMap makes data sharing work reliably with no worries, because the APIs do the right thing.
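
As a small illustration of that claim, the sketch below hands an object with plain fields from one thread to another through a ConcurrentHashMap. The names (MapHandoff, Job, "job-1") are invented for the example. It relies on the documented memory consistency property of the concurrent collections: actions in a thread before placing an object into the collection happen-before actions after another thread accesses that element.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical example class; the names are illustrative only.
class MapHandoff {
    // A work item with plain, non-volatile, non-final fields.
    static final class Job {
        int size;
        String name;
    }

    static String run() {
        ConcurrentHashMap<String, Job> jobs = new ConcurrentHashMap<>();
        String[] seen = new String[1];

        Thread worker = new Thread(() -> {
            Job j;
            // Poll until the producer has put the job into the map.
            while ((j = jobs.get("job-1")) == null) {
                Thread.yield();
            }
            // The field writes made before put() are visible after get().
            seen[0] = j.name + ":" + j.size;
        });
        worker.start();

        Job job = new Job();
        job.size = 3;
        job.name = "resize";
        jobs.put("job-1", job); // publication point

        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints resize:3
    }
}
```

Neither thread has to reason about fences here; the visibility guarantee is part of the collection's contract.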

I can see that you’d feel that CAS shouldn’t issue the fence if it didn’t write the value. However, precisely because that is a costly event that you are trying to remove, some developers may be using CAS as a fence for themselves, because it was already happening every time. I mean, they may be using CAS in a single-writer, multiple-reader environment where the fence would always be issued, with or without your change. But think of what happens when they add a second writer. Now the fence would not always be issued, because of the race, and other visibility issues may arise that would be very difficult for someone unfamiliar with the “use the fence” optimization to figure out. Regardless of what the “spec” says, it’s what the implementation does that becomes the legacy that has to be supported in the future.
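
Here is a rough sketch of the second-writer situation, with made-up names (TwoWriterCas, publish). The first method shows that a writer holding a stale expected value simply loses the CAS. The publish helper is one possible way, and only an assumption about how such code might be hardened, to avoid depending on what a failed CAS does: it retries until its own CAS succeeds, so every publication ends with a successful write.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical example class; the names are illustrative only.
class TwoWriterCas {
    // Two writers both read 0, then both try to CAS 0 -> 1.
    // Simulated sequentially so the outcome is deterministic.
    static boolean staleCasFails() {
        AtomicInteger counter = new AtomicInteger(0);
        boolean first = counter.compareAndSet(0, 1);   // writer A wins
        boolean second = counter.compareAndSet(0, 1);  // writer B's expected value is stale
        return first && !second;
    }

    // Retry until this thread's CAS wins, so the caller's earlier writes
    // are always followed by a successful CAS (a volatile write).
    static int publish(AtomicInteger counter) {
        for (;;) {
            int seen = counter.get();
            if (counter.compareAndSet(seen, seen + 1)) {
                return seen + 1;
            }
        }
    }

    static int runTwoWriters(int perWriter) {
        AtomicInteger counter = new AtomicInteger(0);
        Runnable writer = () -> {
            for (int i = 0; i < perWriter; i++) {
                publish(counter);
            }
        };
        Thread a = new Thread(writer);
        Thread b = new Thread(writer);
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(staleCasFails());      // prints true
        System.out.println(runTwoWriters(1000));  // prints 2000
    }
}
```

Code that ignores the boolean returned by compareAndSet and treats the call purely as a fence has no such retry, which is the case that only starts to matter once a second writer shows up.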

Would you be happy to break their code with an optimization that is awesome for super racy code, but can break code that is not super racy and ends up with a 1-in-10,000 failure mode that no one can figure out?
