[concurrency-interest] AtomicInteger.getAndIncrement()

Andrew Haley aph at redhat.com
Mon Dec 14 06:21:04 EST 2009

bank kus wrote:
> Hmm, not sure I agree with that. From the hardware designer's
> perspective: treat each virtual core (including hyperthreading/CMT) as
> a client of the lock and make the instruction blocking (no context
> switching until it completes); now you have a bounded set of
> contenders in the worst case.

That's right: the virtual core won't context switch, but the physical one
will.  So, it all comes down to precisely what you mean by "wait-free".
There is one important difference, which is that once you have obtained
the cache line, the atomic increment will complete.  But the wait for the
cache line isn't bounded, AFAIAA.  Perhaps in practice the chipsets in
use employ round robin, but I'm not sure that anything in the architecture
specification requires that, any more than the Java memory model does.
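
For concreteness, the retry loop under discussion looks roughly like
this (a sketch in the style of the JDK implementation of that era, not
the actual source, which goes through Unsafe):

```java
import java.util.concurrent.atomic.AtomicInteger;

// Rough sketch of the loop/CAS/add retry behind getAndIncrement().
// Lock-free: some thread always completes. Not wait-free: this
// thread's compareAndSet can, in principle, fail an unbounded
// number of times under contention.
class GetAndIncrementSketch {
    static int getAndIncrement(AtomicInteger a) {
        for (;;) {
            int current = a.get();
            if (a.compareAndSet(current, current + 1))
                return current;          // won the race this round
            // lost the race: another thread changed the value; retry
        }
    }

    public static void main(String[] args) {
        AtomicInteger a = new AtomicInteger(0);
        System.out.println(getAndIncrement(a)); // prints 0
        System.out.println(a.get());            // prints 1
    }
}
```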

> Now round robin over the contenders in any order and you should have a
> starvation-free algorithm, or at least that's what I'd think without
> delving too deep into this problem. Now whether x86 hardware really
> does this, or whether the fact that they take locks out of their
> caches makes the arbitration any worse, is another question.


> On Mon, Dec 14, 2009 at 2:20 AM, Andrew Haley <aph at redhat.com> wrote:
>> bank kus wrote:
>>> It is understood that the algorithm is not wait-free if it does
>>> "loop / CAS / add"; the question is rather:
>>> <a> whether the hw version of lock:fetchAndAdd is wait-free
>>> <b> if yes, whether such an instruction exists on all known platforms
>>> <c> whether AtomicInteger.getAndIncrement makes any fairness guarantees.
>>> Based on the responses I have received so far from different places:
>>> <a> x86 possibly yes
>>> <b> NO!!
>>> <c> Definitely NOT!!
>> I'm not sure that <a> is really true, even though x86 has an atomic add
>> instruction.  (Which Java doesn't use, but that's not really my point.)
>> On a core with hyperthreading, as I understand it, when one thread is blocked
>> waiting for a cache line the other [thread] will proceed.  So even an atomic add
>> instruction is not wait free.
>> Andrew.
>>> On Sun, Dec 13, 2009 at 4:19 PM, David Holmes <davidcholmes at aapt.net.au> wrote:
>>>>> bank kus writes
>>>>> Is this guaranteed to complete in a finite number of steps? Asking
>>>>> because I am evaluating whether I can implement ticket locking using this.
>>>> You cannot analyse a CAS in isolation. You have to see the complete
>>>> code path that uses the CAS, i.e. any looping behaviour involving the
>>>> code that employs it. You also can't analyse it without knowing how many
>>>> cores/processors are available and how many threads will execute code
>>>> that performs the CAS on the same location. You also need to know when
>>>> preemption may occur.
>>>> The simple answer is that there are no absolute guarantees. But you
>>>> also don't generally require such a guarantee.
>>>>> On a 16 virtual core system if all 16 vcores simultaneously had
>>>>> threads invoke this operation ..
>>>>> <a> are all vcores guaranteed to block context switching to any other
>>>>> threads on their runqueues until this operation completes
>>>>> <b> is there fair arbitration between cores. The pathological case
>>>>> being that vcore 2 gets to increment, then gets to do it again, while
>>>>> other vcores wait trying to do so.
>>>> I can't answer this but I don't think it's the right question anyway.
>>>> A CAS is typically used in a loop:
>>>> while (true) {
>>>>  state i = read_initial_state();
>>>>  state n = compute_next_state(i);
>>>>  if (CAS(state, i, n))
>>>>     break;
>>>> }
>>>> So the semantics of the CAS itself, if executed concurrently, are not
>>>> really the issue. If all 16 cores execute a CAS on the same location,
>>>> one will succeed and 15 will fail. Your question is really: if 16
>>>> threads on 16 cores
>>>> execute the loop above is there any bound on the number of loop iterations
>>>> each thread may need to perform?
>>>> And that isn't a question that can be answered without doing a detailed
>>>> analysis of the system. A simple analysis says the worst case would be
>>>> 16 iterations: 15 failures due to interference from other cores, plus
>>>> one success. But
>>>> what if during the 15 failures the first interfering thread can loop through
>>>> the higher-level code that performs the above CAS-loop such that it retries
>>>> the CAS-loop again? You can imagine pathological harmonic relationships that
>>>> could cause one thread to always fail in its CAS-loop. Is it likely? No. Can
>>>> it happen? Depends on the context. What's the worst case? Depends on the
>>>> context.
>>>> This is one reason why spin-locks often degrade to blocking locks after a
>>>> certain failure threshold is reached.
>>>> David Holmes
>>>>> I understand this would depend upon the hw instructions used, but I am
>>>>> curious whether any fairness expectation can be made of this API.
>>>>> Regards
>>>>> banks
>>>>> _______________________________________________
>>>>> Concurrency-interest mailing list
>>>>> Concurrency-interest at cs.oswego.edu
>>>>> http://cs.oswego.edu/mailman/listinfo/concurrency-interest
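
For reference, the ticket lock banks alludes to might be sketched like
so (hypothetical code; whether waiters are actually served in FIFO
order rests on exactly the getAndIncrement fairness question debated
in this thread):

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical ticket-lock sketch. Mutual exclusion holds regardless,
// but FIFO handover among spinning threads is only as fair as the
// underlying fetch-and-add arbitration, which is the open question
// here. Pure spinning, so unsuitable when threads outnumber cores.
class TicketLock {
    private final AtomicInteger nextTicket = new AtomicInteger(0);
    private final AtomicInteger nowServing = new AtomicInteger(0);

    void lock() {
        int myTicket = nextTicket.getAndIncrement(); // draw a ticket
        while (nowServing.get() != myTicket)
            Thread.yield();                          // wait for our turn
    }

    void unlock() {
        nowServing.incrementAndGet();                // admit next ticket
    }

    public static void main(String[] args) throws InterruptedException {
        TicketLock lock = new TicketLock();
        int[] counter = {0};
        Runnable worker = () -> {
            for (int i = 0; i < 10000; i++) {
                lock.lock();
                counter[0]++;                        // critical section
                lock.unlock();
            }
        };
        Thread t1 = new Thread(worker), t2 = new Thread(worker);
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println(counter[0]); // prints 20000
    }
}
```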
