[concurrency-interest] AtomicInteger.getAndIncrement()

bank kus kus.bank at gmail.com
Mon Dec 14 06:32:05 EST 2009


>> But the wait for the cache line isn't bounded,

Isn't that dangerous? It means that, lock:xxx aside, even writes from
certain virtual cores could starve indefinitely. Then again, I'm not
sure we can poll AMD/Intel for more details on their forums.
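
For reference, the ticket lock I asked about at the start of this
thread (quoted below) would be roughly the sketch that follows. It is
FIFO among ticket holders, but only as starvation-free as
getAndIncrement() itself; the class and field names are illustrative.

    import java.util.concurrent.atomic.AtomicInteger;

    // Sketch only: fairness here rests entirely on getAndIncrement().
    class TicketLock {
        private final AtomicInteger nextTicket = new AtomicInteger(0);
        private final AtomicInteger nowServing = new AtomicInteger(0);

        void lock() {
            int ticket = nextTicket.getAndIncrement();  // the contended RMW
            while (nowServing.get() != ticket)
                Thread.yield();                         // spin until our turn
        }

        void unlock() {
            nowServing.incrementAndGet();               // admit the next ticket
        }
    }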

banks

On Mon, Dec 14, 2009 at 3:21 AM, Andrew Haley <aph at redhat.com> wrote:
> bank kus wrote:
>> Hmm, not sure I agree with that. From the hw designer's perspective:
>> treat each virtual core (that includes hyperthreading/CMT) as a client
>> of the lock, make the instruction blocking (no ctxsw'ing until it
>> completes), and now you have a bounded set of contenders in the worst
>> case.
>
> That's right: the virtual core won't context switch, but the physical one
> will.  So, it all comes down to precisely what you mean by "wait-free".
> There is one important difference, which is that once you have obtained
> the cache line the atomic increment will complete.  But the wait for the
> cache line isn't bounded, AFAIAA.  Perhaps in practice the chipsets in
> use employ round robin, but I'm not sure that anything in the architecture
> specification requires that, any more than the Java memory model does.
>
>> Now round robin over the contenders in any order and you should have a
>> starvation-free algorithm, or at least that's what I'd think without
>> delving too deep into this problem. Now whether x86 hw really does
>> this, or whether the fact that they take locks out of their caches
>> makes the arbitration any worse, is another question.
>
> Andrew.
>
>
>> On Mon, Dec 14, 2009 at 2:20 AM, Andrew Haley <aph at redhat.com> wrote:
>>> bank kus wrote:
>>>> It is understood the algorithm is not wait-free if it does "loop / CAS
>>>> / add"; the question is more:
>>>> <a> if the hw version of lock:fetchAndAdd is wait-free
>>>> <b> if yes, does such an instruction exist on all known platforms
>>>> <c> does AtomicInteger.getAndIncrement make any fairness guarantees.
>>>>
>>>> Based on the responses I have received so far from different places:
>>>> <a> x86 possibly yes
>>>> <b> NO!!
>>>> <c> Definitely NOT!!
>>> I'm not sure that <a> is really true, even though x86 has an atomic add
>>> instruction.  (Which Java doesn't use, but that's not really my point.)
>>> On a core with hyperthreading, as I understand it, when one thread is blocked
>>> waiting for a cache line the other [thread] will proceed.  So even an atomic add
>>> instruction is not wait-free.
>>>
>>> Andrew.
>>>
>>>
>>>> On Sun, Dec 13, 2009 at 4:19 PM, David Holmes <davidcholmes at aapt.net.au> wrote:
>>>>>> bank kus writes
>>>>>> Is this guaranteed to complete in a finite number of steps? Asking
>>>>>> because I am evaluating whether I can implement ticket locking using this.
>>>>> You cannot analyse a CAS use in isolation. You have to see the complete
>>>>> code path that uses the CAS, i.e. any looping behaviour involving the
>>>>> code that employs the CAS. You also can't analyse it without knowing how
>>>>> many cores/processors are available and how many threads will execute
>>>>> code that performs the CAS on the same location. You also need to know
>>>>> when preemption may occur.
>>>>>
>>>>> Simple answer is that there are no absolute guarantees. But you also don't
>>>>> generally require such a guarantee.
>>>>>
>>>>>> On a 16-virtual-core system, if all 16 vcores simultaneously had
>>>>>> threads invoke this operation:
>>>>>> <a> are all vcores guaranteed to block context switching to any other
>>>>>> threads on their runqueues until this operation completes?
>>>>>>
>>>>>> <b> is there fair arbitration between cores? The pathological case
>>>>>> being vcore 2 gets to increment, then gets to do it again, while
>>>>>> other vcores wait trying to do so.
>>>>> I can't answer this, but I don't think it's the right question anyway. A
>>>>> CAS is typically used in a loop:
>>>>>
>>>>> while (true) {
>>>>>     state i = read_initial_state();
>>>>>     state n = compute_next_state(i);
>>>>>     if (CAS(state, i, n))
>>>>>         break;
>>>>> }
>>>>>
>>>>> So the semantics of the CAS itself, if executed concurrently, are not
>>>>> really the issue. If all 16 cores execute a CAS on the same location,
>>>>> one will succeed and 15 will fail. Your question is really: if 16
>>>>> threads on 16 cores execute the loop above, is there any bound on the
>>>>> number of loop iterations each thread may need to perform?
>>>>>
>>>>> And that isn't a question that can be answered without doing a detailed
>>>>> analysis of the system. A simple analysis says the worst case would be 16
>>>>> iterations: 15 failures from interference on the other cores, plus one
>>>>> success. But what if, during those 15 failures, the first interfering
>>>>> thread can loop through the higher-level code that performs the above
>>>>> CAS-loop, so that it retries the CAS-loop again? You can imagine
>>>>> pathological harmonic relationships that could cause one thread to
>>>>> always fail in its CAS-loop. Is it likely? No. Can it happen? Depends
>>>>> on the context. What's the worst case? Depends on the context.
>>>>>
>>>>> This is one reason why spin-locks often degrade to blocking locks after a
>>>>> certain failure threshold is reached.
>>>>>
>>>>> David Holmes
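
(For concreteness: David's loop above, specialized to getAndIncrement,
is essentially the shape of AtomicInteger's own implementation. The
sketch below is a simplified rendering of that CAS loop, not the exact
library source; the wrapping class and static-helper form are just for
illustration.)

    import java.util.concurrent.atomic.AtomicInteger;

    class CasLoopDemo {
        // Simplified rendering of the CAS-retry shape David describes.
        static int getAndIncrement(AtomicInteger a) {
            for (;;) {
                int current = a.get();               // read initial state
                int next = current + 1;              // compute next state
                if (a.compareAndSet(current, next))  // CAS; retry on interference
                    return current;                  // success: return old value
            }
        }
    }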
>>>>>
>>>>>
>>>>>> I understand this would depend upon the hw instructions used, but am
>>>>>> curious whether any fairness expectation can be made of this API.
>>>>>>
>>>>>> Regards
>>>>>> banks
>>>>>> _______________________________________________
>>>>>> Concurrency-interest mailing list
>>>>>> Concurrency-interest at cs.oswego.edu
>>>>>> http://cs.oswego.edu/mailman/listinfo/concurrency-interest
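
(On David's point that spin locks often degrade to blocking locks: a
minimal sketch of that pattern follows, assuming an arbitrary spin
threshold; the class name, the constant, and the monitor-based fallback
are illustrative, not taken from any particular library.)

    import java.util.concurrent.atomic.AtomicBoolean;

    // Sketch only: try a bounded spin first, then degrade to blocking.
    class SpinThenBlockLock {
        private static final int SPIN_LIMIT = 64;   // illustrative threshold
        private final AtomicBoolean held = new AtomicBoolean(false);

        void lock() throws InterruptedException {
            // Fast path: bounded spinning on the CAS.
            for (int i = 0; i < SPIN_LIMIT; i++) {
                if (held.compareAndSet(false, true))
                    return;
            }
            // Slow path: block on the monitor until a release wakes us,
            // then recheck the CAS (the while loop handles spurious wakeups
            // and barging spinners).
            synchronized (this) {
                while (!held.compareAndSet(false, true))
                    wait();
            }
        }

        void unlock() {
            held.set(false);
            synchronized (this) {
                notify();                           // wake one blocked waiter
            }
        }
    }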


