[concurrency-interest] AtomicInteger.getAndIncrement()

bank kus kus.bank at gmail.com
Mon Dec 14 06:13:50 EST 2009


Hmm, I'm not sure I agree with that. From the hardware designer's
perspective: treat each virtual core (that includes hyperthreading/CMT)
as a client of the lock and make the instruction blocking (no context
switching until it completes); now you have a bounded set of contenders
in the worst case.

Now round-robin over the contenders in any order and you should have a
starvation-free algorithm, or at least that's what I'd think without
delving too deep into this problem. (A ticket-lock sketch along these
lines is below.) Whether x86 hardware really does this, or whether the
fact that cores take the lock's cache line into their own caches makes
the arbitration any worse, is another question.
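
For what it's worth, here is roughly the ticket lock I have in mind,
sketched in Java on top of AtomicInteger (the TicketLock class and its
spin loop are just an illustration of the idea, not anything from the
JDK):

import java.util.concurrent.atomic.AtomicInteger;

class TicketLock {
    private final AtomicInteger nextTicket = new AtomicInteger();
    private final AtomicInteger nowServing = new AtomicInteger();

    void lock() {
        // Take a ticket; entry is granted in ticket order, so the lock is
        // only as starvation-free as getAndIncrement is under contention.
        int ticket = nextTicket.getAndIncrement();
        while (nowServing.get() != ticket) {
            // spin; a production version would back off or eventually park
        }
    }

    void unlock() {
        // Only the current holder writes nowServing, so a plain increment works.
        nowServing.incrementAndGet();
    }
}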

banks

On Mon, Dec 14, 2009 at 2:20 AM, Andrew Haley <aph at redhat.com> wrote:
> bank kus wrote:
>> It is understood the algorithm is not wait-free if it does "loop / CAS
>> / add"; the question is more:
>> <a> if the hw version of lock:fetchAndAdd is wait-free
>> <b> if yes does such an instruction exist on all known platforms
>> <c> does AtomicInteger.getAndIncrement make any fairness guarantees.
>>
>> Based on the responses I have received so far from different places
>> <a> x86 possibly yes
>> <b> NO!!
>> <c> Definitely NOT!!
>
> I'm not sure that <a> is really true, even though x86 has an atomic add
> instruction.  (Which Java doesn't use, but that's not really my point.)
> On a core with hyperthreading, as I understand it, when one thread is blocked
> waiting for a cache line the other core will proceed.  So even an atomic add
> instruction is not wait free.
>
> Andrew.
>
>
>> On Sun, Dec 13, 2009 at 4:19 PM, David Holmes <davidcholmes at aapt.net.au> wrote:
>>>> bank kus writes
>>>> Is this guaranteed to complete in a finite number of steps. Asking
>>>> because am evaluating if I can implement ticket locking using this.
>>> You cannot analyse a CAS use in isolation. You have to see the complete
>>> code path that uses the CAS, i.e. any looping behaviour involving the code
>>> that employs the CAS. You also can't analyze it without knowing how many
>>> cores/processors are available and how many threads will execute code that
>>> performs the CAS on the same location. You also need to know when preemption
>>> may occur.
>>>
>>> Simple answer is that there are no absolute guarantees. But you also don't
>>> generally require such a guarantee.
>>>
>>>> On a 16 virtual core system if all 16 vcores simultaneously had
>>>> threads invoke this operation ..
>>>> <a> are all vcores guaranteed to block context switching to any other
>>>> threads on their runqueues until this operation completes
>>>>
>>>> <b> is there fair arbitration between cores? The pathological case
>>>> being vcore 2 gets to increment, then gets to do it again, while other
>>>> vcores wait trying to do so.
>>> I can't answer this but I don't think it's the right question anyway. A CAS
>>> is typically used in a loop:
>>>
>>> while (true) {
>>>  state i = read_initial_state();
>>>  state n = compute_next_state(i);
>>>  if (CAS(state, i, n))
>>>     break;
>>> }
>>>
>>> So the semantics of the CAS itself, if executed concurrently, are not really
>>> the issue. If all 16 cores execute a CAS on the same location, one will
>>> succeed and 15 will fail. Your question is really: if 16 threads on 16 cores
>>> execute the loop above is there any bound on the number of loop iterations
>>> each thread may need to perform?
>>>
>>> And that isn't a question that can be answered without doing a detailed
>>> analysis of the system. A simple analysis says the worst case would be 16
>>> iterations: 15 failures from interference on other cores plus success. But
>>> what if during the 15 failures the first interfering thread can loop through
>>> the higher-level code that performs the above CAS-loop such that it retries
>>> the CAS-loop again? You can imagine pathological harmonic relationships that
>>> could cause one thread to always fail in its CAS-loop. Is it likely? No. Can
>>> it happen? Depends on the context. What's the worst case? Depends on the
>>> context.
>>>
>>> This is one reason why spin-locks often degrade to blocking locks after a
>>> certain failure threshold is reached.
>>>
>>> David Holmes
>>>
>>>
>>>> I understand this would depend upon the hw instructions used but am
>>>> curious if any fair expectation can be made out of this API.
>>>>
>>>> Regards
>>>> banks
>>>> _______________________________________________
>>>> Concurrency-interest mailing list
>>>> Concurrency-interest at cs.oswego.edu
>>>> http://cs.oswego.edu/mailman/listinfo/concurrency-interest
>>>
>>
>> _______________________________________________
>> Concurrency-interest mailing list
>> Concurrency-interest at cs.oswego.edu
>> http://cs.oswego.edu/mailman/listinfo/concurrency-interest
>
>
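
To make David's point concrete: on a CAS-only JVM, AtomicInteger.getAndIncrement
is itself essentially the retry loop he describes, something along these lines
(a paraphrase from memory, not the exact JDK source):

import java.util.concurrent.atomic.AtomicInteger;

public final class GetAndIncrementSketch {
    // Sketch of the retry loop behind getAndIncrement when no hardware
    // fetch-and-add is used. Each failed compareAndSet means another core
    // won that round, which is exactly why no per-thread bound on the
    // number of iterations can be promised.
    static int getAndIncrement(AtomicInteger a) {
        for (;;) {
            int current = a.get();              // read the initial state
            int next = current + 1;             // compute the next state
            if (a.compareAndSet(current, next))
                return current;                 // success: return the previous value
            // another thread updated the value first; retry
        }
    }
}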


