[concurrency-interest] Enforcing total sync order on modern hardware

Marko Topolnik marko at hazelcast.com
Mon Mar 16 16:22:22 EDT 2015


I was wondering how a JVM implementation could enforce a global sync
order across all CPUs in a performant way. Given the total ordering of
locked instructions, I would assume that the implementation of any
synchronization action has to involve a locked instruction at some
point.
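To make the "global across all CPUs" requirement concrete, here is a sketch of the IRIW (independent reads of independent writes) litmus test with volatile fields. Because the JMM places all volatile accesses in a single total sync order, two reader threads may not observe the independent writes to `x` and `y` in opposite orders, so `countDisagreements` should return 0 on a conforming JVM. The class and method names are hypothetical, written only for illustration. A loop like this can only fail to find a violation, never prove the guarantee; a dedicated harness such as OpenJDK's jcstress is the proper tool.

```java
import java.util.concurrent.CyclicBarrier;

// Hypothetical IRIW litmus harness, for illustration only.
public class IriwLitmus {
    // Volatile accesses are synchronization actions, so all of them
    // must fit into one total sync order shared by every thread.
    static volatile int x, y;
    // Plain fields: read only after join(), which provides happens-before.
    static int r1, r2, r3, r4;

    /** Returns how often the two readers disagreed on the order of the writes. */
    static int countDisagreements(int iterations) {
        int disagreements = 0;
        for (int i = 0; i < iterations; i++) {
            x = 0;
            y = 0;
            CyclicBarrier start = new CyclicBarrier(4);
            runAll(
                new Thread(() -> { await(start); x = 1; }),
                new Thread(() -> { await(start); y = 1; }),
                new Thread(() -> { await(start); r1 = x; r2 = y; }),  // reader A
                new Thread(() -> { await(start); r3 = y; r4 = x; })); // reader B
            // Reader A saw x=1 before y=1 while reader B saw y=1 before x=1:
            // impossible if all volatile accesses share one total order.
            if (r1 == 1 && r2 == 0 && r3 == 1 && r4 == 0) {
                disagreements++;
            }
        }
        return disagreements;
    }

    // Releases all threads at once to maximize the chance of interleaving.
    private static void await(CyclicBarrier barrier) {
        try {
            barrier.await();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    private static void runAll(Thread... threads) {
        for (Thread t : threads) t.start();
        try {
            for (Thread t : threads) t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("disagreements: " + countDisagreements(2_000));
    }
}
```

The forbidden outcome follows directly from the total order: reader A's results place `x = 1` before `y = 1`, reader B's place `y = 1` before `x = 1`, and a single order cannot contain both.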

On Mon, Mar 16, 2015 at 9:02 PM, Vitaly Davidovich <vitalyd at gmail.com>
wrote:

> By "total sync order at the CPU level" do you mean sync order of that cpu
> itself or some global order across all CPUs? The total order of lock
> instructions is across all CPUs, whereas mfence, AFAIK, orders only the
> local CPU's memory operations.  Sorry, maybe I'm being dense today, but I
> still don't get why knowing that lock instructions have total order somehow
> answers your question.  Were you simply asking whether there are
> instructions available to ensure memory operations are done (or appear to
> be) in program (minus compiler code motion) order on a per-cpu basis?
>
> On Mon, Mar 16, 2015 at 3:46 PM, Marko Topolnik <marko at hazelcast.com>
> wrote:
>
>> What is important is that there be _some_ way of guaranteeing total sync
>> order at the CPU level. It is less important whether this is achieved by
>> mfence or lock instruction.
>>
>> -Marko
>>
>> On Mon, Mar 16, 2015 at 8:40 PM, Vitaly Davidovich <vitalyd at gmail.com>
>> wrote:
>>
>>> Why were you concerned with lock instructions specifically? At one point
>>> in the past, volatile writes were done using mfence, IIRC.
>>>
>>> On Mar 16, 2015 3:28 PM, "Marko Topolnik" <marko at hazelcast.com> wrote:
>>>
>>>> Andrew,
>>>>
>>>> thank you for the reference, this answers the dilemma in full. I didn't
>>>> know this guarantee existed on x86.
>>>>
>>>> ---
>>>> Marko
>>>>
>>>> On Mon, Mar 16, 2015 at 7:44 PM, Andrew Haley <aph at redhat.com> wrote:
>>>>
>>>>> On 03/16/2015 05:00 PM, Marko Topolnik wrote:
>>>>>
>>>>> > Given that, since Nehalem, cores communicate point-to-point over QPI
>>>>> > and don't lock the global front-side bus, the CPU doesn't naturally
>>>>> > offer a total ordering of all lock operations.
>>>>>
>>>>> Intel do actually guarantee
>>>>>
>>>>>     Locked instructions have a total order.
>>>>>
>>>>> so this is a hardware problem, not a software one.  How exactly the
>>>>> hardware people do this on a large network of processors is some of
>>>>> the most Secret Sauce, but I can imagine some kind of combining
>>>>> network in hardware.
>>>>>
>>>>> Andrew.
>>>>>
>>>>> [1]  Intel® 64 and IA-32 Architectures Software Developer’s Manual
>>>>> Volume 3 (3A, 3B & 3C): System Programming Guide 8.2.2, Memory
>>>>> Ordering in P6 and More Recent Processor Families
>>>>>
>>>>
>>
>
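The mfence vs. locked-instruction question is easiest to see in the classic store-buffering (Dekker) pattern. x86 allows a later load to be reordered with an earlier store to a different location, so without a StoreLoad barrier (an `mfence` or a lock-prefixed instruction) both threads below could read 0. With both fields volatile, the JMM forbids that outcome, which is what obliges the JVM to emit such a barrier. The class and method names are hypothetical; a conforming JVM should report 0 forbidden outcomes, though a loop like this is a smoke test, not a proof.

```java
import java.util.concurrent.CyclicBarrier;

// Hypothetical store-buffering (Dekker) litmus harness, for illustration only.
public class StoreBufferLitmus {
    // Volatile: each store must become visible before the following load,
    // which on x86 requires a StoreLoad barrier after the store.
    static volatile int x, y;
    // Plain fields: read only after join(), which provides happens-before.
    static int r1, r2;

    /** Returns how often both threads read 0, an outcome the JMM forbids here. */
    static int countForbidden(int iterations) {
        int forbidden = 0;
        for (int i = 0; i < iterations; i++) {
            x = 0;
            y = 0;
            CyclicBarrier start = new CyclicBarrier(2);
            Thread t1 = new Thread(() -> { await(start); x = 1; r1 = y; });
            Thread t2 = new Thread(() -> { await(start); y = 1; r2 = x; });
            t1.start();
            t2.start();
            try {
                t1.join();
                t2.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new RuntimeException(e);
            }
            if (r1 == 0 && r2 == 0) {
                forbidden++;
            }
        }
        return forbidden;
    }

    // Releases both threads at once to maximize the chance of interleaving.
    private static void await(CyclicBarrier barrier) {
        try {
            barrier.await();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("forbidden outcomes: " + countForbidden(2_000));
    }
}
```

In sync order, whichever of the two volatile reads comes last must follow both stores, so at least one thread sees 1. Declaring `x` and `y` as plain `int` instead would make the both-zero outcome legal under the JMM, and it can then actually show up on x86.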

