[concurrency-interest] Enforcing total sync order on modern hardware

Marko Topolnik marko at hazelcast.com
Mon Mar 16 16:22:22 EDT 2015

I was wondering how a JVM implementation would be able to enforce global
sync order across all CPUs in a performant way. So, given the total
ordering on lock instructions, I would assume that the implementation of
any given synchronizing action would have to involve a lock instruction at
some point.
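
To make the question concrete, here is a minimal sketch of the property I
have in mind, written as the well-known IRIW (independent reads of
independent writes) litmus test. The class name IriwLitmus and the method
countForbidden are made up for illustration. Two threads write to two
volatile fields, and two other threads read them in opposite orders. Since
all volatile accesses are synchronization actions, the JMM requires both
readers to agree on the order of the two writes, so the count should stay
at zero on a conforming JVM. A zero count does not prove conformance, it
just fails to find a violation.

```java
import java.util.concurrent.CyclicBarrier;

public class IriwLitmus {
    // volatile makes every read and write of x and y a synchronization action
    static volatile int x, y;

    /** Runs the four-thread IRIW test and returns how often the readers disagreed. */
    public static int countForbidden(int iterations) {
        int forbidden = 0;
        for (int i = 0; i < iterations; i++) {
            x = 0;
            y = 0;
            final int[] r = new int[4];
            final CyclicBarrier start = new CyclicBarrier(4);
            Thread[] threads = {
                new Thread(() -> { await(start); x = 1; }),               // writer 1
                new Thread(() -> { await(start); y = 1; }),               // writer 2
                new Thread(() -> { await(start); r[0] = x; r[1] = y; }),  // reader A: x, then y
                new Thread(() -> { await(start); r[2] = y; r[3] = x; }),  // reader B: y, then x
            };
            run(threads);
            // Reader A saw x=1 before y=1 while reader B saw y=1 before x=1:
            // no single total order of the two writes explains both observations.
            if (r[0] == 1 && r[1] == 0 && r[2] == 1 && r[3] == 0) {
                forbidden++;
            }
        }
        return forbidden;
    }

    // Starts all threads and waits for them; join() makes the writes to r visible here.
    private static void run(Thread[] threads) {
        for (Thread t : threads) t.start();
        try {
            for (Thread t : threads) t.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }

    // Lines all four threads up so that they race as closely as possible.
    private static void await(CyclicBarrier barrier) {
        try {
            barrier.await();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("forbidden outcomes: " + countForbidden(10_000));
    }
}
```

With the volatile modifier removed, the JMM no longer forbids the outcome,
although x86 hardware by itself is unlikely to show it.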

On Mon, Mar 16, 2015 at 9:02 PM, Vitaly Davidovich <vitalyd at gmail.com>
wrote:

> By "total sync order at the CPU level" do you mean sync order of that cpu
> itself or some global order across all CPUs? The total order of lock
> instructions is across all CPUs, whereas mfence, AFAIK, orders only the
> local CPU's memory operations.  Sorry, maybe I'm being dense today, but I
> still don't get why knowing that lock instructions have total order somehow
> answers your question.  Were you simply asking whether there are
> instructions available to ensure memory operations are done (or appear to
> be) in program (minus compiler code motion) order on a per-cpu basis?
> On Mon, Mar 16, 2015 at 3:46 PM, Marko Topolnik <marko at hazelcast.com>
> wrote:
>> What is important is that there be _some_ way of guaranteeing total sync
>> order at the CPU level. It is less important whether this is achieved by
>> mfence or lock instruction.
>> -Marko
>> On Mon, Mar 16, 2015 at 8:40 PM, Vitaly Davidovich <vitalyd at gmail.com>
>> wrote:
>>> Why were you concerned with lock instructions specifically? At one point
>>> in the past, volatile writes were done using mfence, IIRC.
>>> sent from my phone
>>> On Mar 16, 2015 3:28 PM, "Marko Topolnik" <marko at hazelcast.com> wrote:
>>>> Andrew,
>>>> thank you for the reference, this answers the dilemma in full. I didn't
>>>> know this guarantee existed on x86.
>>>> ---
>>>> Marko
>>>> On Mon, Mar 16, 2015 at 7:44 PM, Andrew Haley <aph at redhat.com> wrote:
>>>>> On 03/16/2015 05:00 PM, Marko Topolnik wrote:
>>>>> > Given that, since Nehalem, cores communicate point-to-point over QPI
>>>>> > and don't lock the global front-side bus, the CPU doesn't naturally
>>>>> > offer a total ordering of all lock operations.
>>>>> Intel do actually guarantee [1]
>>>>>     Locked instructions have a total order.
>>>>> so this is a hardware problem, not a software one.  How exactly the
>>>>> hardware people do this on a large network of processors is some of
>>>>> the most Secret Sauce, but I can imagine some kind of combining
>>>>> network in hardware.
>>>>> Andrew.
>>>>> [1]  Intel® 64 and IA-32 Architectures Software Developer’s Manual
>>>>> Volume 3 (3A, 3B & 3C): System Programming Guide 8.2.2, Memory
>>>>> Ordering in P6 and More Recent Processor Families
>>>> _______________________________________________
>>>> Concurrency-interest mailing list
>>>> Concurrency-interest at cs.oswego.edu
>>>> http://cs.oswego.edu/mailman/listinfo/concurrency-interest