[concurrency-interest] Enforcing total sync order on modern hardware

Vitaly Davidovich vitalyd at gmail.com
Mon Mar 16 14:35:55 EDT 2015


Marko,

Can't scheduling alone cause a situation where the reader reads t = 9 ms,
looks at the shared value, and sees something old that hasn't been updated
yet, simply because the writer hasn't observed t = 9 ms yet (due to
scheduling)? Nehalem is a TSO (total store order) architecture, which means
each core's writes appear in the same order to all other cores, but there's
no global order across all cores. So your clock-updating thread's stores
will appear to the reader/writer in the same order, but what can the
reader/writer say about each other if each reads t = X? I think nothing of
consequence, given that they're not synchronizing-with the clock-updating
thread, but simply observing that value in a "racy" (with respect to each
other) manner.
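
A minimal sketch of what I mean (class and variable names are mine, purely
illustrative, not Marko's original program): one thread publishes a
monotonically increasing volatile clock, while writer and reader sample it
racily. Per-location coherence keeps each thread's successive samples of
the clock non-decreasing, but a sample alone establishes no
synchronizes-with edge between the reader and the writer.

```java
// Illustrative sketch of the racy clock observation; not Marko's program.
public class ClockedRace {
    static volatile long currentTime = 0;   // the "clock" variable
    static volatile int sharedVar = 0;      // writer -> reader channel

    public static void main(String[] args) throws InterruptedException {
        Thread clock = new Thread(() -> {
            for (long t = 1; t <= 1_000_000; t++) {
                currentTime = t;            // monotonically increasing volatile stores
            }
        });
        Thread writer = new Thread(() -> {
            long seen = currentTime;        // racy sample of the clock
            sharedVar = 1;                  // volatile write "at time" seen
        });
        Thread reader = new Thread(() -> {
            long t1 = currentTime;          // two successive samples of the clock:
            long t2 = currentTime;          // coherence guarantees t2 >= t1
            int v = sharedVar;              // may still be 0, however "late" t1 is
            if (t2 < t1) throw new AssertionError("coherence violated");
        });
        clock.start(); writer.start(); reader.start();
        clock.join(); writer.join(); reader.join();
        System.out.println("done; sharedVar=" + sharedVar);
    }
}
```

The t2 >= t1 check holds by per-location coherence of volatile reads;
nothing here, however, orders the reader's read of sharedVar relative to
the writer's sample of the clock.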

I'm probably misunderstanding your question/point though, so please correct
me.

On Mon, Mar 16, 2015 at 1:00 PM, Marko Topolnik <marko at hazelcast.com> wrote:

> In an old post I drew the diagram below, which showed happens-before edges
> between three threads. The thread shown at the bottom was a
> "clock-updating" thread, continuously updating a volatile "current time"
> value. The other two threads read the time and were additionally coupled
> through a shared volatile variable with one writer and one reader thread.
>
> My point was that the threads could behave in a paradoxical way from the
> standpoint of "global time": the reader could observe a late time value and
> nevertheless not be prevented from observing an early write to the shared
> var.
>
> The JMM actually prevents this behavior with the enforcement of a total
> sync order.
>
> However, during a private discussion with Martin Thompson, it seemed
> unclear how exactly a runtime would actually enforce total sync order
> without hurting performance. Since Nehalem, cores communicate
> point-to-point over QPI rather than locking a shared front-side bus, so
> the CPU doesn't naturally offer a total ordering of all lock operations. I
> would be very interested to learn how exactly this is achieved, or perhaps
> whether Martin and I were missing something and this is actually not an issue.
>
> Here is the diagram reposted:
>
>
>                                 /--> Rrt6 --/-> Rrt9 --> Rv0
>     ---------------------------+------------+--------/
>   /                            |     ------/
> Wv0 ---> Rwt3 -> Wv1 --> Rwt6  |    /
>         /                /   --|   /
>        |                | /       /
>        T3 ------------> T6 ----> T9
>
>
> T3, T6, T9 -- writes to currentTime
> Rwt3, Rwt6 -- reads of currentTime by the writing thread
> Rrt6, Rrt9 -- reads of currentTime by the reading thread
> Wv0, Wv1   -- writes to the sharedVar
> Rv0        -- read of the sharedVar
>
> initially t = 0 ms;
> T3 writes t = 3 ms;
> T6 writes t = 6 ms;
> T9 writes t = 9 ms.
>
> The program outputs
>
> Writing 0 at t = 0 ms
> Writing 1 at t = 3 ms
> Reading 0 at t = 9 ms
>
>
> ---
> Marko Topolnik
> Hazelcast
>
>
> _______________________________________________
> Concurrency-interest mailing list
> Concurrency-interest at cs.oswego.edu
> http://cs.oswego.edu/mailman/listinfo/concurrency-interest
>
>

