[concurrency-interest] Enforcing total sync order on modern hardware
vitalyd at gmail.com
Mon Mar 16 14:35:55 EDT 2015
Can't scheduling alone cause a situation where the reader reads t = 9 ms,
looks at the shared value, and sees something stale simply because the
writer hasn't observed t = 9 ms yet (due to scheduling)? Nehalem is a
TSO (total store order) architecture, which means each core's writes become
visible to all other cores in the same order, but there's no global order
across all cores. So your clock-updating thread's stores will appear to the
reader and writer in the same order, but what can the reader and writer say
about each other if each reads t = X? I think nothing of consequence, given
that they're not synchronizing-with the clock-updating thread, but simply
observing that value in a "racy" (with respect to each other) manner.
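For what it's worth, a volatile access doesn't compile to a plain load/store. A minimal Java 9+ sketch of the fencing a JVM conceptually inserts (class and method names are mine, purely illustrative):

```java
import java.lang.invoke.VarHandle;

// Sketch, not HotSpot's actual code: conceptually a volatile store is a
// plain store followed by a full (StoreLoad) fence, and a volatile load
// is a plain load followed by an acquire fence. On x86, HotSpot has
// historically emitted a locked instruction (lock addl $0, (%rsp)) for
// the full fence, and x86 orders all locked instructions in a single
// total order regardless of which core issued them.
public class FenceSketch {
    static long clockValue = 0L;  // stands in for the volatile currentTime

    static void volatileStyleStore(long v) {
        clockValue = v;           // ordinary store
        VarHandle.fullFence();    // StoreLoad barrier after the store
    }

    static long volatileStyleLoad() {
        long v = clockValue;      // ordinary load
        VarHandle.acquireFence(); // keeps later accesses after this load
        return v;
    }
}
```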
I'm probably misunderstanding your question/point, though, so please correct
me if so.
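For concreteness, the quoted scenario below might look like this as a runnable sketch (class and field names are mine, not from the original program; which time values each sample observes is still up to scheduling):

```java
// Three threads from the quoted diagram: a clock thread publishing a
// volatile currentTime (T3, T6, T9), a writer (Wv0, Rwt3, Wv1, Rwt6),
// and a reader (Rrt6, Rrt9, Rv0) that share a volatile variable.
public class TotalOrderSketch {
    static volatile long currentTime = 0L; // updated only by the clock thread
    static volatile int sharedVar = 0;     // written by writer, read by reader

    static void run() {
        Thread clock = new Thread(() -> {
            for (long t = 3; t <= 9; t += 3) currentTime = t; // T3, T6, T9
        });
        Thread writer = new Thread(() -> {
            sharedVar = 0;                 // Wv0
            long t1 = currentTime;         // Rwt3: may observe 0, 3, 6, or 9
            sharedVar = 1;                 // Wv1
            long t2 = currentTime;         // Rwt6
            System.out.println("writer sampled t=" + t1 + " then t=" + t2);
        });
        Thread reader = new Thread(() -> {
            long t = currentTime;          // Rrt9: suppose this observes 9
            int v = sharedVar;             // Rv0: the JMM's total sync order
                                           // then forbids observing the stale 0
            System.out.println("reader saw t=" + t + ", v=" + v);
        });
        clock.start(); writer.start(); reader.start();
        try {
            clock.join(); writer.join(); reader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) { run(); }
}
```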
On Mon, Mar 16, 2015 at 1:00 PM, Marko Topolnik <marko at hazelcast.com> wrote:
> In an old post I drew the diagram below, which showed happens-before edges
> between three threads. The thread shown at the bottom was a
> "clock-updating" thread, continuously updating a volatile "current time"
> value. The other two threads read the time and were additionally coupled
> through a shared volatile variable with one writer and one reader thread.
> My point was that the threads could behave in a paradoxical way from the
> standpoint of "global time": the reader could observe a late time value and
> nevertheless not be prevented from observing an early write to the shared
> variable.
> The JMM actually prevents this behavior with the enforcement of a total
> sync order.
> However, during a private discussion with Martin Thompson, it seemed
> unclear how exactly a runtime would actually enforce total sync order
> without hurting performance. Given that, since Nehalem, cores communicate
> point-to-point over QPI and don't lock the global front-side bus, the CPU
> doesn't naturally offer a total ordering of all lock operations. I would be
> very interested to learn how exactly this is achieved, or perhaps whether
> Martin and I were missing something and this is actually not an issue.
> Here is the diagram reposted:
> Writer thread:  Wv0 --> Rwt3 --> Wv1 --> Rwt6
> Reader thread:  Rrt6 --> Rrt9 --> Rv0
> Clock thread:   T3 --> T6 --> T9
>
> Cross-thread happens-before edges (a volatile read observing a write):
> T3 --> Rwt3,  T6 --> Rwt6,  T6 --> Rrt6,  T9 --> Rrt9
> T3, T6, T9 -- writes to currentTime
> Rwt3, Rwt6 -- reads of currentTime by the writing thread
> Rrt6, Rrt9 -- reads of currentTime by the reading thread
> Wv0, Wv1 -- writes to the sharedVar
> Rv0 -- read of the sharedVar
> initially t = 0 ms;
> T3 writes t = 3 ms;
> T6 writes t = 6 ms;
> T9 writes t = 9 ms.
> The program outputs
> Writing 0 at t = 0 ms
> Writing 1 at t = 3 ms
> Reading 0 at t = 9 ms
> Marko Topolnik
> Concurrency-interest mailing list
> Concurrency-interest at cs.oswego.edu