[concurrency-interest] On A Formal Definition of 'Data-Race'
nathan.reynolds at oracle.com
Tue Apr 16 19:01:02 EDT 2013
On x86, only loads can be reordered ahead of earlier stores. So, the program
can make progress even though the store hasn't yet been made globally visible.
In an extreme case, the store is not made globally visible for a very
long time. The store buffer will eventually fill with other stores
(loads can still complete). The core will then stall, waiting for the
store at the front of the buffer to complete.
In order for a store to complete, it has to be written into the core's L1
cache. In order to do this, the cache line has to be fetched from another
core's cache or from RAM. Then the cache line has to be invalidated in all
other cores. Both of these operations can be done with a single message
sent to all of the cores on the system.
Consider that an L3 cache miss takes 14-38 clocks or 6-66 ns
(http://www.sisoftware.net/?d=qa&f=ben_mem_latency) on a Sandy Bridge E
processor. This means a store can take a long time, relatively speaking.
Also, consider that the system could have 8 processor sockets. Some
processor sockets are not directly connected and must communicate via a
shared processor socket. This further increases the latency of the messaging.
Without a memory fence after a non-volatile write, subsequent loads can
bypass the store. These loads could "read" the value being stored or
"read" values previously stored. This means there is no happens-before
relationship between the store and the loads. In other words, the loads
can appear to happen before the store.
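A minimal litmus-test sketch of this, in the spirit of jcstress (class,
field, and iteration names are mine, not from this thread): with plain
fields, each thread's load can bypass its own earlier store, so both
threads can read the stale value 0. With volatile fields, the fence that
accompanies each volatile write on x86 forbids that outcome under the JMM.

```java
// Store-buffering litmus test (illustrative sketch, names are my own).
// Plain fields: r1 == 0 && r2 == 0 is a legal outcome (loads bypass stores).
// Volatile fields: the JMM forbids the (0,0) outcome for the volatile pair.
public class StoreBuffering {
    static int x, y, r1, r2;            // plain: store->load reordering allowed
    static volatile int vx, vy;         // volatile: fenced writes
    static int vr1, vr2;                // read by main only after join()

    public static void main(String[] args) throws InterruptedException {
        int plainBothZero = 0, volatileBothZero = 0;
        for (int i = 0; i < 5_000; i++) {
            x = y = r1 = r2 = 0;
            Thread a = new Thread(() -> { x = 1; r1 = y; });
            Thread b = new Thread(() -> { y = 1; r2 = x; });
            a.start(); b.start(); a.join(); b.join();
            if (r1 == 0 && r2 == 0) plainBothZero++;

            vx = vy = 0; vr1 = vr2 = 0;
            Thread c = new Thread(() -> { vx = 1; vr1 = vy; });
            Thread d = new Thread(() -> { vy = 1; vr2 = vx; });
            c.start(); d.start(); c.join(); d.join();
            if (vr1 == 0 && vr2 == 0) volatileBothZero++;
        }
        // (0,0) may or may not show up for the plain fields on any given run,
        // but it is forbidden by the JMM for the volatile fields.
        System.out.println("plain (0,0) outcomes: " + plainBothZero);
        System.out.println("volatile (0,0) outcomes: " + volatileBothZero);
    }
}
```

Note that the plain-field count is timing-dependent and may well be zero on
a lightly loaded machine; only the volatile-field count of zero is guaranteed.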
There is no way to know the timing of the visibility of stopped. The
store could happen very quickly (i.e., 4 clocks) if the cache line is in
the modified or exclusive state in the core's L1 cache, or it could
happen only after the entire system has removed the cache line from all
of the other cores and has acknowledged the invalidation.
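To make the stopped-flag scenario concrete, here is a sketch (class and
variable names are mine): a reader thread spins on the flag while the main
thread sets it. Declared volatile, the write is guaranteed to become
visible to the reader; without volatile, the JIT is free to hoist the load
of the flag out of the loop, and the reader may spin forever.

```java
// Sketch of the stopped-flag visibility scenario (names are illustrative).
public class StopFlag {
    // volatile: the store below must become visible to the spinning reader.
    // Remove volatile and the loop may never terminate, because the JIT can
    // hoist the load of 'stopped' out of the loop body.
    static volatile boolean stopped;

    public static void main(String[] args) throws InterruptedException {
        Thread reader = new Thread(() -> {
            while (!stopped) { }          // spin until the store is visible
            System.out.println("reader observed stopped");
        });
        reader.start();
        Thread.sleep(100);                // let the reader start spinning
        stopped = true;                   // volatile store: forces visibility
        reader.join();
        System.out.println("done");
    }
}
```

With volatile, termination is guaranteed by the JMM; how quickly the reader
sees the store is exactly the hardware-dependent timing discussed above.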
Architect | 602.333.9091
Oracle PSR Engineering <http://psr.us.oracle.com/> | Server Technology
On 4/16/2013 3:31 PM, thurstonn wrote:
> Nathan Reynolds-2 wrote
>> All things being equal, reading a volatile and non-volatile field from
>> L1/2/3/4 cache/memory has no impact on performance. The instructions
>> are exactly the same (on x86).
>> Writing a volatile and non-volatile field to cache/memory has an impact
>> on performance. Writing to a volatile field requires a memory fence on
>> x86 and many other processors. This fence is going to take cycles.
> Sure, that's my understanding as well. I wasn't asking about the 'cost' of
> reading #stopped when declared volatile; as you mentioned, there isn't one.
> My question was about the 'timing' of the visibility of #stopped in the
> *non-volatile* case, given cache coherency.
> Concurrency-interest mailing list
> Concurrency-interest at cs.oswego.edu