[concurrency-interest] ThreadLocal vs ProcessorLocal

Kirk Pepperdine kirk at kodewerk.com
Thu Oct 18 02:57:03 EDT 2012


The problem with discarding outliers is that you don't really know whether they come from a clock problem or from something else going on.
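
One way to get evidence either way is to test the clock directly instead of inferring from the timings. What follows is only a minimal sketch, assuming the causality condition Dave describes further down (a thread that sees a published value T and then reads U should have U >= T). The class and method names are hypothetical, and the threads are not pinned to cores, so a clean run does not prove the clock is sound.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

/** Hypothetical probe: counts clock readings that are older than a value already published. */
final class NanoTimeCausalityProbe {

    static long countRetrograde(LongSupplier clock, int threads, int iterationsPerThread) {
        // Seed the shared slot with a real reading so the first comparison is meaningful.
        final AtomicLong lastPublished = new AtomicLong(clock.getAsLong());
        final AtomicLong violations = new AtomicLong();

        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                long local = 0;
                for (int n = 0; n < iterationsPerThread; n++) {
                    long t = lastPublished.get();   // T: a value some thread already saw
                    long u = clock.getAsLong();     // U: read strictly after observing T
                    if (u - t < 0) {                // subtraction keeps the comparison overflow-safe
                        local++;
                    }
                    lastPublished.set(u);
                }
                violations.addAndGet(local);
            });
            workers[i].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return violations.get();
    }

    public static void main(String[] args) {
        int threads = Runtime.getRuntime().availableProcessors();
        long bad = countRetrograde(System::nanoTime, threads, 1_000_000);
        System.out.println("retrograde observations: " + bad);
    }
}
```

A non-zero count points at the clock. A zero count with outliers still present suggests looking elsewhere (scheduling, GC, and so on).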

Regards,
Kirk

On 2012-10-18, at 4:46 AM, Jacy Odin Grannis <jacyg at alumni.rice.edu> wrote:

> So, when I'd seen this it was on Linux 2.6.9, Opteron 8384s, Java
> 1.6.0_16 (it was a couple years back).
> 
> Based on this:
> http://juliusdavies.ca/posix_clocks/clock_realtime_linux_faq.html  it
> would look like it could be that the kernel was buggy, as only 2.6.18
> and up are solid.
> 
> However, I also found this link:
> http://efreedom.com/Question/1-6814792/Clock-Gettime-Erratic  which
> notes erratic behavior even on 2.6.26, when using an Opteron.
> Somewhere in this thread
> http://stackoverflow.com/questions/510462/is-system-nanotime-completely-useless
> someone comments that when they used AMD before, there wasn't any
> synchronization even across cores on the same die.  So, I'm going to
> guess AMD was the source of the behavior I saw.  The values were
> definitely very different, they would frequently be separated by
> hundreds of thousands of "nanos" from one core to the next.  Makes for
> funny outliers in your data set when you're trying to capture timings.
> Although my big concern wasn't the huge swings--those you can easily
> discard--it was worry that I'd end up with noise I couldn't account
> for if the timings were close but not quite in sync.
> 
> At any rate, hopefully that's helpful to someone if they're trying to
> debug similarly strange results on an older machine.
> 
> Jacy
> 
> On Wed, Oct 17, 2012 at 9:31 PM, David Holmes <davidcholmes at aapt.net.au> wrote:
>> Well the nanoTime() behaviour is not a JVM bug, though the JVM can try to
>> account for the underlying buggy OS and/or configuration and/or hardware.
>> ;-)
>> 
>> As Dave states, on Linux we will use CLOCK_MONOTONIC if available (which is
>> pretty much always these days), else we fall back to gettimeofday. Now as
>> you can infer from the name CLOCK_MONOTONIC is supposed to be monotonic and
>> if it isn't that is a bug in the OS or a system configuration error (using
>> an unreliable clocksource such as the TSC on MP systems). In contrast we
>> make no pretense that gettimeofday is expected to be monotonic.
>> 
>> Also as Dave states we don't try to guard against a buggy CLOCK_MONOTONIC on
>> linux by ensuring it never reports a value less than any previous value
>> reported. We could, and probably should, but it is one of many things on a
>> long list.
>> 
>> But if you see big problems with nanoTime then either your system is using
>> the TSC as a clocksource when it should not, OR you are running in a virtual
>> environment and the host system is not providing a stable time source to the
>> guest OS.
>> 
>> Note: for the TSC to be usable it must be both stable (frequency invariant)
>> and synchronized across all "processors". While many processors now provide
>> a stable TSC they don't provide a synchronized TSC. For the OS to be able to
>> use the TSC as a monotonic clocksource it needs to do its own very accurate
>> synchronization. Solaris actually attempts this, where most operating
>> systems simply stopped using the TSC, but because of that there has been a
>> very long bug-tail on Solaris.
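
To check which clocksource a Linux box is actually using, the kernel exposes it through sysfs. A minimal sketch, assuming the standard path /sys/devices/system/clocksource/clocksource0/current_clocksource; the class and method names are hypothetical, and on non-Linux systems the file simply will not be there.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

/** Hypothetical helper: reports the clocksource the Linux kernel is currently using. */
final class ClocksourceCheck {

    // Standard sysfs location on Linux; absent on other operating systems.
    static final Path CURRENT =
            Paths.get("/sys/devices/system/clocksource/clocksource0/current_clocksource");

    /** Returns the trimmed file contents, or empty if the file is missing, unreadable or blank. */
    static Optional<String> read(Path file) {
        try {
            String name = new String(Files.readAllBytes(file), StandardCharsets.US_ASCII).trim();
            return name.isEmpty() ? Optional.<String>empty() : Optional.of(name);
        } catch (IOException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        Optional<String> source = read(CURRENT);
        if (!source.isPresent()) {
            System.out.println("clocksource not readable (not Linux, or sysfs unavailable)");
        } else if (source.get().equals("tsc")) {
            System.out.println("clocksource is tsc; only trustworthy if the TSC is stable and synchronized");
        } else {
            System.out.println("clocksource is " + source.get());
        }
    }
}
```

This only tells you what the kernel selected, not whether that source is behaving. In a virtual environment the guest's answer also depends on what the host provides.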
>> 
>> David Holmes
>> 
>> -----Original Message-----
>> From: concurrency-interest-bounces at cs.oswego.edu
>> [mailto:concurrency-interest-bounces at cs.oswego.edu] On Behalf Of David Dice
>> Sent: Thursday, 18 October 2012 12:11 PM
>> To: concurrency-interest at cs.oswego.edu
>> Subject: Re: [concurrency-interest] ThreadLocal vs ProcessorLocal
>> 
>> 
>> 
>>> Date: Wed, 17 Oct 2012 16:55:50 -0500
>>> From: Jacy Odin Grannis <jacyg at alumni.rice.edu>
>>> To: "Dr Heinz M. Kabutz" <heinz at javaspecialists.eu>
>>> Cc: concurrency-interest at cs.oswego.edu, David Dice
>>>        <david.dice at gmail.com>
>>> Subject: Re: [concurrency-interest] ThreadLocal vs ProcessorLocal
>>> Message-ID:
>>> 
>>> <CAESiqEqAteCAsCDLWXM-3-89bJS2nBRrLtOBmHrJkkcX2=Sh4g at mail.gmail.com>
>>> Content-Type: text/plain; charset=ISO-8859-1
>>> 
>>> Yes, definitely.  I've seen this happen.  One easy way you can see
>>> this is System.nanoTime will suddenly start returning wildly different
>>> values.  nanoTime is only consistent on a single processor; it can
>>> vary widely between processors (at least on Linux).
>>> 
>>> I think what's really needed is a set of language level constructs for
>>> really addressing the problem.  I know there are experimental projects
>>> looking to do that (
>>> 
>>> https://wiki.rice.edu/confluence/display/HABANERO/Habanero+Multicore+Software+Research+Project
>>> ).  I am not sure to what extent it would be possible to build support
>>> for the various constructs in the JVM; and then aside from that, how
>>> you would add language support is another matter.
>> 
>> 
>> The nanoTime() behavior sounds like a JVM bug.   nanoTime() values should be
>> non-retrograde and causal in the sense that if one thread calls nanoTime and
>> stores the observed value T into a variable, and then some 2nd thread reads
>> that variable and observes the store of T and then calls nanoTime and sees
>> value U, we should have U >= T.  (Volatiles are assumed, obviously).     I
>> first ran into this non-monotonic time problem on large SPARC systems where
>> the HW clock underlying the native gethrtime() API exhibited drift between
>> CPUs.   The drift was minimal as the kernel syncs the clocks periodically,
>> so we tracked the maximum value returned by nanoTime() and would return
>> the maximum of that tracking value and the value we got via gethrtime().
>> This works, but creates its own cache coherence hot-spot as we're updating
>> that variable frequently, which means that concurrent and unrelated
>> nanoTime() calls don't scale as well as we might like.   (There are ways to
>> avoid the coherence hot spot but they usually entail reduced accuracy).
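
The guard described above can be sketched at the application level roughly as follows. This is an illustration only, not the JVM's actual code: the class name is hypothetical, and the raw clock is passed in as a parameter purely so the behavior can be exercised with a fake source.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

/** Hypothetical wrapper: never returns a value older than one it has already returned. */
final class MonotonicNanoClock {

    private final LongSupplier source;
    // Single shared high-water mark. Every caller touches it, which is the
    // cache coherence hot-spot mentioned in the message.
    private final AtomicLong highWaterMark;

    MonotonicNanoClock(LongSupplier source) {
        this.source = source;
        this.highWaterMark = new AtomicLong(source.getAsLong());
    }

    MonotonicNanoClock() {
        this(System::nanoTime);
    }

    long nanoTime() {
        long now = source.getAsLong();
        while (true) {
            long prev = highWaterMark.get();
            if (now - prev <= 0) {
                return prev;            // raw clock stalled or went backwards: reuse the maximum
            }
            if (highWaterMark.compareAndSet(prev, now)) {
                return now;             // this call advanced the maximum
            }
            // Another thread moved the mark first; re-read and compare again.
        }
    }
}
```

Comparing with `now - prev` rather than `now > prev` follows the usual advice for nanoTime values, which may be negative and may wrap. The cost is exactly the one noted above: unrelated callers contend on one variable.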
>> 
>> It's been years since I've looked at the code, but I think we use
>> CLOCK_MONOTONIC if it's available on linux.   (David Holmes could best
>> answer this part of the question regarding linux time sources).  But the
>> guards against returning a smaller value aren't in place in the linux
>> platform-specific code as they are on Solaris.
>> 
>> Regards
>> Dave
>> 
>> 
>> _______________________________________________
>> Concurrency-interest mailing list
>> Concurrency-interest at cs.oswego.edu
>> http://cs.oswego.edu/mailman/listinfo/concurrency-interest
>> 



