[concurrency-interest] tsc register

David Dice david.dice at gmail.com
Tue Jan 10 13:14:32 EST 2012

The basic rule for nanoTime() is that the JVM must provide globally
consistent non-retrograde (monotonically non-decreasing) time.   If a
thread calls nanoTime() twice, the 2nd return value must be >= the 1st.
And time needs to be causal in the sense that if some thread A calls
nanoTime() and stores the value in a volatile shared variable, and some
other thread B observes that stored value -- a happens-before edge -- and
then itself calls nanoTime(), then the value returned by B's call should
be >= the value B observed in memory.
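
To make the causality rule concrete, here is a minimal sketch of a checker
for the 2-thread case.  The class and method names are mine and purely
illustrative; it only probes whatever nanoTime() the running JVM provides
and proves nothing about the underlying clock source.

```java
final class NanoTimeCausalityCheck {
    // Timestamp published by the writer thread. The volatile store/load pair
    // supplies the happens-before edge between writer and reader.
    private static volatile long published = System.nanoTime();

    /** Counts reads where nanoTime() came back earlier than the observed stamp. */
    static long countViolations(int reads) {
        Thread writer = new Thread(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                published = System.nanoTime();   // thread A: call, then publish
            }
        });
        writer.setDaemon(true);
        writer.start();

        long violations = 0;
        for (int i = 0; i < reads; i++) {
            long observed = published;           // thread B: observe the stored value
            long now = System.nanoTime();        // ...and then read the clock itself
            if (now - observed < 0) {            // subtract, to be safe against wrap
                violations++;
            }
        }

        writer.interrupt();
        try {
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return violations;
    }
}
```

On a JVM that honors the rule, countViolations() should return 0 no matter
how the two threads are scheduled across processors.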

Using SPARC/Solaris as an example, we implemented nanoTime() in HotSpot via
calls to gethrtime(), which is documented to return monotonically
non-decreasing values.   gethrtime() is implemented as a fast-trap into the
kernel, and computes the return value via the STICK or TICK registers,
which are vaguely similar to the TSC.  The source is cycle-accurate and the
trap costs about 60 cycles.   Unfortunately gethrtime() admits retrograde
time in practice in the 2-thread case above -- I won't go into the reasons
but they're related to clock domains in HW.   The maximum skew between
processors was constrained, however, so to implement nanoTime() we needed
to track the maximum observed gethrtime() value and return the maximum of
that value and the value returned from gethrtime().   Unfortunately that
means we're updating the maximum value frequently -- based on the
nanoTime() call rate -- which then gives us a coherence hotspot that
impedes scaling on big systems.    (One way to reduce the update rate is to
mask off a few low order bits returned from gethrtime(), trading off some
quantization and effective resolution for a reduced update rate when we
have frequent calls to nanoTime()).  The key point is that if the system
provides poor clock sources then the JVM must compensate accordingly.
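
A rough sketch of that compensation in Java, for illustration only: this is
not the HotSpot code.  MonotonicClock, rawSource and lowBitsToDrop are names
I made up, and rawSource merely stands in for gethrtime().

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

final class MonotonicClock {
    private final LongSupplier rawSource;   // hypothetical stand-in for gethrtime()
    private final long mask;                // clears the low-order bits of each sample
    private final AtomicLong maxSeen = new AtomicLong(Long.MIN_VALUE);

    MonotonicClock(LongSupplier rawSource, int lowBitsToDrop) {
        this.rawSource = rawSource;
        this.mask = -1L << lowBitsToDrop;   // lowBitsToDrop assumed to be in 0..63
    }

    /** Returns max(largest value seen so far, current masked sample). */
    long nanoTime() {
        long raw = rawSource.getAsLong() & mask;
        long prev = maxSeen.get();
        // Write the shared maximum only when this sample advances it. With
        // masking, many calls see raw == prev and skip the CAS entirely.
        while (raw > prev) {                // assumes the raw source never wraps
            if (maxSeen.compareAndSet(prev, raw)) {
                return raw;
            }
            prev = maxSeen.get();           // lost the race; re-read and retry
        }
        return prev;                        // sample was retrograde or stale
    }
}
```

The shared maxSeen word is exactly the coherence hotspot described above.
Dropping low-order bits does not remove it, but it lets most callers get
away with a read of the shared word instead of a write.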

As others have noted, tsc is messy because of skew, core clock rate
differences, thermal capping, onlining and offlining of processors, etc.
But it's often OK for timing code paths & single-threaded performance
analysis if you're willing to tolerate the issues.

Over the years lots of thought has gone into using tsc/tscp in user-space
in the JVM to implement nanoTime().   Generally, I don't think it's a viable
path unless the kernel is willing and able to provide some additional
constraints on maximum skew or if the kernel could publish per-processor
tsc deltas and you're willing to make the JVM more intimate with that
particular kernel.



p.s., if you really must use rdtsc, then rdtscp is preferred if

p.s., as a throw-away experiment I wrote a kernel driver that created a
kernel page that could be mmap()ed RO into user-space.  The page held
lbolt, ticking at 100 Hz, and a "delta" value that expressed the difference
between lbolt and absolute time.   The values were protected by a seqlock
so that readers could ensure the values were mutually coherent.  Thus, you
could use this page as a replacement for gethrtime (relative time) and
gettimeofday (absolute time).  It didn't quite have the resolution of
gethrtime(), but it was potentially sufficient for the needs of the JVM.
(Recall that nanoTime is already 10 msecs granular on various HotSpot
reference platforms.)  The problem is that the kernel needed to update the
time on the clock tick, which didn't work well with the new tickless
kernels -- they help the system get to lower power states.   The trick to
get the best of both worlds was to withdraw permissions on the page when
the page hadn't been accessed recently.   If you subsequently accessed the
page we'd trap, make the page RW, update the time, and restart the periodic
tick timer for a few seconds.   The time is guaranteed monotonic, so
there's no need to update that hot "max time" variable.   Perhaps even
better, it's possible for the JIT to directly inline fetches to the magic
page to query time via a DirectBuffer, avoiding control flow back into
native code.  If you don't have this type of kernel support you can
instead have each JVM create a thread that updates a process-private page
every 10 msecs, and quiesce the thread if there have been no recent time
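
The seqlock protocol that kept lbolt and "delta" mutually coherent can be
sketched in plain Java.  This is an in-process analogue only, not the
driver: TimePage, publish() and snapshot() are hypothetical names, it
assumes a single writer, and it needs Java 9+ for the VarHandle fences.

```java
import java.lang.invoke.VarHandle;

final class TimePage {
    private volatile long seq;   // even = stable, odd = update in progress
    private long lbolt;          // tick counter (relative time)
    private long delta;          // offset from lbolt to absolute time

    /** Single-writer update; call it from one thread only (e.g. the tick handler). */
    void publish(long newLbolt, long newDelta) {
        long s = seq;
        seq = s + 1;                     // mark the update as in progress
        VarHandle.storeStoreFence();     // keep the data stores after the odd marker
        lbolt = newLbolt;
        delta = newDelta;
        seq = s + 2;                     // volatile store publishes the new pair
    }

    /** Returns {lbolt, delta} as one mutually coherent pair. */
    long[] snapshot() {
        while (true) {
            long s1 = seq;               // volatile load
            if ((s1 & 1) != 0) {         // writer is mid-update; try again
                Thread.onSpinWait();
                continue;
            }
            long t = lbolt;
            long d = delta;
            VarHandle.loadLoadFence();   // keep the data loads before the re-check
            if (seq == s1) {
                return new long[] {t, d};
            }
        }
    }
}
```

Assuming lbolt and delta are kept in the same units, a reader would take
s = page.snapshot() and use s[0] as relative time and s[0] + s[1] as
absolute time.  Readers never write shared state, so there is nothing like
the hot "max time" variable to contend on.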