[concurrency-interest] Timer notification drifts

David Holmes davidcholmes at aapt.net.au
Tue Jan 18 17:02:31 EST 2011


> Why is that? It seems to work fine.

jiffies is low resolution based on the system tick. Fine for time-of-day,
but may not be suitable if you need higher resolution timing capabilities.

Also beware never to set NO_HZ at the same time (google for it :) ).

But if it works for you then ...

Cheers,
David

> -----Original Message-----
> From: David Holmes [mailto:davidcholmes at aapt.net.au]
> Sent: Tuesday, January 18, 2011 4:44 PM
> To: Navin Jha
> Cc: concurrency-interest at cs.oswego.edu
> Subject: RE: [concurrency-interest] Timer notification drifts
>
> > The machine is using jiffies.
>
> Strewth! That's not what you want! Get it changed to hpet.
>
> David
>
> > -----Original Message-----
> > From: David Holmes [mailto:davidcholmes at aapt.net.au]
> > Sent: Saturday, January 15, 2011 4:50 AM
> > To: Navin Jha
> > Cc: concurrency-interest at cs.oswego.edu
> > Subject: RE: [concurrency-interest] Timer notification drifts
> >
> > Navin Jha writes:
> > > We upgraded one of our box to 5.4 linux as you suggested and
> > that did it!
> > >
> > > We did't even have to change any setting.
> >
> > Out of interest what clocksource is the updated system using?
> >
> > David
> >
> >
> > > Thank you so much!
> > >
> > > Regards,
> > > Navin
> > >
> > >
> > > -----Original Message-----
> > > From: David Holmes [mailto:davidcholmes at aapt.net.au]
> > > Sent: Tuesday, January 11, 2011 6:04 PM
> > > To: Navin Jha
> > > Cc: concurrency-interest at cs.oswego.edu
> > > Subject: RE: [concurrency-interest] Timer notification drifts
> > >
> > > > We don't see these files, is there another way to check this?
> > >
> > > As Mark says your kernel may be too old to expose these.
> > >
> > > > Also this the log that might tell you more:
> > > > [11:16:23][root at numpintapp15:~]$ grep -i tsc /var/log/messages
> > > > Jan 10 16:51:18 numpintapp15 checking TSC synchronization across
> > > > 12 CPUs: passed.
> > >
> > > I'm very skeptical that an old kernel will be able to
> > synchronize the TSC
> > > across 12 cores. Linux abandoned use of the TSC as the primary
> > timesource
> > > for MP systems.
> > >
> > > http://lwn.net/Articles/209101/
> > >
> > > But maybe things have progressed since then. I know Solaris
> > went to great
> > > lengths to do the TSC synchronization, and while it seems
> > reasonable today
> > > there were a few bumps along the road.
> > >
> > > > We've tried to specify the clocksource in the
> > > > /boot/grub/grub.conf file.  We've been trying various
> > > > clocksources. Would you let us know how to set the appropriate
> > > > clocksource? Below is the grub.conf file.
> > >
> > > I'm not a linux expert but it appears that clock=hpet as the
> appropriate
> > > boot option should work. These log entries indicate tsc is used:
> > >
> > > Jan 10 18:44:16 numpintapp15 Bootdata ok (command line is ro
> > root=LABEL=/
> > > hda=ide-scsi hpet=disable pnpacpi=off clock=tsc)
> > > Jan 10 18:44:16 numpintapp15 Kernel command line: ro root=LABEL=/
> > > hda=ide-scsi hpet=disable pnpacpi=off clock=tsc console=tty0
> > >
> > > David
> > > ------
> > >
> > > > [11:17:43][root at numpintapp15:~]$ cat /boot/grub/grub.conf
> > > > # grub.conf generated by anaconda
> > > > #
> > > > # Note that you do not have to rerun grub after making changes to
> > > > this file
> > > > # NOTICE:  You have a /boot partition.  This means that
> > > > #          all kernel and initrd paths are relative to /boot/, eg.
> > > > #          root (hd0,0)
> > > > #          kernel /vmlinuz-version ro root=/dev/cciss/c0d0p6
> > > > #          initrd /initrd-version.img
> > > > #boot=/dev/cciss/c0d0
> > > > default=1
> > > > timeout=5
> > > > splashimage=(hd0,0)/grub/splash.xpm.gz
> > > > hiddenmenu
> > > > title Red Hat Enterprise Linux AS (2.6.9-78.0.8.EL)
> > > >         root (hd0,0)
> > > >         kernel /vmlinuz-2.6.9-78.0.8.EL ro root=LABEL=/
> > > > hda=ide-scsi clock=hpet clocksource=hpet
> > > >         initrd /initrd-2.6.9-78.0.8.EL.img
> > > > title Red Hat Enterprise Linux AS (2.6.9-78.ELlargesmp)
> > > >         root (hd0,0)
> > > >         kernel /vmlinuz-2.6.9-78.ELlargesmp ro root=LABEL=/
> > > > hda=ide-scsi hpet=disable pnpacpi=off clock=tsc
> > > >         initrd /initrd-2.6.9-78.ELlargesmp.img
> > > > title Red Hat Enterprise Linux AS-up (2.6.9-78.EL)
> > > >         root (hd0,0)
> > > >         kernel /vmlinuz-2.6.9-78.EL ro root=LABEL=/ hda=ide-scsi
> > > > clock=hpet clocksource=hpet
> > > >         initrd /initrd-2.6.9-78.EL.img
> > > >
> > > >
> > > >
> > > >
> > > > -----Original Message-----
> > > > From: David Holmes [mailto:davidcholmes at aapt.net.au]
> > > > Sent: Monday, January 10, 2011 6:12 PM
> > > > To: Navin Jha
> > > > Cc: concurrency-interest at cs.oswego.edu
> > > > Subject: RE: [concurrency-interest] Timer notification drifts
> > > >
> > > > I believe you need root access to check the clocksource
> > (unless default
> > > > perms are modified):
> > > >
> > > > cat
> /sys/devices/system/clocksource/clocksource0/available_clocksource
> > > >
> > > > to see what's available, and
> > > >
> > > > cat /sys/devices/system/clocksource/clocksource0/current_clocksource
> > > >
> > > > to see what is in use.
> > > >
> > > > I was suggesting using nanoTime to measure the elapsed time,
> > rather than
> > > > date/currentTimeMillis, as nanoTime should not be affected by
> > > time-of-day
> > > > adjustments. Which reminds me: check if ntp is running and if
> > so turn it
> > > > off, if not turn it on, and see if that makes a difference.
> > > >
> > > > David
> > > >
> > > > -----Original Message-----
> > > > From: Navin Jha [mailto:navin.jha at FXALL.com]
> > > > Sent: Tuesday, 11 January 2011 9:04 AM
> > > > To: dholmes at ieee.org
> > > > Cc: concurrency-interest at cs.oswego.edu
> > > > Subject: RE: [concurrency-interest] Timer notification drifts
> > > >
> > > >
> > > > David,
> > > >
> > > > I tried LockSupport.park(..) and
> ScheduledThreadPoolExecutor with same
> > > > results and they use nano time. Are you suggesting using nano
> > > time in any
> > > > other way? I will try them again after the switch to HPET to see
> > > > if there is
> > > > any difference. I have communicated your concern  about the use
> > > of TSC on
> > > > multi-processor systems to our unix support group.
> > > >
> > > > One more question, how do I make sure that the clock source
> is set up
> > > > correctly to HPET? Since it was done by the support group  I
> > > want to make
> > > > sure that they did it  correctly?
> > > >
> > > > -Navin
> > > >
> > > > From: David Holmes [mailto:davidcholmes at aapt.net.au]
> > > > Sent: Monday, January 10, 2011 5:41 PM
> > > > To: Navin Jha
> > > > Cc: concurrency-interest at cs.oswego.edu
> > > > Subject: RE: [concurrency-interest] Timer notification drifts
> > > >
> > > > Try using System.nanoTime to track things.
> > > >
> > > > BTW unless you have CPUs with reliable TSC then you should never
> > > > use TSC as
> > > > the clocksource on multi-processor systems. There are some
> OS specific
> > > > utilities from the chip vendors to fix TSC drift but I don't
> > > know what is
> > > > available for Linux. It's been a little while since I checked on
> > > > the current
> > > > state of this.
> > > >
> > > > David
> > > > -----Original Message-----
> > > > From: Navin Jha [mailto:navin.jha at FXALL.com]
> > > > Sent: Tuesday, 11 January 2011 8:33 AM
> > > > To: dholmes at ieee.org
> > > > Cc: concurrency-interest at cs.oswego.edu
> > > > Subject: RE: [concurrency-interest] Timer notification drifts
> > > > David,
> > > > TSC was used by default (according to unix support folks, since I
> > > > don't have
> > > > access to configs on those machine), they switched it to use HPET
> > > > and I got
> > > > the same result. Below is the sample code I used to test. On
> > > one machine I
> > > > see a constant drift of about 15 milliseconds regardless of how
> > > > of long the
> > > > timer runs (I ran it for 1 hr, 2 hrs and 24hrs). This machine has
> > > > 2 cpus. On
> > > > the machine with 8 cpus The drift was close to 11 seconds
> for 24 hrs.
> > > >
> > > > import java.text.SimpleDateFormat;
> > > > import java.util.Calendar;
> > > > import java.util.Timer;
> > > > import java.util.TimerTask;
> > > >
> > > > public class TimerTest {
> > > >
> > > >          public static void main(String[] args) {
> > > >
> > > >                try {
> > > >                   long schedule=500;
> > > >                   if (args.length > 0)
> > > >                         schedule=Long.parseLong(args[0]);
> > > >                   System.out.println( " scheduling job\n");
> > > >                   Timer eodTimer = new Timer(true);
> > > >                   SimpleDateFormat simpleDateFormat = new
> > > > SimpleDateFormat("HH:mm:ss:S");
> > > >                   System.out.println(" Current time = " +
> > > > simpleDateFormat.format(Calendar.getInstance().getTime()));
> > > >                   eodTimer.scheduleAtFixedRate(new
> > > > TestTask(System.currentTimeMillis(),schedule,
> > > simpleDateFormat), schedule,
> > > > schedule);
> > > >                   Thread.sleep(Long.MAX_VALUE);
> > > >             } catch (Exception e) {
> > > >               System.out.println(e);
> > > >             }
> > > >
> > > >       }
> > > >
> > > >          private static class TestTask extends TimerTask {
> > > >
> > > >           long startTime;
> > > >           long sch;
> > > >           SimpleDateFormat simpleDateFormat;
> > > >           TestTask (long startTime,long schedule, SimpleDateFormat
> > > > simpleDateFormat)
> > > >           {
> > > >             this.startTime=startTime;
> > > >             sch=schedule;
> > > >             this.simpleDateFormat = simpleDateFormat;
> > > >           }
> > > >             public void run() {
> > > >                   long now = System.currentTimeMillis();
> > > >                   System.out.println(" End time     = " +
> > > > simpleDateFormat.format(Calendar.getInstance().getTime()) + "\n");
> > > >                   System.out.println(" Actual diff  =
> > > "+(now-startTime)+ "
> > > > milliseconds, expected diff = " + sch + " milliseconds");
> > > >                   System.exit(1);
> > > >
> > > >             }
> > > >          }
> > > > }
> > > >
> > > > From: David Holmes [mailto:davidcholmes at aapt.net.au]
> > > > Sent: Monday, January 10, 2011 5:21 PM
> > > > To: Navin Jha
> > > > Cc: concurrency-interest at cs.oswego.edu
> > > > Subject: RE: [concurrency-interest] Timer notification drifts
> > > >
> > > > So I take it that HPET is already being used.
> > > >
> > > > The other possibility here is that it is not the timer that is
> > > > drifting but
> > > > the time source that you are using to measure/track when the
> > > timer fires.
> > > > How are you tracking that?
> > > >
> > > > David
> > > > -----Original Message-----
> > > > From: Navin Jha [mailto:navin.jha at FXALL.com]
> > > > Sent: Tuesday, 11 January 2011 8:11 AM
> > > > To: dholmes at ieee.org
> > > > Cc: concurrency-interest at cs.oswego.edu
> > > > Subject: RE: [concurrency-interest] Timer notification drifts
> > > > No luck L
> > > >
> > > > Does having too many cpus effect this in anyway?
> > > >
> > > > The machine on which a sample test code works only has 2 cpus
> > while the
> > > > machine on which we see a huge drift has 8 cpus.
> > > >
> > > > From: David Holmes [mailto:davidcholmes at aapt.net.au]
> > > > Sent: Monday, January 10, 2011 3:46 PM
> > > > To: Navin Jha
> > > > Cc: concurrency-interest at cs.oswego.edu
> > > > Subject: RE: [concurrency-interest] Timer notification drifts
> > > >
> > > > Check what clocksource the problematic system is using. If it
> > > is TSC then
> > > > switch to HPET.
> > > >
> > > > These things are difficult to diagnoze.
> > > >
> > > > David Holmes
> > > > -----Original Message-----
> > > > From: concurrency-interest-bounces at cs.oswego.edu
> > > > [mailto:concurrency-interest-bounces at cs.oswego.edu]On Behalf Of
> > > Navin Jha
> > > > Sent: Tuesday, 11 January 2011 4:30 AM
> > > > To: Attila Szegedi
> > > > Cc: concurrency-interest at cs.oswego.edu
> > > > Subject: Re: [concurrency-interest] Timer notification drifts
> > > > From what I have learned so far is that this is a common
> > problem but the
> > > > drift is too much for us since it effects our trading date
> rollover J
> > > >
> > > > David Holmes has nice blog on clocks
> > > > (http://blogs.sun.com/dholmes/entry/inside_the_hotspot_vm_clocks)
> > > > and in his
> > > > blog he suggested to someone with a similar problem that he
> > > > should post the
> > > > problem here.
> > > >
> > > > From: Attila Szegedi [mailto:szegedia at gmail.com]
> > > > Sent: Monday, January 10, 2011 1:24 PM
> > > > To: Navin Jha
> > > > Cc: concurrency-interest at cs.oswego.edu
> > > > Subject: Re: [concurrency-interest] Timer notification drifts
> > > >
> > > > I see - you weren't specific about which API call you use, so I
> > > wanted to
> > > > root out the rookie mistake :-) Hm. a drift should definitely not
> > > > occur with
> > > > scheduleAtFixedRate, as far as I can tell. At this point, this
> > > > indeed start
> > > > to sound as a topic relevant for this group :-) Although I
> > > can't you help
> > > > past this stage (not much Linux system expertise), I guess
> > > > whoever wants to
> > > > look into this will want the JRE, Linux, and CPU versions of
> > > your system.
> > > >
> > > > On Jan 10, 2011, at 10:09 AM, Navin Jha wrote:
> > > >
> > > > This is exactly what we use. A sample code we tried works fine of
> > > > some linux
> > > > machines with a constant lag value (say 15 milliseconds) but
> > > > fails on other
> > > > linux machines. We are trying to find if there is something
> > about those
> > > > linux machines that causes this. The machines on which this
> > is happening
> > > > ironically have much better hardware (high end multi-core linux
> > > servers).
> > > >
> > > > From: Attila Szegedi [mailto:szegedia at gmail.com]
> > > > Sent: Monday, January 10, 2011 1:03 PM
> > > > To: Navin Jha
> > > > Cc: concurrency-interest at cs.oswego.edu
> > > > Subject: Re: [concurrency-interest] Timer notification drifts
> > > >
> > > > scheduleAtFixedRate should help:
> > > > <http://download.oracle.com/javase/1.4.2/docs/api/java/util/Timer.
> > > html#sched
> > > uleAtFixedRate(java.util.TimerTask,%20java.util.Date,%20long)>
> > >
> > > On Jan 10, 2011, at 9:32 AM, Navin Jha wrote:
> > >
> > > Hi,
> > >
> > > Not sure if this is the right place to post this problem. We use
> > > java.util.Timer class for a notification that needs to
> happens every 24
> > > hours. We noticed that on some linux multi-core servers the
> notification
> > > occurs almost 11 seconds later. If we run for successive
> > smaller durations
> > > say 1 hour, 2 hours, 3 hours. we notice that the lag does
> > > accumulate. So for
> > > 1 hour it is 600 milliseconds, for 2 hours it is 1.2 seconds etc..
> > >
> > > The only solutions we can think of right now is to run the timer
> > > for smaller
> > > duration and restart it after that duration.
> > >
> > > Is there a solution/workaround for this problem?
> > >
> > > Regards,
> > > Navin
> > >
> > >
> > >
> > >
> >
> >
> >
>
>
>



More information about the Concurrency-interest mailing list