[concurrency-interest] "parking to wait for" hangs forever

Adrian Tarau adrian.tarau at gmail.com
Fri Jan 14 11:16:43 EST 2011


Hi,

For a long time I wanted to ask this question but now it really burns 
me. This lock happens in JGroups but I know I've seen it before in my 
own code too.

This is how it is used in JGroups:
1. Locks creation
     /** Lock protecting sent credits table and some other vars 
(creditors for example) */
     private final Lock sent_lock=new ReentrantLock();

     /** Lock protecting received credits table */
     private final Lock received_lock=new ReentrantLock();


     /** Mutex to block on down() */
     private final Condition credits_available=sent_lock.newCondition();

2. Locks usage

                             long block_time=max_block_time;
                             if(max_block_times != null) {
                                 Long tmp=end_time.get();
                                 if(tmp != null) {
                                     // Negative value means we don't 
wait at all ! If the end_time already elapsed
                                     // (because we waited for other 
threads to get processed), the message will not
                                     // block at all and get sent 
immediately
                                     block_time=tmp - start_blocking;
                                 }
                             }

                             boolean 
rc=credits_available.await(block_time, TimeUnit.MILLISECONDS); <- here 
it locks



I do not understand why it locks...Other than this particular 
usage(JGroups), I've seen it before in ThreadPoolExecutor(locked in the 
blocking  queue used by the executor). Strange enough I think it happens 
when the JVM Heap runs out of memory or it is closed to run out of 
memory and GC kicks in and runs close to 80-90% CPU usage.

What does the "parking to wait for" mean?

Thanks.


"Channel default Messaging" - Thread t at 29
    java.lang.Thread.State: TIMED_WAITING
     at sun.misc.Unsafe.park(Native Method)
     - parking to wait for <189fc4d> (a 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
     at 
java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:198)
     at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2054)
     at org.jgroups.protocols.FC.handleDownMessage(FC.java:549)
     at org.jgroups.protocols.FC.down(FC.java:423)
     at org.jgroups.protocols.FRAG2.down(FRAG2.java:154)
     at 
org.jgroups.protocols.pbcast.STATE_TRANSFER.down(STATE_TRANSFER.java:215)
     at org.jgroups.stack.ProtocolStack.down(ProtocolStack.java:894)
     at org.jgroups.JChannel.down(JChannel.java:1623)
     at org.jgroups.JChannel.send(JChannel.java:724)
     at 
com.daxtechnologies.services.cluster.ChannelImpl.doSendMessage(ChannelImpl.java:316)
     at 
com.daxtechnologies.services.cluster.ChannelImpl.access$700(ChannelImpl.java:33)
     at 
com.daxtechnologies.services.cluster.ChannelImpl$MessageSenderWorker.run(ChannelImpl.java:678)

    Locked ownable synchronizers:
     - None


On 01/11/2011 06:03 PM, David Holmes wrote:
>> We don't see these files, is there another way to check this?
> As Mark says your kernel may be too old to expose these.
>
>> Also this the log that might tell you more:
>> [11:16:23][root at numpintapp15:~]$ grep -i tsc /var/log/messages
>> Jan 10 16:51:18 numpintapp15 checking TSC synchronization across
>> 12 CPUs: passed.
> I'm very skeptical that an old kernel will be able to synchronize the TSC
> across 12 cores. Linux abandoned use of the TSC as the primary timesource
> for MP systems.
>
> http://lwn.net/Articles/209101/
>
> But maybe things have progressed since then. I know Solaris went to great
> lengths to do the TSC synchronization, and while it seems reasonable today
> there were a few bumps along the road.
>
>> We've tried to specify the clocksource in the
>> /boot/grub/grub.conf file.  We've been trying various
>> clocksources. Would you let us know how to set the appropriate
>> clocksource? Below is the grub.conf file.
> I'm not a linux expert but it appears that clock=hpet as the appropriate
> boot option should work. These log entries indicate tsc is used:
>
> Jan 10 18:44:16 numpintapp15 Bootdata ok (command line is ro root=LABEL=/
> hda=ide-scsi hpet=disable pnpacpi=off clock=tsc)
> Jan 10 18:44:16 numpintapp15 Kernel command line: ro root=LABEL=/
> hda=ide-scsi hpet=disable pnpacpi=off clock=tsc console=tty0
>
> David
> ------
>
>> [11:17:43][root at numpintapp15:~]$ cat /boot/grub/grub.conf
>> # grub.conf generated by anaconda
>> #
>> # Note that you do not have to rerun grub after making changes to
>> this file
>> # NOTICE:  You have a /boot partition.  This means that
>> #          all kernel and initrd paths are relative to /boot/, eg.
>> #          root (hd0,0)
>> #          kernel /vmlinuz-version ro root=/dev/cciss/c0d0p6
>> #          initrd /initrd-version.img
>> #boot=/dev/cciss/c0d0
>> default=1
>> timeout=5
>> splashimage=(hd0,0)/grub/splash.xpm.gz
>> hiddenmenu
>> title Red Hat Enterprise Linux AS (2.6.9-78.0.8.EL)
>>          root (hd0,0)
>>          kernel /vmlinuz-2.6.9-78.0.8.EL ro root=LABEL=/
>> hda=ide-scsi clock=hpet clocksource=hpet
>>          initrd /initrd-2.6.9-78.0.8.EL.img
>> title Red Hat Enterprise Linux AS (2.6.9-78.ELlargesmp)
>>          root (hd0,0)
>>          kernel /vmlinuz-2.6.9-78.ELlargesmp ro root=LABEL=/
>> hda=ide-scsi hpet=disable pnpacpi=off clock=tsc
>>          initrd /initrd-2.6.9-78.ELlargesmp.img
>> title Red Hat Enterprise Linux AS-up (2.6.9-78.EL)
>>          root (hd0,0)
>>          kernel /vmlinuz-2.6.9-78.EL ro root=LABEL=/ hda=ide-scsi
>> clock=hpet clocksource=hpet
>>          initrd /initrd-2.6.9-78.EL.img
>>
>>
>>
>>
>> -----Original Message-----
>> From: David Holmes [mailto:davidcholmes at aapt.net.au]
>> Sent: Monday, January 10, 2011 6:12 PM
>> To: Navin Jha
>> Cc: concurrency-interest at cs.oswego.edu
>> Subject: RE: [concurrency-interest] Timer notification drifts
>>
>> I believe you need root access to check the clocksource (unless default
>> perms are modified):
>>
>> cat /sys/devices/system/clocksource/clocksource0/available_clocksource
>>
>> to see what's available, and
>>
>> cat /sys/devices/system/clocksource/clocksource0/current_clocksource
>>
>> to see what is in use.
>>
>> I was suggesting using nanoTime to measure the elapsed time, rather than
>> date/currentTimeMillis, as nanoTime should not be affected by time-of-day
>> adjustments. Which reminds me: check if ntp is running and if so turn it
>> off, if not turn it on, and see if that makes a difference.
>>
>> David
>>
>> -----Original Message-----
>> From: Navin Jha [mailto:navin.jha at FXALL.com]
>> Sent: Tuesday, 11 January 2011 9:04 AM
>> To: dholmes at ieee.org
>> Cc: concurrency-interest at cs.oswego.edu
>> Subject: RE: [concurrency-interest] Timer notification drifts
>>
>>
>> David,
>>
>> I tried LockSupport.park(..) and ScheduledThreadPoolExecutor with same
>> results and they use nano time. Are you suggesting using nano time in any
>> other way? I will try them again after the switch to HPET to see
>> if there is
>> any difference. I have communicated your concern  about the use of TSC on
>> multi-processor systems to our unix support group.
>>
>> One more question, how do I make sure that the clock source is set up
>> correctly to HPET? Since it was done by the support group  I want to make
>> sure that they did it  correctly?
>>
>> -Navin
>>
>> From: David Holmes [mailto:davidcholmes at aapt.net.au]
>> Sent: Monday, January 10, 2011 5:41 PM
>> To: Navin Jha
>> Cc: concurrency-interest at cs.oswego.edu
>> Subject: RE: [concurrency-interest] Timer notification drifts
>>
>> Try using System.nanoTime to track things.
>>
>> BTW unless you have CPUs with reliable TSC then you should never
>> use TSC as
>> the clocksource on multi-processor systems. There are some OS specific
>> utilities from the chip vendors to fix TSC drift but I don't know what is
>> available for Linux. It's been a little while since I checked on
>> the current
>> state of this.
>>
>> David
>> -----Original Message-----
>> From: Navin Jha [mailto:navin.jha at FXALL.com]
>> Sent: Tuesday, 11 January 2011 8:33 AM
>> To: dholmes at ieee.org
>> Cc: concurrency-interest at cs.oswego.edu
>> Subject: RE: [concurrency-interest] Timer notification drifts
>> David,
>> TSC was used by default (according to unix support folks, since I
>> don't have
>> access to configs on those machine), they switched it to use HPET
>> and I got
>> the same result. Below is the sample code I used to test. On one machine I
>> see a constant drift of about 15 milliseconds regardless of how
>> of long the
>> timer runs (I ran it for 1 hr, 2 hrs and 24hrs). This machine has
>> 2 cpus. On
>> the machine with 8 cpus The drift was close to 11 seconds for 24 hrs.
>>
>> import java.text.SimpleDateFormat;
>> import java.util.Calendar;
>> import java.util.Timer;
>> import java.util.TimerTask;
>>
>> public class TimerTest {
>>
>>           public static void main(String[] args) {
>>
>>                 try {
>>                    long schedule=500;
>>                    if (args.length>  0)
>>                          schedule=Long.parseLong(args[0]);
>>                    System.out.println( " scheduling job\n");
>>                    Timer eodTimer = new Timer(true);
>>                    SimpleDateFormat simpleDateFormat = new
>> SimpleDateFormat("HH:mm:ss:S");
>>                    System.out.println(" Current time = " +
>> simpleDateFormat.format(Calendar.getInstance().getTime()));
>>                    eodTimer.scheduleAtFixedRate(new
>> TestTask(System.currentTimeMillis(),schedule, simpleDateFormat), schedule,
>> schedule);
>>                    Thread.sleep(Long.MAX_VALUE);
>>              } catch (Exception e) {
>>                System.out.println(e);
>>              }
>>
>>        }
>>
>>           private static class TestTask extends TimerTask {
>>
>>            long startTime;
>>            long sch;
>>            SimpleDateFormat simpleDateFormat;
>>            TestTask (long startTime,long schedule, SimpleDateFormat
>> simpleDateFormat)
>>            {
>>              this.startTime=startTime;
>>              sch=schedule;
>>              this.simpleDateFormat = simpleDateFormat;
>>            }
>>              public void run() {
>>                    long now = System.currentTimeMillis();
>>                    System.out.println(" End time     = " +
>> simpleDateFormat.format(Calendar.getInstance().getTime()) + "\n");
>>                    System.out.println(" Actual diff  = "+(now-startTime)+ "
>> milliseconds, expected diff = " + sch + " milliseconds");
>>                    System.exit(1);
>>
>>              }
>>           }
>> }
>>
>> From: David Holmes [mailto:davidcholmes at aapt.net.au]
>> Sent: Monday, January 10, 2011 5:21 PM
>> To: Navin Jha
>> Cc: concurrency-interest at cs.oswego.edu
>> Subject: RE: [concurrency-interest] Timer notification drifts
>>
>> So I take it that HPET is already being used.
>>
>> The other possibility here is that it is not the timer that is
>> drifting but
>> the time source that you are using to measure/track when the timer fires.
>> How are you tracking that?
>>
>> David
>> -----Original Message-----
>> From: Navin Jha [mailto:navin.jha at FXALL.com]
>> Sent: Tuesday, 11 January 2011 8:11 AM
>> To: dholmes at ieee.org
>> Cc: concurrency-interest at cs.oswego.edu
>> Subject: RE: [concurrency-interest] Timer notification drifts
>> No luck L
>>
>> Does having too many cpus effect this in anyway?
>>
>> The machine on which a sample test code works only has 2 cpus while the
>> machine on which we see a huge drift has 8 cpus.
>>
>> From: David Holmes [mailto:davidcholmes at aapt.net.au]
>> Sent: Monday, January 10, 2011 3:46 PM
>> To: Navin Jha
>> Cc: concurrency-interest at cs.oswego.edu
>> Subject: RE: [concurrency-interest] Timer notification drifts
>>
>> Check what clocksource the problematic system is using. If it is TSC then
>> switch to HPET.
>>
>> These things are difficult to diagnoze.
>>
>> David Holmes
>> -----Original Message-----
>> From: concurrency-interest-bounces at cs.oswego.edu
>> [mailto:concurrency-interest-bounces at cs.oswego.edu]On Behalf Of Navin Jha
>> Sent: Tuesday, 11 January 2011 4:30 AM
>> To: Attila Szegedi
>> Cc: concurrency-interest at cs.oswego.edu
>> Subject: Re: [concurrency-interest] Timer notification drifts
>>  From what I have learned so far is that this is a common problem but the
>> drift is too much for us since it effects our trading date rollover J
>>
>> David Holmes has nice blog on clocks
>> (http://blogs.sun.com/dholmes/entry/inside_the_hotspot_vm_clocks)
>> and in his
>> blog he suggested to someone with a similar problem that he
>> should post the
>> problem here.
>>
>> From: Attila Szegedi [mailto:szegedia at gmail.com]
>> Sent: Monday, January 10, 2011 1:24 PM
>> To: Navin Jha
>> Cc: concurrency-interest at cs.oswego.edu
>> Subject: Re: [concurrency-interest] Timer notification drifts
>>
>> I see - you weren't specific about which API call you use, so I wanted to
>> root out the rookie mistake :-) Hm. a drift should definitely not
>> occur with
>> scheduleAtFixedRate, as far as I can tell. At this point, this
>> indeed start
>> to sound as a topic relevant for this group :-) Although I can't you help
>> past this stage (not much Linux system expertise), I guess
>> whoever wants to
>> look into this will want the JRE, Linux, and CPU versions of your system.
>>
>> On Jan 10, 2011, at 10:09 AM, Navin Jha wrote:
>>
>> This is exactly what we use. A sample code we tried works fine of
>> some linux
>> machines with a constant lag value (say 15 milliseconds) but
>> fails on other
>> linux machines. We are trying to find if there is something about those
>> linux machines that causes this. The machines on which this is happening
>> ironically have much better hardware (high end multi-core linux servers).
>>
>> From: Attila Szegedi [mailto:szegedia at gmail.com]
>> Sent: Monday, January 10, 2011 1:03 PM
>> To: Navin Jha
>> Cc: concurrency-interest at cs.oswego.edu
>> Subject: Re: [concurrency-interest] Timer notification drifts
>>
>> scheduleAtFixedRate should help:
>> <http://download.oracle.com/javase/1.4.2/docs/api/java/util/Timer.
> html#sched
> uleAtFixedRate(java.util.TimerTask,%20java.util.Date,%20long)>
>
> On Jan 10, 2011, at 9:32 AM, Navin Jha wrote:
>
> Hi,
>
> Not sure if this is the right place to post this problem. We use
> java.util.Timer class for a notification that needs to happens every 24
> hours. We noticed that on some linux multi-core servers the notification
> occurs almost 11 seconds later. If we run for successive smaller durations
> say 1 hour, 2 hours, 3 hours. we notice that the lag does accumulate. So for
> 1 hour it is 600 milliseconds, for 2 hours it is 1.2 seconds etc..
>
> The only solutions we can think of right now is to run the timer for smaller
> duration and restart it after that duration.
>
> Is there a solution/workaround for this problem?
>
> Regards,
> Navin
>
>
>
> _______________________________________________
> Concurrency-interest mailing list
> Concurrency-interest at cs.oswego.edu
> http://cs.oswego.edu/mailman/listinfo/concurrency-interest



More information about the Concurrency-interest mailing list