[concurrency-interest] "parking to wait for" hangs forever
Adrian Tarau
adrian.tarau at gmail.com
Fri Jan 14 11:16:43 EST 2011
Hi,
For a long time I wanted to ask this question but now it really burns
me. This lock happens in JGroups but I know I've seen it before in my
own code too.
This is how it is used in JGroups:
1. Locks creation
/** Lock protecting sent credits table and some other vars
(creditors for example) */
private final Lock sent_lock=new ReentrantLock();
/** Lock protecting received credits table */
private final Lock received_lock=new ReentrantLock();
/** Mutex to block on down() */
private final Condition credits_available=sent_lock.newCondition();
2. Locks usage
long block_time=max_block_time;
if(max_block_times != null) {
Long tmp=end_time.get();
if(tmp != null) {
// Negative value means we don't
wait at all ! If the end_time already elapsed
// (because we waited for other
threads to get processed), the message will not
// block at all and get sent
immediately
block_time=tmp - start_blocking;
}
}
boolean
rc=credits_available.await(block_time, TimeUnit.MILLISECONDS); <- here
it locks
I do not understand why it locks...Other than this particular
usage(JGroups), I've seen it before in ThreadPoolExecutor(locked in the
blocking queue used by the executor). Strange enough I think it happens
when the JVM Heap runs out of memory or it is closed to run out of
memory and GC kicks in and runs close to 80-90% CPU usage.
What does the "parking to wait for" mean?
Thanks.
"Channel default Messaging" - Thread t at 29
java.lang.Thread.State: TIMED_WAITING
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <189fc4d> (a
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at
java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:198)
at
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2054)
at org.jgroups.protocols.FC.handleDownMessage(FC.java:549)
at org.jgroups.protocols.FC.down(FC.java:423)
at org.jgroups.protocols.FRAG2.down(FRAG2.java:154)
at
org.jgroups.protocols.pbcast.STATE_TRANSFER.down(STATE_TRANSFER.java:215)
at org.jgroups.stack.ProtocolStack.down(ProtocolStack.java:894)
at org.jgroups.JChannel.down(JChannel.java:1623)
at org.jgroups.JChannel.send(JChannel.java:724)
at
com.daxtechnologies.services.cluster.ChannelImpl.doSendMessage(ChannelImpl.java:316)
at
com.daxtechnologies.services.cluster.ChannelImpl.access$700(ChannelImpl.java:33)
at
com.daxtechnologies.services.cluster.ChannelImpl$MessageSenderWorker.run(ChannelImpl.java:678)
Locked ownable synchronizers:
- None
On 01/11/2011 06:03 PM, David Holmes wrote:
>> We don't see these files, is there another way to check this?
> As Mark says your kernel may be too old to expose these.
>
>> Also this the log that might tell you more:
>> [11:16:23][root at numpintapp15:~]$ grep -i tsc /var/log/messages
>> Jan 10 16:51:18 numpintapp15 checking TSC synchronization across
>> 12 CPUs: passed.
> I'm very skeptical that an old kernel will be able to synchronize the TSC
> across 12 cores. Linux abandoned use of the TSC as the primary timesource
> for MP systems.
>
> http://lwn.net/Articles/209101/
>
> But maybe things have progressed since then. I know Solaris went to great
> lengths to do the TSC synchronization, and while it seems reasonable today
> there were a few bumps along the road.
>
>> We've tried to specify the clocksource in the
>> /boot/grub/grub.conf file. We've been trying various
>> clocksources. Would you let us know how to set the appropriate
>> clocksource? Below is the grub.conf file.
> I'm not a linux expert but it appears that clock=hpet as the appropriate
> boot option should work. These log entries indicate tsc is used:
>
> Jan 10 18:44:16 numpintapp15 Bootdata ok (command line is ro root=LABEL=/
> hda=ide-scsi hpet=disable pnpacpi=off clock=tsc)
> Jan 10 18:44:16 numpintapp15 Kernel command line: ro root=LABEL=/
> hda=ide-scsi hpet=disable pnpacpi=off clock=tsc console=tty0
>
> David
> ------
>
>> [11:17:43][root at numpintapp15:~]$ cat /boot/grub/grub.conf
>> # grub.conf generated by anaconda
>> #
>> # Note that you do not have to rerun grub after making changes to
>> this file
>> # NOTICE: You have a /boot partition. This means that
>> # all kernel and initrd paths are relative to /boot/, eg.
>> # root (hd0,0)
>> # kernel /vmlinuz-version ro root=/dev/cciss/c0d0p6
>> # initrd /initrd-version.img
>> #boot=/dev/cciss/c0d0
>> default=1
>> timeout=5
>> splashimage=(hd0,0)/grub/splash.xpm.gz
>> hiddenmenu
>> title Red Hat Enterprise Linux AS (2.6.9-78.0.8.EL)
>> root (hd0,0)
>> kernel /vmlinuz-2.6.9-78.0.8.EL ro root=LABEL=/
>> hda=ide-scsi clock=hpet clocksource=hpet
>> initrd /initrd-2.6.9-78.0.8.EL.img
>> title Red Hat Enterprise Linux AS (2.6.9-78.ELlargesmp)
>> root (hd0,0)
>> kernel /vmlinuz-2.6.9-78.ELlargesmp ro root=LABEL=/
>> hda=ide-scsi hpet=disable pnpacpi=off clock=tsc
>> initrd /initrd-2.6.9-78.ELlargesmp.img
>> title Red Hat Enterprise Linux AS-up (2.6.9-78.EL)
>> root (hd0,0)
>> kernel /vmlinuz-2.6.9-78.EL ro root=LABEL=/ hda=ide-scsi
>> clock=hpet clocksource=hpet
>> initrd /initrd-2.6.9-78.EL.img
>>
>>
>>
>>
>> -----Original Message-----
>> From: David Holmes [mailto:davidcholmes at aapt.net.au]
>> Sent: Monday, January 10, 2011 6:12 PM
>> To: Navin Jha
>> Cc: concurrency-interest at cs.oswego.edu
>> Subject: RE: [concurrency-interest] Timer notification drifts
>>
>> I believe you need root access to check the clocksource (unless default
>> perms are modified):
>>
>> cat /sys/devices/system/clocksource/clocksource0/available_clocksource
>>
>> to see what's available, and
>>
>> cat /sys/devices/system/clocksource/clocksource0/current_clocksource
>>
>> to see what is in use.
>>
>> I was suggesting using nanoTime to measure the elapsed time, rather than
>> date/currentTimeMillis, as nanoTime should not be affected by time-of-day
>> adjustments. Which reminds me: check if ntp is running and if so turn it
>> off, if not turn it on, and see if that makes a difference.
>>
>> David
>>
>> -----Original Message-----
>> From: Navin Jha [mailto:navin.jha at FXALL.com]
>> Sent: Tuesday, 11 January 2011 9:04 AM
>> To: dholmes at ieee.org
>> Cc: concurrency-interest at cs.oswego.edu
>> Subject: RE: [concurrency-interest] Timer notification drifts
>>
>>
>> David,
>>
>> I tried LockSupport.park(..) and ScheduledThreadPoolExecutor with same
>> results and they use nano time. Are you suggesting using nano time in any
>> other way? I will try them again after the switch to HPET to see
>> if there is
>> any difference. I have communicated your concern about the use of TSC on
>> multi-processor systems to our unix support group.
>>
>> One more question, how do I make sure that the clock source is set up
>> correctly to HPET? Since it was done by the support group I want to make
>> sure that they did it correctly?
>>
>> -Navin
>>
>> From: David Holmes [mailto:davidcholmes at aapt.net.au]
>> Sent: Monday, January 10, 2011 5:41 PM
>> To: Navin Jha
>> Cc: concurrency-interest at cs.oswego.edu
>> Subject: RE: [concurrency-interest] Timer notification drifts
>>
>> Try using System.nanoTime to track things.
>>
>> BTW unless you have CPUs with reliable TSC then you should never
>> use TSC as
>> the clocksource on multi-processor systems. There are some OS specific
>> utilities from the chip vendors to fix TSC drift but I don't know what is
>> available for Linux. It's been a little while since I checked on
>> the current
>> state of this.
>>
>> David
>> -----Original Message-----
>> From: Navin Jha [mailto:navin.jha at FXALL.com]
>> Sent: Tuesday, 11 January 2011 8:33 AM
>> To: dholmes at ieee.org
>> Cc: concurrency-interest at cs.oswego.edu
>> Subject: RE: [concurrency-interest] Timer notification drifts
>> David,
>> TSC was used by default (according to unix support folks, since I
>> don't have
>> access to configs on those machine), they switched it to use HPET
>> and I got
>> the same result. Below is the sample code I used to test. On one machine I
>> see a constant drift of about 15 milliseconds regardless of how
>> of long the
>> timer runs (I ran it for 1 hr, 2 hrs and 24hrs). This machine has
>> 2 cpus. On
>> the machine with 8 cpus The drift was close to 11 seconds for 24 hrs.
>>
>> import java.text.SimpleDateFormat;
>> import java.util.Calendar;
>> import java.util.Timer;
>> import java.util.TimerTask;
>>
>> public class TimerTest {
>>
>> public static void main(String[] args) {
>>
>> try {
>> long schedule=500;
>> if (args.length> 0)
>> schedule=Long.parseLong(args[0]);
>> System.out.println( " scheduling job\n");
>> Timer eodTimer = new Timer(true);
>> SimpleDateFormat simpleDateFormat = new
>> SimpleDateFormat("HH:mm:ss:S");
>> System.out.println(" Current time = " +
>> simpleDateFormat.format(Calendar.getInstance().getTime()));
>> eodTimer.scheduleAtFixedRate(new
>> TestTask(System.currentTimeMillis(),schedule, simpleDateFormat), schedule,
>> schedule);
>> Thread.sleep(Long.MAX_VALUE);
>> } catch (Exception e) {
>> System.out.println(e);
>> }
>>
>> }
>>
>> private static class TestTask extends TimerTask {
>>
>> long startTime;
>> long sch;
>> SimpleDateFormat simpleDateFormat;
>> TestTask (long startTime,long schedule, SimpleDateFormat
>> simpleDateFormat)
>> {
>> this.startTime=startTime;
>> sch=schedule;
>> this.simpleDateFormat = simpleDateFormat;
>> }
>> public void run() {
>> long now = System.currentTimeMillis();
>> System.out.println(" End time = " +
>> simpleDateFormat.format(Calendar.getInstance().getTime()) + "\n");
>> System.out.println(" Actual diff = "+(now-startTime)+ "
>> milliseconds, expected diff = " + sch + " milliseconds");
>> System.exit(1);
>>
>> }
>> }
>> }
>>
>> From: David Holmes [mailto:davidcholmes at aapt.net.au]
>> Sent: Monday, January 10, 2011 5:21 PM
>> To: Navin Jha
>> Cc: concurrency-interest at cs.oswego.edu
>> Subject: RE: [concurrency-interest] Timer notification drifts
>>
>> So I take it that HPET is already being used.
>>
>> The other possibility here is that it is not the timer that is
>> drifting but
>> the time source that you are using to measure/track when the timer fires.
>> How are you tracking that?
>>
>> David
>> -----Original Message-----
>> From: Navin Jha [mailto:navin.jha at FXALL.com]
>> Sent: Tuesday, 11 January 2011 8:11 AM
>> To: dholmes at ieee.org
>> Cc: concurrency-interest at cs.oswego.edu
>> Subject: RE: [concurrency-interest] Timer notification drifts
>> No luck L
>>
>> Does having too many cpus effect this in anyway?
>>
>> The machine on which a sample test code works only has 2 cpus while the
>> machine on which we see a huge drift has 8 cpus.
>>
>> From: David Holmes [mailto:davidcholmes at aapt.net.au]
>> Sent: Monday, January 10, 2011 3:46 PM
>> To: Navin Jha
>> Cc: concurrency-interest at cs.oswego.edu
>> Subject: RE: [concurrency-interest] Timer notification drifts
>>
>> Check what clocksource the problematic system is using. If it is TSC then
>> switch to HPET.
>>
>> These things are difficult to diagnoze.
>>
>> David Holmes
>> -----Original Message-----
>> From: concurrency-interest-bounces at cs.oswego.edu
>> [mailto:concurrency-interest-bounces at cs.oswego.edu]On Behalf Of Navin Jha
>> Sent: Tuesday, 11 January 2011 4:30 AM
>> To: Attila Szegedi
>> Cc: concurrency-interest at cs.oswego.edu
>> Subject: Re: [concurrency-interest] Timer notification drifts
>> From what I have learned so far is that this is a common problem but the
>> drift is too much for us since it effects our trading date rollover J
>>
>> David Holmes has nice blog on clocks
>> (http://blogs.sun.com/dholmes/entry/inside_the_hotspot_vm_clocks)
>> and in his
>> blog he suggested to someone with a similar problem that he
>> should post the
>> problem here.
>>
>> From: Attila Szegedi [mailto:szegedia at gmail.com]
>> Sent: Monday, January 10, 2011 1:24 PM
>> To: Navin Jha
>> Cc: concurrency-interest at cs.oswego.edu
>> Subject: Re: [concurrency-interest] Timer notification drifts
>>
>> I see - you weren't specific about which API call you use, so I wanted to
>> root out the rookie mistake :-) Hm. a drift should definitely not
>> occur with
>> scheduleAtFixedRate, as far as I can tell. At this point, this
>> indeed start
>> to sound as a topic relevant for this group :-) Although I can't you help
>> past this stage (not much Linux system expertise), I guess
>> whoever wants to
>> look into this will want the JRE, Linux, and CPU versions of your system.
>>
>> On Jan 10, 2011, at 10:09 AM, Navin Jha wrote:
>>
>> This is exactly what we use. A sample code we tried works fine of
>> some linux
>> machines with a constant lag value (say 15 milliseconds) but
>> fails on other
>> linux machines. We are trying to find if there is something about those
>> linux machines that causes this. The machines on which this is happening
>> ironically have much better hardware (high end multi-core linux servers).
>>
>> From: Attila Szegedi [mailto:szegedia at gmail.com]
>> Sent: Monday, January 10, 2011 1:03 PM
>> To: Navin Jha
>> Cc: concurrency-interest at cs.oswego.edu
>> Subject: Re: [concurrency-interest] Timer notification drifts
>>
>> scheduleAtFixedRate should help:
>> <http://download.oracle.com/javase/1.4.2/docs/api/java/util/Timer.
> html#sched
> uleAtFixedRate(java.util.TimerTask,%20java.util.Date,%20long)>
>
> On Jan 10, 2011, at 9:32 AM, Navin Jha wrote:
>
> Hi,
>
> Not sure if this is the right place to post this problem. We use
> java.util.Timer class for a notification that needs to happens every 24
> hours. We noticed that on some linux multi-core servers the notification
> occurs almost 11 seconds later. If we run for successive smaller durations
> say 1 hour, 2 hours, 3 hours. we notice that the lag does accumulate. So for
> 1 hour it is 600 milliseconds, for 2 hours it is 1.2 seconds etc..
>
> The only solutions we can think of right now is to run the timer for smaller
> duration and restart it after that duration.
>
> Is there a solution/workaround for this problem?
>
> Regards,
> Navin
>
>
>
> _______________________________________________
> Concurrency-interest mailing list
> Concurrency-interest at cs.oswego.edu
> http://cs.oswego.edu/mailman/listinfo/concurrency-interest
More information about the Concurrency-interest
mailing list