[concurrency-interest] Low cpu usage with fair ReentrantLock

Andrew Haley aph at redhat.com
Mon May 10 12:36:28 EDT 2010


On 05/10/2010 05:24 PM, Tim Peierls wrote:
> Sounds like you observed behavior consistent with the ReentrantLock javadoc.
> In particular:
> 
> Programs using fair locks accessed by many threads may display lower overall
> throughput (i.e., are slower; often much slower) than those using the
> default setting, ...

Perhaps so.  It's hard to tell from that javadoc exactly what is
meant: I thought, from reading that, that "slower; often much slower"
referred to slow operation with high CPU loads when locking rather
than, as we have here, CPUs being starved of work.  But in any case,
given the fact that locking is so rare, I found this result a little
bit surprising.

I'm asking here because I want to know (from the horse's mouth, as it
were) whether this is exactly what people would expect of such a lock.
And if it is, perhaps I can add a little to that javadoc.
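
For concreteness, fairness is fixed when the lock is constructed, and the
untimed tryLock() is a documented exception that barges even on a fair
lock.  A minimal sketch (the class name is mine):

```java
import java.util.concurrent.locks.ReentrantLock;

// Sketch: a fair lock hands ownership to the longest-waiting thread,
// while the default non-fair lock permits barging.
public class FairnessDemo {
    public static void main(String[] args) {
        ReentrantLock fair = new ReentrantLock(true);  // FIFO hand-off
        ReentrantLock barge = new ReentrantLock();     // default: non-fair

        System.out.println(fair.isFair());   // true
        System.out.println(barge.isFair());  // false

        // Untimed tryLock() does not honor the fairness setting:
        if (fair.tryLock()) {
            try {
                System.out.println(fair.isHeldByCurrentThread());  // true
            } finally {
                fair.unlock();
            }
        }
    }
}
```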

Andrew.


> On Mon, May 10, 2010 at 10:30 AM, Andrew Haley <aph at redhat.com> wrote:
> 
>> Consider a set of identical threads that occasionally need to acquire
>> a lock.  Contention for this lock is low, because the ratio of
>> unlocked to locked work is high, 10000:1.
>>
>>  public void run()
>>  {
>>    while (!Main.done)
>>      {
>>        mylock.lock();
>>        try {
>>          doLockedWork();
>>        }
>>        finally {
>>          mylock.unlock();
>>        }
>>        doUnlockedWork();
>>        counter++;
>>      }
>>  }
>>
>> I've observed that when a fair ReentrantLock is used, the CPU usage
>> across all cores becomes very low, and much of the time cores are
>> idling.  This seems to be particularly true on a multi-CPU machine,
>> whereas single-CPU machines aren't so bad.
>>
>> What seems to happen is that threads waiting to acquire the lock "pile
>> up" and are descheduled, and the number of threads doing work falls
>> dramatically.
>>
>> I added a logging facility inside park and unpark which tracks when
>> threads are sleeping waiting for pthread_cond_wait(), when they are
>> unparked by calling pthread_cond_signal(), and when they actually
>> return from pthread_cond_wait().  On an 8 core machine with 32
>> threads, the process eventually stabilizes with, on average, only a
>> few threads running at any given time.
>>
>> This is a typical sample of the behaviour of a fair ReentrantLock
>> after the program has been running for a while.  "r" is the number of
>> threads running, "s" the number of threads sleeping, and "a" the
>> number of threads that have been unparked but have not yet returned
>> from pthread_cond_wait():
>>
>> pid 29978: 2   r:5, s:28, a:0
>> pid 29986: 1   r:5, s:27, a:1
>> pid 29986: 2   r:6, s:27, a:0
>> pid 29987: 0   r:5, s:28, a:0
>> pid 29993: 1   r:5, s:27, a:1
>> pid 29995: 0   r:4, s:28, a:1
>> pid 29993: 2   r:5, s:28, a:0
>> pid 29974: 1   r:5, s:27, a:1
>> pid 29981: 0   r:4, s:28, a:1
>> pid 29978: 0   r:3, s:29, a:1
>> pid 29986: 0   r:2, s:30, a:1
>> pid 29993: 0   r:1, s:31, a:1
>> pid 29974: 2   r:2, s:31, a:0
>> pid 29989: 1   r:2, s:30, a:1
>> pid 29974: 0   r:1, s:31, a:1
>> pid 29989: 2   r:2, s:31, a:0
>> pid 29996: 1   r:2, s:30, a:1
>> pid 29996: 2   r:3, s:30, a:0
>> pid 29984: 1   r:3, s:29, a:1
>>
>> As you can see, the number of threads running at any given time is
>> low, and the machine is on average only about 50% loaded.  Most
>> threads are sleeping.
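
(For anyone who wants to watch the pile-up from pure Java rather than my
native logging: ReentrantLock.getQueueLength() returns an estimate of the
number of threads parked waiting to acquire.  A rough sketch, separate
from the test case below; the class name is mine.)

```java
import java.util.concurrent.locks.ReentrantLock;

// Hold a fair lock on the main thread, start a few contenders, and
// sample the wait-queue length once they have all parked behind it.
public class QueueProbe {
    public static void main(String[] args) throws InterruptedException {
        ReentrantLock lock = new ReentrantLock(true);
        lock.lock();                        // main thread holds the lock
        Thread[] waiters = new Thread[3];
        for (int i = 0; i < waiters.length; i++) {
            waiters[i] = new Thread(() -> {
                lock.lock();                // each waiter parks here
                lock.unlock();
            });
            waiters[i].start();
        }
        // Wait until all contenders have enqueued behind the lock.
        while (lock.getQueueLength() < waiters.length)
            Thread.sleep(10);
        System.out.println("queued: " + lock.getQueueLength());  // queued: 3
        lock.unlock();                      // FIFO hand-off begins
        for (Thread t : waiters) t.join();
    }
}
```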
>>
>> With an unfair ReentrantLock, there are no such problems, and the
>> number of running threads is greater:
>>
>> pid 32169: 2   r:6, s:27, a:0
>> pid 32156: 1   r:6, s:26, a:1
>> pid 32156: 2   r:7, s:26, a:0
>> pid 32180: 1   r:7, s:25, a:1
>> pid 32180: 2   r:8, s:25, a:0
>> pid 32164: 1   r:8, s:24, a:1
>> pid 32180: 0   r:7, s:25, a:1
>> pid 32164: 2   r:8, s:25, a:0
>> pid 32159: 1   r:8, s:24, a:1
>> pid 32159: 2   r:9, s:24, a:0
>> pid 32150: 1   r:9, s:23, a:1
>> pid 32169: 0   r:8, s:24, a:1
>> pid 32172: 0   r:7, s:25, a:1
>> pid 32150: 2   r:8, s:25, a:0
>> pid 32155: 1   r:8, s:24, a:1
>>
>> This machine has 8 cores, so this load is sufficient to use almost
>> 100% CPU.
>>
>> Does anyone have any idea why a fair ReentrantLock should result in
>> such low CPU usage?  This pattern of usage has, apparently, been
>> observed in more realistic workloads, from which this test case has
>> been extracted.
>>
>> To run the example, just run "java Main N" where N is the number of
>> threads.
>>
>> I don't think this is the same problem as
>> http://blogs.sun.com/dave/entry/a_scheduling_and_dispatching_oddity
>> because tweaking the kernel policies doesn't help.  Also, the kernel
>> seems from my logs to be waking a thread quickly once it is sent a
>> pthread_cond_signal(): there is never more than one thread awakened
>> but not running.
>>
>> Andrew.
>>
>>
>>
>> import java.util.concurrent.locks.*;
>>
>> class MyThread extends Thread
>> {
>>  static int UNLOCKLOOPS = 10000;
>>  static int LOCKLOOPS = 1;
>>
>>  public long counter;
>>  int sum;
>>
>>  static public ReentrantLock mylock = new ReentrantLock(true);  // shared by all threads
>>
>>  public void runX()
>>  {
>>    while (!Main.done)
>>      {
>>        synchronized(mylock)
>>          {
>>            doLockedWork();
>>          }
>>        doUnlockedWork();
>>        counter++;
>>      }
>>  }
>>  public void run()
>>  {
>>    while (!Main.done)
>>      {
>>        mylock.lock();
>>        try {
>>          doLockedWork();
>>        }
>>        finally {
>>          mylock.unlock();
>>        }
>>        doUnlockedWork();
>>        counter++;
>>      }
>>  }
>>  void doLockedWork() {
>>    doComputeWork(LOCKLOOPS);
>>  }
>>
>>  void doUnlockedWork() {
>>    doComputeWork(UNLOCKLOOPS);
>>  }
>>
>>  void doComputeWork(int loopCount) {
>>    int mysum = sum;
>>    for (int i=0; i<loopCount; i++) {
>>      mysum += compute6(mysum);
>>    }
>>    sum += mysum;   // write back to field
>>  }
>>
>>  /**
>>   * Marsaglia xorshift (1, 3, 10)
>>   */
>>  public static int compute6(int seed) {
>>    seed ^= seed << 1;
>>    seed ^= seed >>> 3;
>>    seed ^= (seed << 10);
>>    return seed;
>>  }
>>
>>
>> }
>>
>> public class Main
>> {
>>  static volatile boolean done = false;
>>  static ReentrantLock mylock = new ReentrantLock();  // shared by all threads
>>
>>  public static Thread[] threads;
>>
>>  public static void main(String[] args)
>>    throws Exception
>>  {
>>    int threadCount = Integer.parseInt(args[0]);
>>
>>    MyThread[] threads = new MyThread[threadCount];
>>
>>    for (int i = 0; i < threads.length; i++)
>>      {
>>        threads[i] = new MyThread();
>>        threads[i].start();
>>      }
>>
>>    Thread.sleep(10000);
>>    done = true;
>>
>>    long total = 0;
>>    for (int i = 0; i < threads.length; i++)
>>      {
>>        total += threads[i].counter;
>> //      System.out.println(threads[i].counter);
>>      }
>>    System.out.println(threadCount + " threads, throughput = " + total);
>>  }
>> }
>> _______________________________________________
>> Concurrency-interest mailing list
>> Concurrency-interest at cs.oswego.edu
>> http://cs.oswego.edu/mailman/listinfo/concurrency-interest
>>
> 


