[concurrency-interest] Low cpu usage with fair ReentrantLock

Andrew Haley aph at redhat.com
Mon May 10 10:30:01 EDT 2010


Consider a set of identical threads that occasionally need to acquire
a lock.  Contention for this lock is low, because the ratio of
unlocked to locked work is high, 10000:1.

  public void run()
  {
    while (!Main.done)
      {
        mylock.lock();
        try {
          doLockedWork();
        }
        finally {
          mylock.unlock();
        }
        doUnlockedWork();
        counter++;
      }
  }

I've observed that when a fair ReentrantLock is used, CPU usage
across all cores becomes very low, and much of the time the cores are
idle.  This is particularly pronounced on a multi-CPU machine;
single-CPU machines are far less affected.

What seems to happen is that threads waiting to acquire the lock "pile
up" and are descheduled, and the number of threads doing work falls
dramatically.
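
For reference, the fair and unfair cases differ only in the fairness
flag passed to the ReentrantLock constructor (MyThread.mylock in the
attached code); roughly:

  // Fair: waiting threads acquire the lock in FIFO order, so every
  // contended acquire is a hand-off to the longest-waiting (parked) thread.
  ReentrantLock fair = new ReentrantLock(true);

  // Non-fair (the default): an arriving thread may barge in ahead of
  // parked waiters and take the lock without a hand-off.
  ReentrantLock unfair = new ReentrantLock(false);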

I added a logging facility inside park and unpark which tracks when
threads go to sleep in pthread_cond_wait(), when they are unparked by
pthread_cond_signal(), and when they actually return from
pthread_cond_wait().  On an 8-core machine with 32 threads, the
process eventually stabilizes with, on average, only a few threads
running at any given time.
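
The logging itself was added inside park/unpark; purely as an
illustration, a much cruder Java-level approximation would be to
bracket the lock acquisition in the worker loop with a shared counter,
e.g. dropped into the attached MyThread class:

  // Illustration only, not the native instrumentation described above:
  // counts how many worker threads are currently blocked (or about to
  // block) in lock(), as opposed to doing locked or unlocked work.
  static final java.util.concurrent.atomic.AtomicInteger waiting =
      new java.util.concurrent.atomic.AtomicInteger();

  public void run()
  {
    while (!Main.done)
      {
        waiting.incrementAndGet();
        mylock.lock();              // may park in pthread_cond_wait()
        waiting.decrementAndGet();
        try {
          doLockedWork();
        }
        finally {
          mylock.unlock();
        }
        doUnlockedWork();
        counter++;
      }
  }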

This is a typical sample of the behaviour of a fair ReentrantLock
after the program has been running for a while.  "r" is the number of
threads running, "s" the number of threads sleeping, and "a" the
number of threads that have been unparked but have not yet returned
from pthread_cond_wait():

pid 29978: 2   r:5, s:28, a:0
pid 29986: 1   r:5, s:27, a:1
pid 29986: 2   r:6, s:27, a:0
pid 29987: 0   r:5, s:28, a:0
pid 29993: 1   r:5, s:27, a:1
pid 29995: 0   r:4, s:28, a:1
pid 29993: 2   r:5, s:28, a:0
pid 29974: 1   r:5, s:27, a:1
pid 29981: 0   r:4, s:28, a:1
pid 29978: 0   r:3, s:29, a:1
pid 29986: 0   r:2, s:30, a:1
pid 29993: 0   r:1, s:31, a:1
pid 29974: 2   r:2, s:31, a:0
pid 29989: 1   r:2, s:30, a:1
pid 29974: 0   r:1, s:31, a:1
pid 29989: 2   r:2, s:31, a:0
pid 29996: 1   r:2, s:30, a:1
pid 29996: 2   r:3, s:30, a:0
pid 29984: 1   r:3, s:29, a:1

As you can see, the number of threads running at any given time is
low, and the machine is on average only about 50% loaded.  Most
threads are sleeping.

With an unfair ReentrantLock, there are no such problems, and the
number of running threads is greater:

pid 32169: 2   r:6, s:27, a:0
pid 32156: 1   r:6, s:26, a:1
pid 32156: 2   r:7, s:26, a:0
pid 32180: 1   r:7, s:25, a:1
pid 32180: 2   r:8, s:25, a:0
pid 32164: 1   r:8, s:24, a:1
pid 32180: 0   r:7, s:25, a:1
pid 32164: 2   r:8, s:25, a:0
pid 32159: 1   r:8, s:24, a:1
pid 32159: 2   r:9, s:24, a:0
pid 32150: 1   r:9, s:23, a:1
pid 32169: 0   r:8, s:24, a:1
pid 32172: 0   r:7, s:25, a:1
pid 32150: 2   r:8, s:25, a:0
pid 32155: 1   r:8, s:24, a:1

This machine has 8 cores, so this load is sufficient to use almost
100% CPU.

Does anyone have any idea why a fair ReentrantLock should result in
such low CPU usage?  This pattern of CPU usage has apparently been
observed in more realistic workloads, from which this test case has
been extracted.

To run the example, just run "java Main N" where N is the number of
threads.

I don't think this is the same problem as
http://blogs.sun.com/dave/entry/a_scheduling_and_dispatching_oddity
because tweaking the kernel scheduling policies doesn't help.  Also,
from my logs the kernel seems to wake a thread quickly once it is
sent a pthread_cond_signal(): there is never more than one thread
that has been awakened but is not yet running.

Andrew.



import java.util.concurrent.locks.*;

class MyThread extends Thread
{
  static final int UNLOCKLOOPS = 10000;
  static final int LOCKLOOPS = 1;

  public long counter;
  int sum;

  public static final ReentrantLock mylock = new ReentrantLock(true);  // fair lock, shared by all threads

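  // Alternative worker loop that uses the lock object's built-in monitor
  // (synchronized) instead of ReentrantLock; not invoked by this test,
  // since only run() below is called.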
  public void runX()
  {
    while (!Main.done)
      {
        synchronized (mylock)
          {
            doLockedWork();
          }
        doUnlockedWork();
        counter++;
      }
  }
  public void run()
  {
    while (!Main.done)
      {
        mylock.lock();
        try {
          doLockedWork();
        }
        finally {
          mylock.unlock();
        }
        doUnlockedWork();
        counter++;
      }
  }
  void doLockedWork() {
    doComputeWork(LOCKLOOPS); 
  }

  void doUnlockedWork() {
    doComputeWork(UNLOCKLOOPS);
  }

  void doComputeWork(int loopCount) {
    int mysum = sum;
    for (int i=0; i<loopCount; i++) {
      mysum += compute6(mysum);
    }
    sum += mysum;   // write back to field
  }

  /**
   * Marsaglia xorshift (1, 3, 10)
   */
  public static int compute6(int seed) {
    seed ^= seed << 1;
    seed ^= seed >>> 3;
    seed ^= (seed << 10);
    return seed;
  }


}

public class Main
{
  static volatile boolean done = false;
  static ReentrantLock mylock = new ReentrantLock();  // unused; the lock under test is MyThread.mylock

  public static Thread[] threads;  // unused; main() uses its own local array

  public static void main(String[] args)
    throws Exception
  {
    int threadCount = Integer.parseInt(args[0]);

    MyThread[] threads = new MyThread[threadCount];

    for (int i = 0; i < threads.length; i++)
      {
        threads[i] = new MyThread();
        threads[i].start();
      }

    Thread.sleep(10000);
    done = true;

    long total = 0;
    for (int i = 0; i < threads.length; i++)
      {
        threads[i].join();      // wait for the worker to exit before reading its counter
        total += threads[i].counter;
//      System.out.println(threads[i].counter);
      }
    System.out.println(threadCount + " threads, throughput = " + total);
  }
}

