[concurrency-interest] Low cpu usage with fair ReentrantLock

David Holmes davidcholmes at aapt.net.au
Mon May 10 18:42:54 EDT 2010


Andrew Haley writes on Tuesday, 11 May 2010 2:36 AM:
> Perhaps so.  It's hard to tell from that javadoc exactly what is
> meant: I thought, from reading that, that "slower; often much slower"
> referred to slow operation with high CPU loads when locking rather
> than, as we have here, CPUs being starved of work.  But in any case,
> given the fact that locking is so rare, I found this result a little
> bit surprising.

Just one further comment. Locking may be "rare" from the perspective of
the locked-work:unlocked-work ratio, but it would seem that your threads
are executing in-phase: they all tend to arrive at lock() at about the
same time, so contention on the lock is actually high.
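
Something you could try (an untested sketch on my part, and the +/-10%
jitter bound is arbitrary): give each thread a randomized amount of
unlocked work so that the arrivals at lock() spread out, and see whether
the idle time drops:

  // hypothetical replacement for MyThread.run()
  public void run()
  {
    java.util.Random rnd = new java.util.Random(hashCode());
    while (!Main.done)
      {
        mylock.lock();
        try {
          doLockedWork();
        }
        finally {
          mylock.unlock();
        }
        // vary the unlocked work by up to +/-10% so the threads
        // drift out of phase instead of queueing up together
        int jitter = rnd.nextInt(UNLOCKLOOPS / 5 + 1) - UNLOCKLOOPS / 10;
        doComputeWork(UNLOCKLOOPS + jitter);
        counter++;
      }
  }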

David Holmes

> I'm asking here because I want to know (from the horse's mouth, as it
> were) whether this is exactly what people would expect of such a lock.
> And if it is, perhaps I can add a little to that javadoc.
>
> Andrew.
>
>
> > On Mon, May 10, 2010 at 10:30 AM, Andrew Haley <aph at redhat.com> wrote:
> >
> >> Consider a set of identical threads that occasionally need to acquire
> >> a lock.  Contention for this lock is low, because the ratio of
> >> unlocked to locked work is high, 10000:1.
> >>
> >>  public void run()
> >>  {
> >>    while (!Main.done)
> >>      {
> >>        mylock.lock();
> >>        try {
> >>          doLockedWork();
> >>        }
> >>        finally {
> >>          mylock.unlock();
> >>        }
> >>        doUnlockedWork();
> >>        counter++;
> >>      }
> >>  }
> >>
> >> I've observed that when a fair ReentrantLock is used, the CPU usage
> >> across all cores becomes very low, and much of the time cores are
> >> idling.  This seems to be particularly true on a multi-CPU machine,
> >> whereas single-CPU machines aren't so bad.
> >>
> >> What seems to happen is that threads waiting to acquire the lock "pile
> >> up" and are descheduled, and the number of threads doing work falls
> >> dramatically.
> >>
> >> I added a logging facility inside park and unpark which tracks when
> >> threads go to sleep in pthread_cond_wait(), when they are unparked
> >> by a call to pthread_cond_signal(), and when they actually return
> >> from pthread_cond_wait().  On an 8-core machine with 32
> >> threads, the process eventually stabilizes with, on average, only a
> >> few threads running at any given time.
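> >>
> >> (For anyone who wants to watch the pile-up without patching the VM,
> >> a rough user-level approximation: ReentrantLock.getQueueLength()
> >> returns an estimate of how many threads are parked waiting for the
> >> lock, so a sampling thread like this one, added to Main.main(),
> >> shows the same effect, though with less detail than the
> >> park/unpark logging:)
> >>
> >>    Thread monitor = new Thread() {
> >>        public void run() {
> >>          while (!Main.done) {
> >>            // getQueueLength() is a best-effort estimate, good
> >>            // enough for watching the waiters accumulate
> >>            System.out.println("queued: "
> >>                               + MyThread.mylock.getQueueLength());
> >>            try { Thread.sleep(100); }
> >>            catch (InterruptedException e) { return; }
> >>          }
> >>        }
> >>      };
> >>    monitor.setDaemon(true);
> >>    monitor.start();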
> >>
> >> This is a typical sample of the behaviour of a fair ReentrantLock
> >> after the program has been running for a while.  "r" is the number of
> >> threads running, "s" the number of threads sleeping, and "a" the
> >> number of threads that have been unparked but have not yet returned
> >> from pthread_cond_wait():
> >>
> >> pid 29978: 2   r:5, s:28, a:0
> >> pid 29986: 1   r:5, s:27, a:1
> >> pid 29986: 2   r:6, s:27, a:0
> >> pid 29987: 0   r:5, s:28, a:0
> >> pid 29993: 1   r:5, s:27, a:1
> >> pid 29995: 0   r:4, s:28, a:1
> >> pid 29993: 2   r:5, s:28, a:0
> >> pid 29974: 1   r:5, s:27, a:1
> >> pid 29981: 0   r:4, s:28, a:1
> >> pid 29978: 0   r:3, s:29, a:1
> >> pid 29986: 0   r:2, s:30, a:1
> >> pid 29993: 0   r:1, s:31, a:1
> >> pid 29974: 2   r:2, s:31, a:0
> >> pid 29989: 1   r:2, s:30, a:1
> >> pid 29974: 0   r:1, s:31, a:1
> >> pid 29989: 2   r:2, s:31, a:0
> >> pid 29996: 1   r:2, s:30, a:1
> >> pid 29996: 2   r:3, s:30, a:0
> >> pid 29984: 1   r:3, s:29, a:1
> >>
> >> As you can see, the number of threads running at any given time is
> >> low, and the machine is on average only about 50% loaded.  Most
> >> threads are sleeping.
> >>
> >> With an unfair ReentrantLock (the default, i.e. new ReentrantLock()
> >> or new ReentrantLock(false)), there are no such problems, and the
> >> number of running threads is greater:
> >>
> >> pid 32169: 2   r:6, s:27, a:0
> >> pid 32156: 1   r:6, s:26, a:1
> >> pid 32156: 2   r:7, s:26, a:0
> >> pid 32180: 1   r:7, s:25, a:1
> >> pid 32180: 2   r:8, s:25, a:0
> >> pid 32164: 1   r:8, s:24, a:1
> >> pid 32180: 0   r:7, s:25, a:1
> >> pid 32164: 2   r:8, s:25, a:0
> >> pid 32159: 1   r:8, s:24, a:1
> >> pid 32159: 2   r:9, s:24, a:0
> >> pid 32150: 1   r:9, s:23, a:1
> >> pid 32169: 0   r:8, s:24, a:1
> >> pid 32172: 0   r:7, s:25, a:1
> >> pid 32150: 2   r:8, s:25, a:0
> >> pid 32155: 1   r:8, s:24, a:1
> >>
> >> This machine has 8 cores, so this load is sufficient to use almost
> >> 100% CPU.
> >>
> >> Does anyone have any idea why a fair ReentrantLock should result in
> >> such low CPU usage?  This pattern of usage has, apparently, been
> >> observed in more realistic workloads, from which this test case has
> >> been extracted.
> >>
> >> To run the example, just run "java Main N" where N is the number of
> >> threads.
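> >>
> >> (A hypothetical convenience, not in the code as posted: fairness
> >> could be taken as a second argument so that both variants run
> >> without recompiling.)
> >>
> >>    // hypothetical addition at the top of Main.main():
> >>    boolean fair = args.length > 1 && Boolean.parseBoolean(args[1]);
> >>    MyThread.mylock = new ReentrantLock(fair);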
> >>
> >> I don't think this is the same problem as
> >> http://blogs.sun.com/dave/entry/a_scheduling_and_dispatching_oddity
> >> because tweaking the kernel policies doesn't help.  Also, from my
> >> logs the kernel seems to wake a thread quickly once it is sent a
> >> pthread_cond_signal(): there is never more than one thread awakened
> >> but not yet running.
> >>
> >> Andrew.
> >>
> >>
> >>
> >> import java.util.concurrent.locks.*;
> >>
> >> class MyThread extends Thread
> >> {
> >>  static int UNLOCKLOOPS = 10000;
> >>  static int LOCKLOOPS = 1;
> >>
> >>  public long counter;
> >>  int sum;
> >>
> >>  static public ReentrantLock mylock = new ReentrantLock(true);  // shared by all threads
> >>
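> >>  // Not called as posted (the thread executes run() below); this
> >>  // variant synchronizes on the lock object's built-in monitor
> >>  // instead of calling lock()/unlock().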
> >>  public void runX()
> >>  {
> >>    while (!Main.done)
> >>      {
> >>        synchronized(mylock)
> >>          {
> >>            doLockedWork();
> >>          }
> >>        doUnlockedWork();
> >>        counter++;
> >>      }
> >>  }
> >>  public void run()
> >>  {
> >>    while (!Main.done)
> >>      {
> >>        mylock.lock();
> >>        try {
> >>          doLockedWork();
> >>        }
> >>        finally {
> >>          mylock.unlock();
> >>        }
> >>        doUnlockedWork();
> >>        counter++;
> >>      }
> >>  }
> >>  void doLockedWork() {
> >>    doComputeWork(LOCKLOOPS);
> >>  }
> >>
> >>  void doUnlockedWork() {
> >>    doComputeWork(UNLOCKLOOPS);
> >>  }
> >>
> >>  void doComputeWork(int loopCount) {
> >>    int mysum = sum;
> >>    for (int i=0; i<loopCount; i++) {
> >>      mysum += compute6(mysum);
> >>    }
> >>    sum += mysum;   // write back to field
> >>  }
> >>
> >>
> >>  /**
> >>   * Marsaglia xorshift (1, 3, 10)
> >>   */
> >>  public static int compute6(int seed) {
> >>    seed ^= seed << 1;
> >>    seed ^= seed >>> 3;
> >>    seed ^= (seed << 10);
> >>    return seed;
> >>  }
> >>
> >>
> >> }
> >>
> >> public class Main
> >> {
> >>  static volatile boolean done = false;
> >>  static ReentrantLock mylock = new ReentrantLock();  // note: unused as posted; the threads share MyThread.mylock
> >>
> >>  public static Thread[] threads;  // unused; main() uses its own local array
> >>
> >>  public static void main(String[] args)
> >>    throws Exception
> >>  {
> >>    int threadCount = Integer.parseInt(args[0]);
> >>
> >>    MyThread[] threads = new MyThread[threadCount];
> >>
> >>    for (int i = 0; i < threads.length; i++)
> >>      {
> >>        threads[i] = new MyThread();
> >>        threads[i].start();
> >>      }
> >>
> >>    Thread.sleep(10000);
> >>    done = true;
> >>
> >>    // Join the workers so that their final counter values are
> >>    // visible to this thread before summing.
> >>    for (int i = 0; i < threads.length; i++)
> >>      threads[i].join();
> >>
> >>    long total = 0;
> >>    for (int i = 0; i < threads.length; i++)
> >>      {
> >>        total += threads[i].counter;
> >> //      System.out.println(threads[i].counter);
> >>      }
> >>    System.out.println(threadCount + " threads, throughput = " + total);
> >>  }
> >> }
> >>
> >
>
> _______________________________________________
> Concurrency-interest mailing list
> Concurrency-interest at cs.oswego.edu
> http://cs.oswego.edu/mailman/listinfo/concurrency-interest


