[concurrency-interest] Low cpu usage with fair ReentrantLock
David Holmes
davidcholmes at aapt.net.au
Mon May 10 18:42:54 EDT 2010
Andrew Haley writes on Tuesday, 11 May 2010 2:36 AM:
> Perhaps so. It's hard to tell from that javadoc exactly what is
> meant: I thought, from reading that, that "slower; often much slower"
> referred to slow operation with high CPU loads when locking rather
> than, as we have here, CPUs being starved of work. But in any case,
> given the fact that locking is so rare, I found this result a little
> bit surprising.
Just one further comment. Locking may be "rare" from the perspective of
locked-work:unlocked-work, but it would seem that your threads are executing
in-phase, and so contention is actually high.
David Holmes
> I'm asking here because I want to know (from the horse's mouth, as it
> were) whether this is exactly what people would expect of such a lock.
> And if it is, perhaps I can add a litle to that javadoc.
>
> Andrew.
>
>
> > On Mon, May 10, 2010 at 10:30 AM, Andrew Haley <aph at redhat.com> wrote:
> >
> >> Consider a set of identical threads that occasionally need to acquire
> >> a lock. Contention for this lock is low, because the ratio of
> >> unlocked to locked work is high, 10000:1.
> >>
> >> public void run()
> >> {
> >> while (!Main.done)
> >> {
> >> mylock.lock();
> >> try {
> >> doLockedWork();
> >> }
> >> finally {
> >> mylock.unlock();
> >> }
> >> doUnlockedWork();
> >> counter++;
> >> }
> >>
> >> I've observed that when a fair ReentrantLock is used, the CPU usage
> >> across all cores becomes very low, and much of the time cores are
> >> idling. This seems to be particularly true on a multi-CPU machine,
> >> whereas single-CPU machines aren't so bad.
> >>
> >> What seems to happen is that threads waiting to acquire the lock "pile
> >> up" and are descheduled, and the number of threads doing work falls
> >> dramatically.
> >>
> >> I added a logging facility inside park and unpark which tracks when
> >> threads are sleeping waiting for pthread_cond_wait(), when they are
> >> unparked by calling pthread_cond_signal(), and when they actually
> >> return from pthread_cond_wait(). On an 8 core machine with 32
> >> threads, the process eventually stabilizes with, on average, only a
> >> few threads running at any given time.
> >>
> >> This is a typical sample of the behaviour of a fair ReentrantLock
> >> after the program has been running for a while. "r" is the number of
> >> threads running, "s" the number of threads sleeping, and "a" the
> >> number of threads that have been unparked but have not yet returned
> >> from pthread_cond_wait():
> >>
> >> pid 29978: 2 r:5, s:28, a:0
> >> pid 29986: 1 r:5, s:27, a:1
> >> pid 29986: 2 r:6, s:27, a:0
> >> pid 29987: 0 r:5, s:28, a:0
> >> pid 29993: 1 r:5, s:27, a:1
> >> pid 29995: 0 r:4, s:28, a:1
> >> pid 29993: 2 r:5, s:28, a:0
> >> pid 29974: 1 r:5, s:27, a:1
> >> pid 29981: 0 r:4, s:28, a:1
> >> pid 29978: 0 r:3, s:29, a:1
> >> pid 29986: 0 r:2, s:30, a:1
> >> pid 29993: 0 r:1, s:31, a:1
> >> pid 29974: 2 r:2, s:31, a:0
> >> pid 29989: 1 r:2, s:30, a:1
> >> pid 29974: 0 r:1, s:31, a:1
> >> pid 29989: 2 r:2, s:31, a:0
> >> pid 29996: 1 r:2, s:30, a:1
> >> pid 29996: 2 r:3, s:30, a:0
> >> pid 29984: 1 r:3, s:29, a:1
> >>
> >> As you can see, the number of threads running at any given time is
> >> low, and the machine is on average only about 50% loaded. Most
> >> threads are sleeping.
> >>
> >> With an unfair ReentrantLock, there are no such problems, and the
> >> number of running threads is greater:
> >>
> >> pid 32169: 2 r:6, s:27, a:0
> >> pid 32156: 1 r:6, s:26, a:1
> >> pid 32156: 2 r:7, s:26, a:0
> >> pid 32180: 1 r:7, s:25, a:1
> >> pid 32180: 2 r:8, s:25, a:0
> >> pid 32164: 1 r:8, s:24, a:1
> >> pid 32180: 0 r:7, s:25, a:1
> >> pid 32164: 2 r:8, s:25, a:0
> >> pid 32159: 1 r:8, s:24, a:1
> >> pid 32159: 2 r:9, s:24, a:0
> >> pid 32150: 1 r:9, s:23, a:1
> >> pid 32169: 0 r:8, s:24, a:1
> >> pid 32172: 0 r:7, s:25, a:1
> >> pid 32150: 2 r:8, s:25, a:0
> >> pid 32155: 1 r:8, s:24, a:1
> >>
> >> This machine has 8 cores, so this load is sufficient to use almost
> >> 100% CPU.
> >>
> >> Does anyone have any idea why a fair ReentrantLock should result in
> >> such low CPU usage? This pattern of usage has, apparently, been
> >> observed in more realistic workloads, from which this test case has
> >> been extracted.
> >>
> >> To run the example, just run "java Main N" where N is the number of
> >> threads.
> >>
> >> I don't think this is the same problem as
> >> http://blogs.sun.com/dave/entry/a_scheduling_and_dispatching_oddity
> >> because tweaking the kernel policies doesn't help. Also, the kernel
> >> seems from my logs to be waking a thread quickly once it is sent a
> >> pthread_cond_signal(): there is never more than one thread awakened
> >> but not running.
> >>
> >> Andrew.
> >>
> >>
> >>
> >> import java.util.concurrent.locks.*;
> >>
> >> class MyThread extends Thread
> >> {
> >> static int UNLOCKLOOPS = 10000;
> >> static int LOCKLOOPS = 1;
> >>
> >> public long counter;
> >> int sum;
> >>
> >> static public ReentrantLock mylock = new ReentrantLock(true);
> // shared
> >> by all threads
> >>
> >> public void runX()
> >> {
> >> while (!Main.done)
> >> {
> >> synchronized(mylock)
> >> {
> >> doLockedWork();
> >> }
> >> doUnlockedWork();
> >> counter++;
> >> }
> >> }
> >> public void run()
> >> {
> >> while (!Main.done)
> >> {
> >> mylock.lock();
> >> try {
> >> doLockedWork();
> >> }
> >> finally {
> >> mylock.unlock();
> >> }
> >> doUnlockedWork();
> >> counter++;
> >> }
> >> }
> >> void doLockedWork() {
> >> doComputeWork(LOCKLOOPS);
> >> }
> >>
> >> void doUnlockedWork() {
> >> doComputeWork(UNLOCKLOOPS);
> >> }
> >>
> >> void doComputeWork(int loopCount) {
> >> int mysum = sum;
> >> for (int i=0; i<loopCount; i++) {
> >> mysum += compute6(mysum);
> >> }
> >> sum += mysum; // write back to field
> >> }
> >>
> >>
> >>
> >>
> >>
> >> /**
> >> * Marsaglia xorshift (1, 3, 10)
> >> */
> >> public static int compute6(int seed) {
> >> seed ^= seed << 1;
> >> seed ^= seed >>> 3;
> >> seed ^= (seed << 10);
> >> return seed;
> >> }
> >>
> >>
> >> }
> >>
> >> public class Main
> >> {
> >> static volatile boolean done = false;
> >> static ReentrantLock mylock = new ReentrantLock(); // shared by all
> >> threads
> >>
> >> public static Thread[] threads;
> >>
> >> public static void main(String[] args)
> >> throws Exception
> >> {
> >> int threadCount = Integer.parseInt(args[0]);
> >>
> >> MyThread[] threads = new MyThread[threadCount];
> >>
> >> for (int i = 0; i < threads.length; i++)
> >> {
> >> threads[i] = new MyThread();
> >> threads[i].start();
> >> }
> >>
> >> Thread.sleep(10000);
> >> done = true;
> >>
> >> long total = 0;
> >> for (int i = 0; i < threads.length; i++)
> >> {
> >> total += threads[i].counter;
> >> // System.out.println(threads[i].counter);
> >> }
> >> System.out.println(threadCount + " threads, throughput = " + total);
> >> }
> >> }
> >> _______________________________________________
> >> Concurrency-interest mailing list
> >> Concurrency-interest at cs.oswego.edu
> >> http://cs.oswego.edu/mailman/listinfo/concurrency-interest
> >>
> >
>
> _______________________________________________
> Concurrency-interest mailing list
> Concurrency-interest at cs.oswego.edu
> http://cs.oswego.edu/mailman/listinfo/concurrency-interest
More information about the Concurrency-interest
mailing list