[concurrency-interest] Multi-core testing, help with findings

Boehm, Hans hans.boehm at hp.com
Mon Dec 11 19:13:51 EST 2006


Somehow that doesn't look like the whole explanation to me.  If I read
the code correctly, finished is only being touched once by another
thread for each major iteration.  Thus it should only leave the L1 cache
of the main thread once every 7 seconds.  It's unclear to me why the
main thread should be touching the memory system significantly at all.
It's also unclear to me why it should be scheduled all the time, instead
of just being 1 of 1001 threads.
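
To make the point about the main thread concrete, here is a minimal sketch (not the original poster's code) in which the main thread blocks on a CountDownLatch instead of spinning on the volatile flag, so it should not need to be scheduled while the workers count. The class name LatchWait, the method runOnce, and the small thread and iteration counts are assumptions for illustration only. Using a fresh latch per run also sidesteps the fact that the posted code never resets finished to false after the first run.

```java
import java.util.concurrent.CountDownLatch;

public class LatchWait {

    // Same shape as the posted Worker: a volatile counter incremented in a loop.
    static class Worker implements Runnable {
        volatile int counter;
        private final int iterations;
        private final CountDownLatch done;

        Worker(int iterations, CountDownLatch done) {
            this.iterations = iterations;
            this.done = done;
        }

        public void run() {
            for (counter = 0; counter < iterations; counter++);
            done.countDown(); // signal completion; does not block
        }
    }

    // Returns {elapsed milliseconds, sum of all worker counters}.
    static long[] runOnce(int threads, int iterations) {
        CountDownLatch done = new CountDownLatch(threads); // fresh latch per run
        Worker[] workers = new Worker[threads];
        long start = System.nanoTime();
        for (int i = 0; i < threads; i++) {
            workers[i] = new Worker(iterations, done);
            new Thread(workers[i]).start();
        }
        try {
            done.await(); // main thread blocks here instead of busy-waiting
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        long elapsedMs = (System.nanoTime() - start) / 1000000L;
        long total = 0;
        for (Worker w : workers) {
            total += w.counter; // visible here because await() follows countDown()
        }
        return new long[] { elapsedMs, total };
    }

    public static void main(String[] args) {
        // Hypothetical sizes, far smaller than the original 1000 x 1,000,000.
        for (int i = 0; i < 3; i++) {
            long[] r = runOnce(8, 100000);
            System.out.println("Run #" + i + " : elapsed time = " + r[0] + "ms");
        }
    }
}
```

If the timings with a blocking wait look much the same as with the spin loop, that would suggest the busy-waiting main thread is not the main cause of the slowdown.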

Depending on the platform, might the thread creation cost just be a lot
higher?  Or might you get several instances of the counter variable in
the same cache line?  Neither of those sounds all that likely, either
...
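
One rough way to check the thread creation hypothesis is to time thread start and join with no work at all. The sketch below is only an illustration under that assumption; the class name ThreadStartCost and the method timeStartAndJoin are made up for this example and are not part of the posted program.

```java
public class ThreadStartCost {

    // Starts `threads` no-op threads, joins them all, and returns the
    // elapsed time in nanoseconds.
    static long timeStartAndJoin(int threads) {
        Thread[] ts = new Thread[threads];
        long start = System.nanoTime();
        for (int i = 0; i < threads; i++) {
            ts[i] = new Thread(new Runnable() {
                public void run() {
                    // intentionally empty: only creation and teardown are measured
                }
            });
            ts[i].start();
        }
        for (Thread t : ts) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while joining", e);
            }
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        // 1000 matches howMany in the posted test; the run count is arbitrary.
        for (int i = 0; i < 5; i++) {
            long ns = timeStartAndJoin(1000);
            System.out.println("Run #" + i + " : start+join = " + (ns / 1000000L) + "ms");
        }
    }
}
```

Comparing this figure on both machines against the reported 6975ms and 2735ms averages would indicate whether creation cost alone could plausibly account for the gap.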

Hans

> -----Original Message-----
> From: concurrency-interest-bounces at cs.oswego.edu 
> [mailto:concurrency-interest-bounces at cs.oswego.edu] On Behalf 
> Of David Holmes
> Sent: Monday, December 11, 2006 2:24 PM
> To: David Harrigan; concurrency-interest at cs.oswego.edu
> Subject: Re: [concurrency-interest] Multi-core testing, help 
> with findings
> 
> David,
> 
> You have a busy wait-loop which will try to consume 
> 1-CPU/CORE and continually bang on the "finished" variable, 
> doing nothing but interfere with the execution of the real 
> work due to memory/cache traffic. On a single processor 
> system your busy thread will get switched out after each 
> timeslice and get far less CPU time to interfere.
> 
> So I think what you are seeing here is a scheduling artifact 
> of the OS.
> 
> Cheers,
> David Holmes
> 
> > -----Original Message-----
> > From: concurrency-interest-bounces at cs.oswego.edu
> > [mailto:concurrency-interest-bounces at cs.oswego.edu] On Behalf Of
> > David Harrigan
> > Sent: Monday, 11 December 2006 10:32 PM
> > To: concurrency-interest at cs.oswego.edu
> > Subject: Re: [concurrency-interest] Multi-core testing, help with 
> > findings
> >
> >
> >
> > Hi,
> >
> > Oops, that should be after 20 runs on the Pentium-M...not 5!!
> >
> > Also, I'm using JDK 6 final - the one that was released today.
> >
> > -=david=-
> >
> >
> > David Harrigan wrote:
> > >
> > > Hi All,
> > >
> > > > I've recently acquired a nice new shiny core 2 duo (2 x 2.0GHz) laptop
> > > > and I thought I would try out a test of threading on it. So, I wrote a
> > > > simple class (see below). However, my findings are curious and I would
> > > > like, if possible, someone to explain why they are slower on my
> > > > multi-core system than on my older system, which was a Pentium-M @
> > > > 2.33GHz. Both machines, apart from the processor, are near enough
> > > > identical - same disk speed, same type of memory (667MHz DDR2 2GB) etc.
> > >
> > > > After 20 runs of my program on the core 2 duo, the average time was:
> > > > 6975ms
> > > > After 5 runs of my program on the Pentium-M, the average time was:
> > > > 2735ms
> > >
> > > > I suspect it's because with two processors they are both contending
> > > > for main memory. Notice that I have the counter as volatile which
> > > > forces the variable to flush out to memory each time - since this is
> > > > what I'm interested in testing - real world stuff where things are
> > > > synch'ed (when it wasn't volatile, the change was dramatic - because
> > > > the core 2 duo has 4MB of cache it was extremely fast, whereas the
> > > > Pentium-M with only 1MB of cache was a lot lot slower)...
> > >
> > >
> > >
> > >
> > > import java.util.concurrent.BrokenBarrierException;
> > > import java.util.concurrent.CyclicBarrier;
> > >
> > > public class ThreadTest {
> > >
> > >     private static final int howMany = 1000;
> > >     private static volatile boolean finished;
> > >     final CyclicBarrier barrier = new CyclicBarrier(howMany, new
> > > Runnable() {
> > >         public void run() {
> > >             finished = true;
> > >         }
> > >     });
> > >
> > >     public static void main(String[] args) {
> > >         ThreadTest t = new ThreadTest();
> > >         long total = 0;
> > >         for(int i = 0 ; i < 20 ; i ++) {
> > >             long elapsedTime = t.doIt();
> > >             total += elapsedTime;
> > >             System.out.println("Run #" + i + " : elapsed 
> time = " + 
> > > elapsedTime + "ms");
> > >         }
> > >         System.out.println("Average time = " + (total / 
> 20) + "ms");
> > >     }
> > >
> > >     private long doIt() {
> > >         long startTime = System.currentTimeMillis();
> > >         for(int i = 0; i < howMany; i++) {
> > >             new Thread(new Worker()).start();
> > >         }
> > >         while(!finished);
> > >         long endTime = System.currentTimeMillis();
> > >         return (endTime - startTime);
> > >
> > >     }
> > >
> > >     class Worker implements Runnable {
> > >         volatile int counter;
> > >         public void run() {
> > >             for(counter = 0 ; counter < 1000000 ; counter++);
> > >             try {
> > >                 barrier.await();
> > >             } catch(InterruptedException e) {
> > >                 return;
> > >             } catch(BrokenBarrierException e) {
> > >                 return;
> > >             }
> > >         }
> > >     }
> > > }
> > >
> > >
> > > -=david=-
> > >
> >
> > --
> > View this message in context:
> > http://www.nabble.com/Multi-core-testing%2C-help-with-findings-tf2793302.html#a7793847
> > Sent from the JSR166 Concurrency mailing list archive at Nabble.com.
> 
> _______________________________________________
> Concurrency-interest mailing list
> Concurrency-interest at altair.cs.oswego.edu
> http://altair.cs.oswego.edu/mailman/listinfo/concurrency-interest
> 