[concurrency-interest] Multi-core testing, help with findings

David Holmes dcholmes at optusnet.com.au
Tue Dec 12 23:01:40 EST 2006

I really must learn to read what is written in sample code rather than what
I expect to see :) The volatile on the per-thread counter had escaped my
notice. I most definitely agree that that is the cause of the performance
loss - volatiles are free on UP systems but not on MP and this is
pathological usage.
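As an illustration, here is a minimal sketch of the pattern being discussed
(class and method names are hypothetical): a counter that only its own thread
ever touches, yet is declared volatile, so every increment pays the
volatile-store ordering cost on a multiprocessor for no benefit:

```java
// Hypothetical reconstruction of the pattern under discussion: each
// worker increments its own counter, so no other thread ever reads it,
// but the volatile qualifier still forces ordering on every store.
public class VolatileCounterWorker implements Runnable {
    private volatile int count; // pathological: never shared, yet volatile

    public int get() { return count; }

    public void run() {
        for (int i = 0; i < 1000000; i++) {
            count++; // each ++ is a volatile load plus a volatile store
        }
    }

    public static void main(String[] args) throws InterruptedException {
        VolatileCounterWorker w = new VolatileCounterWorker();
        long start = System.nanoTime();
        Thread t = new Thread(w);
        t.start();
        t.join();
        System.out.println("count=" + w.get() + ", elapsed ns="
                + (System.nanoTime() - start));
    }
}
```

Dropping the volatile qualifier here changes nothing semantically, since the
field is thread-confined, but removes the per-store fence on MP hardware.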

David Holmes
  -----Original Message-----
  From: concurrency-interest-bounces at cs.oswego.edu
[mailto:concurrency-interest-bounces at cs.oswego.edu] On Behalf Of Boehm, Hans
  Sent: Wednesday, 13 December 2006 5:38 AM
  To: David Harrigan
  Cc: concurrency-interest at cs.oswego.edu
  Subject: Re: [concurrency-interest] Multi-core testing, help with findings

  I believe that Bjorn is right, and it's the fence following volatile
stores on a multiprocessor that's causing the problem.  That sounds far more
plausible than anything else I've seen here, including my own explanations.

  Note that volatile doesn't force anything out of the cache; it just forces
the processor to execute an mfence instruction for each store to enforce
ordering between a volatile store and a subsequent volatile load.  On a P4
that typically costs you > 100 cycles.  On a Core 2 Duo I believe it's much
less, but still significant.

  (Since the volatiles are only accessed by a single thread, I also believe
it's actually correct to effectively optimize out the volatile qualifier in
this case, or to optimize away the whole loop for that matter.   I'd be
mildly impressed if a compiler actually did that.  As a general rule, it's
poor practice to put empty loops in microbenchmarks.  It makes the benchmark
very dependent on aspects of compiler optimization that don't matter for
real code.)
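A rough illustration of the cost difference being discussed (names are
invented, and this is not a rigorous benchmark):

```java
// Rough cost comparison, volatile vs. plain stores (names invented).
// Caveat, per the discussion above: the JIT is allowed to collapse the
// plain loop entirely, which would make its timing meaningless - the
// hazard of empty or dead loops in microbenchmarks.
public class FenceCost {
    static volatile long vcount; // every store must be fenced on MP
    static long pcount;          // plain store, freely optimizable

    static long timePlain(int n) {
        long t0 = System.nanoTime();
        for (int i = 0; i < n; i++) pcount++;
        return System.nanoTime() - t0;
    }

    static long timeVolatile(int n) {
        long t0 = System.nanoTime();
        for (int i = 0; i < n; i++) vcount++;
        return System.nanoTime() - t0;
    }

    public static void main(String[] args) {
        int n = 1000000;
        System.out.println("plain:    " + timePlain(n) + " ns");
        System.out.println("volatile: " + timeVolatile(n) + " ns");
    }
}
```

On a uniprocessor the two loops should time similarly; on an MP machine the
volatile loop carries the per-store ordering cost.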


    From: concurrency-interest-bounces at cs.oswego.edu
[mailto:concurrency-interest-bounces at cs.oswego.edu] On Behalf Of David Harrigan
    Sent: Tuesday, December 12, 2006 1:52 AM
    To: concurrency-interest at cs.oswego.edu
    Subject: Re: [concurrency-interest] Multi-core testing, help with findings

    Hi All,

    I would love to explore this further. I could certainly try out the
    Thread.yield()... but I have a small problemo now! My Core 2 Duo is going
    back to the factory since the screen doesn't appear to want to play ball
    :-( I'll have to wait until I can try the suggestions out. However, this
    of course does not mean no-one else can give it a whirl. All this is very
    interesting, and I think it highlights an area that is going to become
    more and more prevalent - as more developers have multi-core machines to
    develop on, these things are going to come up more often...

    From what I can read in this thread so far, it's either a scheduling
    issue with the OS, or I'm being too aggressive with my use of volatile
    (I chose this since I wanted to see how the processors would act when
    forced to go to main memory, rather than fetching from their 4MB cache).

    Oh, and it's Linux kernel 2.6.17.


    On 12/12/06, Bjorn Antonsson <ban at bea.com> wrote:

      I would say that a lot of the extra time comes from the fact that the
volatile stores/loads in the Worker class, 1000000 of them in all, do mean
something on a multi-CPU machine.

      On a typical x86 SMP machine the load/store/load pattern on volatiles
results in an mfence instruction, which is quite costly. On a single-CPU
machine this is a normal load/store/load without the mfence, since we are
guaranteed that the next thread will have the same view of the memory.
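      For illustration, a sketch of what that looks like at the instruction
level (the exact code emitted varies by JVM version and CPU; HotSpot on x86
typically uses a locked instruction rather than a literal mfence):

```java
// Illustrative only: what a JIT typically emits for a volatile store
// on x86 (exact instructions vary by JVM and CPU):
//
//   mov  [field], value      ; the store itself
//   lock addl $0, 0(%esp)    ; or mfence: drains the store buffer so
//                            ; the store is globally visible before any
//                            ; subsequent volatile load
//
// On a uniprocessor the fence can be omitted, which is why the same
// bytecode is so much cheaper there.
public class VolatileStore {
    static volatile int flag;

    public static void main(String[] args) {
        flag = 1; // volatile store: where the fence cost is paid on MP
        System.out.println("flag=" + flag);
    }
}
```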


      > -----Original Message-----
      > From: concurrency-interest-bounces at cs.oswego.edu
      > [mailto: concurrency-interest-bounces at cs.oswego.edu] On Behalf
      > Of David Harrigan
      > Sent: den 12 december 2006 07:57
      > To: concurrency-interest at cs.oswego.edu
      > Subject: Re: [concurrency-interest] Multi-core testing, help
      > with findings
      > Hi,
      > I completely forgot to mention that the platform is Linux (Ubuntu 6.10).
      > Just scanning thru the mail, will read when I get to work...
      > -=david=-
      > On 12/12/06, Gregg Wonderly <gregg at cytetech.com> wrote:
      >       David Holmes wrote:
      >       > I've assumed the platform is Windows, but if it is
      > linux then that opens
      >       > other possibilities. The problem can be explained if
      > the busy-wait thread
      >       > doesn't get descheduled (which is easy to test by
      > changing it to not be a
      >       > busy-wait). The issue as to why it doesn't get
      > descheduled is then the
      >       > interesting part. I suspect an OS scheduling quirk on
      > multi-core, but need
      >       > more information.
      >       >>>>>    private long doIt() {
      >       >>>>>        long startTime = System.currentTimeMillis();
      >       >>>>>        for(int i = 0; i < howMany; i++) {
      >       >>>>>            new Thread(new Worker()).start();
      >       >>>>>        }
      >       >>>>>        while(!finished);
      >       >>>>>        long endTime = System.currentTimeMillis();
      >       >>>>>        return (endTime - startTime);
      >       >>>>>
      >       >>>>>    }
      >       Historically, I've found that busy waits like the above
      > are problematic.  I'd go
      >       along with David's comment/thought and try
      >               while(!finished) Thread.yield();
      >       or something else to cause it to get descheduled for a
      > whole quanta for each
      >       check rather than busy waiting for a whole quanta which
      > will keep at least one
      >       CPU busy doing nothing productive.
      >       Gregg Wonderly
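      Sketching both suggestions (the class name is hypothetical and the
worker body is elided): either yield inside the spin so the waiting thread
gets descheduled, or, better, block on a java.util.concurrent.CountDownLatch
instead of spinning at all:

```java
import java.util.concurrent.CountDownLatch;

public class LatchedTimer {
    // Option 1: while (!finished) Thread.yield();  // spin, but yield the CPU
    // Option 2 (shown below): block on a latch instead of spinning.
    long doIt(final int howMany) throws InterruptedException {
        final CountDownLatch done = new CountDownLatch(howMany);
        long startTime = System.currentTimeMillis();
        for (int i = 0; i < howMany; i++) {
            new Thread(new Runnable() {
                public void run() {
                    // ... worker body would go here ...
                    done.countDown(); // signal completion
                }
            }).start();
        }
        done.await(); // waits without keeping a CPU busy
        return System.currentTimeMillis() - startTime;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(new LatchedTimer().doIt(4) + " ms");
    }
}
```

The latch also removes the need for the volatile finished flag entirely,
since await() provides the necessary happens-before ordering.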


      Concurrency-interest mailing list
      Concurrency-interest at altair.cs.oswego.edu

