[concurrency-interest] Some interesting (confusing?) benchmark results

David Holmes davidcholmes at aapt.net.au
Mon Feb 11 17:34:30 EST 2013


Victor,

See the description of the LoadBalanceFactor below. It makes sense to create
more tasks than there are threads because threads progress at different
rates: a thread can be delayed by scheduling, JIT activity, GC and so on.
Suppose that with 2 threads one had completed its half of the work while the
other was only a quarter of the way through its half; you would then have to
wait much longer for completion. But if you have broken the work down into
smaller subtasks (say 4), that first thread could be "helping" the second to
complete the overall job. Break it down too finely, though, and you pay too
much task overhead. Hence the heuristic value of 8, which is generally a good
starting point. Performance is also not super-sensitive to getting the
LoadBalanceFactor exactly right.
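
To make the heuristic concrete, here is a minimal sketch (not code from this
thread; the class and variable names are made up) of a summing task that
derives its sequential threshold from N / (LoadBalanceFactor * NumCore):

import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Sums an int[] in parallel, splitting until a range is at or below the
// threshold computed from the heuristic.
class SumTask extends RecursiveTask<Long> {
    private final int[] a;
    private final int lo, hi, threshold;

    SumTask(int[] a, int lo, int hi, int threshold) {
        this.a = a; this.lo = lo; this.hi = hi; this.threshold = threshold;
    }

    @Override protected Long compute() {
        if (hi - lo <= threshold) {             // small enough: sum sequentially
            long sum = 0;
            for (int i = lo; i < hi; i++) sum += a[i];
            return sum;
        }
        int mid = (lo + hi) >>> 1;
        SumTask left  = new SumTask(a, lo, mid, threshold);
        SumTask right = new SumTask(a, mid, hi, threshold);
        left.fork();                            // left half runs asynchronously
        return right.compute() + left.join();   // right half runs in this thread
    }

    public static void main(String[] args) {
        int[] data = new int[2_000_000];        // problem size from this thread
        int cores = Runtime.getRuntime().availableProcessors();
        int loadBalanceFactor = 8;              // heuristic starting value
        int threshold = data.length / (loadBalanceFactor * cores);
        long total = new ForkJoinPool().invoke(
                new SumTask(data, 0, data.length, threshold));
        System.out.println(total);
    }
}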

David
  -----Original Message-----
  From: Viktor Klang [mailto:viktor.klang at gmail.com]
  Sent: Tuesday, 12 February 2013 8:25 AM
  To: dholmes at ieee.org
  Cc: thurston; concurrency-interest
  Subject: Re: [concurrency-interest] Some interesting (confusing?)
benchmark results

  On Mon, Feb 11, 2013 at 11:18 PM, David Holmes <davidcholmes at aapt.net.au>
wrote:

    That doesn't make sense to me. If you had a threshold of 250 then you
    are creating thousands of tasks, and that overhead could potentially
    bring your laptop to its knees. With a 250K threshold you have far fewer
    tasks.



  I'm not sure it makes sense to create more sort-tasks than there are
  threads in the pool.


    That said, I don't know what the memory requirements are for your
    algorithm. If you are allocating temporary arrays then a large threshold
    could well cause a problem.

    The parallel array sort going into JDK 8 is a merge sort, but it uses a
    working array equal in size to the original array to be sorted.
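
    From the caller's side that is just a library call. A minimal sketch (not
    from this thread; it assumes the Arrays.parallelSort entry point in the
    current JDK 8 builds):

        import java.util.Arrays;
        import java.util.Random;

        public class ParallelSortDemo {
            public static void main(String[] args) {
                // 2M random ints, roughly the problem size discussed here
                int[] a = new Random(42).ints(2_000_000).toArray();
                Arrays.parallelSort(a);  // parallel merge sort; may allocate a
                                         // working array up to the size of 'a'
            }
        }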



  No in-place merge sort? ;-)


    David


    > -----Original Message-----
    > From: concurrency-interest-bounces at cs.oswego.edu
    > [mailto:concurrency-interest-bounces at cs.oswego.edu]On Behalf Of
    > thurston at nomagicsoftware.com

    > Sent: Tuesday, 12 February 2013 7:49 AM
    > To: concurrency-interest at cs.oswego.edu
    > Subject: Re: [concurrency-interest] Some interesting (confusing?)
    > benchmark results
    >
    >
    > Well, it's the algorithm that's dispositive in this case.
    > Insertion sort is empirically faster than merge sort for arrays of up
    > to around 25 elements.  That's where I started; I played around with
    > different thresholds and 250 seemed to be the best.
    > Anything larger than 1000 and you start to see significant degradation.
    > I tried your formula (approximately) with N = 2 000 000 ==> 250K
    >
    > That brought my laptop to its knees (as expected): over 33K ms as
    > opposed to 390ms with 250.
    >
    > Following my own admonition (the algorithm matters), it occurs to me
    > that I should just use my own serial mergesort as the comparison for
    > the ForkJoin version (as opposed to Arrays.sort).  I'll let y'all know
    > the results of that.
    >
    > -T
    >
    >
    >
    > On 2013-02-11 13:19, David Holmes wrote:
    > > Your sequential threshold is far too small for only two worker
    > > threads. You would get near-optimum speedup just by splitting the
    > > problem in two.
    > >
    > > The heuristic for sequential threshold is:
    > >
    > > Threshold = N / (LoadBalanceFactor * NumCore)
    > >
    > > where N is the "size" of the problem.
    > >
    > > The LoadBalanceFactor in an ideal world would be 1, but real tasks
    > > encounter delays and execute at different rates, so a value > 1 is
    > > needed. Heuristically, a value of 8 is a good starting value.
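    > >
    > > For example, with N = 2 000 000 and 2 cores, a LoadBalanceFactor of 8
    > > gives Threshold = 2 000 000 / (8 * 2) = 125 000, i.e. roughly 16 leaf
    > > tasks, versus about 8000 leaf tasks with a threshold of 250.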
    > >
    > > David Holmes
    > >
    > >> -----Original Message-----
    > >> From: concurrency-interest-bounces at cs.oswego.edu
    > >> [mailto:concurrency-interest-bounces at cs.oswego.edu]On Behalf Of
    > >> thurston at nomagicsoftware.com
    > >> Sent: Tuesday, 12 February 2013 6:52 AM
    > >> To: concurrency-interest at cs.oswego.edu
    > >> Subject: [concurrency-interest] Some interesting (confusing?)
    > >> benchmark
    > >> results
    > >>
    > >>
    > >> Hello,
    > >>
    > >> I made some initial attempts at using the ForkJoin framework for some
    > >> rather obvious recursively parallel use-cases, namely summing the
    > >> elements of an int[] and also sorting an int[] (randomly filled).
    > >>
    > >> The problem-sizes I used ranged from 1M - 10M.
    > >>
    > >> I really enjoy using the framework and find it relatively easy to
    > >> reason about my programs (if not necessarily the internals of the
    > >> framework).
    > >> But I was disappointed with the results: in both cases they were
    > >> slower
    > >> than the corresponding Java serial implementation.
    > >>
    > >> I completely understand the principle of YMMV, and I wasn't expecting
    > >> a 2x speedup, but I was expecting that ForkJoin would do at least a
    > >> little better than the single-threaded version, especially in the
    > >> case of the sort.  I guess what I'm asking is: are these results
    > >> surprising?
    > >> Does it mean I'm doing something wrong?
    > >>
    > >> I'll give a brief description of my implementation:
    > >> single-threaded:  Arrays.sort(int[])
    > >>
    > >> ForkJoin: I'm using a merge-sort, with THRESHOLD = 250 (arrived at by
    > >> trial-and-error)
    > >>
    > >> protected int[] compute()
    > >> {
    > >>     if (array.length < THRESHOLD)
    > >>         return insertionSort(array);
    > >>     MergeSort left  = new MergeSort(/* first half of array */);
    > >>     MergeSort right = new MergeSort(/* second half of array */);
    > >>     left.fork();   // hand the left half to the pool
    > >>     return merge(right.compute(), left.join());
    > >> }
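    > >>
    > >> (Roughly invoked as follows; the pool size and names here are just
    > >> for illustration:)
    > >>
    > >> int[] sorted = new ForkJoinPool(2).invoke(new MergeSort(data));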
    > >>
    > >> The insertionSort() and merge() methods are just standard
    > >> implementations; there is a bit of an apples-to-oranges comparison
    > >> here since Arrays.sort() uses an optimized quicksort, but we're still
    > >> talking O(n log n).
    > >>
    > >> Just ran it on my laptop:
    > >> Windows 7 64-bit
    > >> 1.7.0_03; Java HotSpot(TM) 64-Bit Server VM 22.1-b02
    > >> Core 2 Duo==> 2 cores 2GHz
    > >>
    > >> 2M int[]:
    > >> single-threaded: ~330ms
    > >> ForkJoin (2 workers): ~390ms
    > >>
    > >> Would appreciate any feedback, and hopefully the #s are somewhat
    > >> helpful (and please no scoffing at my antiquated machine).
    > >>
    > >> -T
    > >>
    > >>
    > >>
    >
    >


  --

  Viktor Klang
  Director of Engineering


  Typesafe - The software stack for applications that scale
  Twitter: @viktorklang