[concurrency-interest] Some interesting (confusing?) benchmark results

David Holmes davidcholmes at aapt.net.au
Mon Feb 11 17:18:12 EST 2013


That doesn't make sense to me. With a threshold of 250 you are creating
thousands of tasks, and that overhead is what could bring your laptop to
its knees. With a 250K threshold you have far fewer tasks.
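For concreteness, a rough back-of-the-envelope on the task counts and on the threshold heuristic quoted further down in this thread (N = 2,000,000 as in the benchmark; the numbers are illustrative arithmetic, not measurements):

```java
public class ThresholdArithmetic {
    public static void main(String[] args) {
        long n = 2_000_000;

        // A recursive split that stops at the sequential threshold produces
        // roughly n / threshold leaf tasks (plus about as many internal ones).
        long leavesAt250  = n / 250;      // 8,000 leaf tasks
        long leavesAt250k = n / 250_000;  // 8 leaf tasks

        // Heuristic quoted below in the thread:
        // threshold = N / (LoadBalanceFactor * NumCore),
        // with LoadBalanceFactor = 8 and 2 cores.
        long threshold = n / (8 * 2);

        System.out.println(leavesAt250);  // 8000
        System.out.println(leavesAt250k); // 8
        System.out.println(threshold);    // 125000
    }
}
```

For this two-core machine the heuristic yields 125,000, the same order of magnitude as the 250K value discussed in the thread.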

That said, I don't know what the memory requirements of your algorithm are.
If you are allocating temporary arrays, then a large threshold could well
cause a problem.

The parallel array sort going into JDK 8 is a merge sort but uses a working
array equal in size to the original array to be sorted.
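For illustration, a generic top-down merge sort with a full-size working array, showing the memory footprint described above (a sketch of the general technique, not the JDK implementation):

```java
import java.util.Arrays;

public class MergeSortScratch {
    // Sorts a[] using a scratch array equal in size to the input.
    static void sort(int[] a) {
        int[] scratch = new int[a.length];  // the full-size working array
        sort(a, scratch, 0, a.length);
    }

    static void sort(int[] a, int[] scratch, int lo, int hi) {
        if (hi - lo <= 1) return;
        int mid = (lo + hi) >>> 1;
        sort(a, scratch, lo, mid);
        sort(a, scratch, mid, hi);
        // Merge the two sorted halves through the scratch array.
        int i = lo, j = mid, k = lo;
        while (i < mid && j < hi) scratch[k++] = a[i] <= a[j] ? a[i++] : a[j++];
        while (i < mid) scratch[k++] = a[i++];
        while (j < hi)  scratch[k++] = a[j++];
        System.arraycopy(scratch, lo, a, lo, hi - lo);
    }

    public static void main(String[] args) {
        int[] a = {5, 3, 8, 1, 9, 2};
        sort(a);
        System.out.println(Arrays.toString(a)); // [1, 2, 3, 5, 8, 9]
    }
}
```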

David

> -----Original Message-----
> From: concurrency-interest-bounces at cs.oswego.edu
> [mailto:concurrency-interest-bounces at cs.oswego.edu]On Behalf Of
> thurston at nomagicsoftware.com
> Sent: Tuesday, 12 February 2013 7:49 AM
> To: concurrency-interest at cs.oswego.edu
> Subject: Re: [concurrency-interest] Some interesting (confusing?)
> benchmark results
>
>
> Well, it's the algorithm that's dispositive in this case.
> Insertion sort is empirically better than merge sort for arrays of up
> to around 25 elements.  That's where I started; I played around with
> different thresholds and 250 seemed to be the best.
> With anything larger than 1000 you start to see significant degradation.
> I tried your formula (approximately) with N = 2 000 000 ==> 250K
>
> That brought my laptop to its knees (as expected): over 33K ms as
> opposed to 390ms with 250.
>
> Following my own admonition (the algorithm matters), it occurs to me
> that I should just use my own serial mergesort as the comparison to
> the ForkJoin version (as opposed to Arrays.sort).  I'll let y'all
> know the results of that.
>
> -T
>
>
>
> On 2013-02-11 13:19, David Holmes wrote:
> > Your sequential threshold is far too small for only two worker
> > threads. You would get near-optimum speedup just by splitting the
> > problem in two.
> >
> > The heuristic for sequential threshold is:
> >
> > Threshold = N / (LoadBalanceFactor * NumCore)
> >
> > where N is the "size" of the problem.
> >
> > In an ideal world the LoadBalanceFactor would be 1, but real tasks
> > encounter delays and execute unevenly, so a value > 1 is needed.
> > Heuristically, a value of 8 is a good starting point.
> >
> > David Holmes
> >
> >> -----Original Message-----
> >> From: concurrency-interest-bounces at cs.oswego.edu
> >> [mailto:concurrency-interest-bounces at cs.oswego.edu]On Behalf Of
> >> thurston at nomagicsoftware.com
> >> Sent: Tuesday, 12 February 2013 6:52 AM
> >> To: concurrency-interest at cs.oswego.edu
> >> Subject: [concurrency-interest] Some interesting (confusing?)
> >> benchmark
> >> results
> >>
> >>
> >> Hello,
> >>
> >> I made some initial attempts at using ForkJoin framework for some
> >> rather obvious recursively parallel use-cases, namely summing the
> >> elements of an int[] and also sorting an int[] (randomly filled).
> >>
> >> The problem-sizes I used ranged from 1M - 10M.
> >>
> >> I really enjoy using the framework and find it relatively easy to
> >> reason about my programs (if not necessarily the internals of the
> >> framework).
> >> But I was disappointed with the results: in both cases they were
> >> slower
> >> than the corresponding Java serial implementation.
> >>
> >> I completely understand the principle of YMMV, and I wasn't
> >> expecting a 2x speedup, especially in the case of the sort, but I
> >> was expecting ForkJoin to do at least a little better than the
> >> single-threaded version.  I guess what I'm asking is: are these
> >> results surprising?
> >> Does it mean I'm doing something wrong?
> >>
> >> I'll give a brief description of my implementation:
> >> single-threaded:  Arrays.sort(int[])
> >>
> >> ForkJoin: I'm using a merge sort with THRESHOLD = 250 (arrived at
> >> by trial and error)
> >>
> >> int[] compute()
> >> {
> >>     if (array.length < THRESHOLD)
> >>         return insertionSort(array);
> >>     left = new MergeSort(array /* first half */);
> >>     right = new MergeSort(array /* second half */);
> >>     left.fork();
> >>     return merge(right.compute(), left.join());
> >> }
> >>
> >> The insertionSort() and merge() methods are just standard
> >> implementations; there is a bit of an apples-to-oranges comparison,
> >> since Arrays.sort() uses an optimized quicksort, but we're still
> >> talking O(n log n).
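A runnable sketch of the above as a RecursiveTask (class and helper names are hypothetical; the thread doesn't show the actual benchmark code):

```java
import java.util.Arrays;
import java.util.Random;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

public class MergeSortTask extends RecursiveTask<int[]> {
    static final int THRESHOLD = 250;  // sequential cutoff, as in the benchmark
    private final int[] array;

    MergeSortTask(int[] array) { this.array = array; }

    @Override
    protected int[] compute() {
        if (array.length < THRESHOLD)
            return insertionSort(array);
        int mid = array.length / 2;
        MergeSortTask left  = new MergeSortTask(Arrays.copyOfRange(array, 0, mid));
        MergeSortTask right = new MergeSortTask(Arrays.copyOfRange(array, mid, array.length));
        left.fork();                                 // run the left half asynchronously
        return merge(right.compute(), left.join());  // compute right here, then join left
    }

    static int[] insertionSort(int[] a) {
        for (int i = 1; i < a.length; i++) {
            int v = a[i], j = i - 1;
            for (; j >= 0 && a[j] > v; j--) a[j + 1] = a[j];
            a[j + 1] = v;
        }
        return a;
    }

    static int[] merge(int[] x, int[] y) {
        int[] out = new int[x.length + y.length];
        int i = 0, j = 0, k = 0;
        while (i < x.length && j < y.length) out[k++] = x[i] <= y[j] ? x[i++] : y[j++];
        while (i < x.length) out[k++] = x[i++];
        while (j < y.length) out[k++] = y[j++];
        return out;
    }

    static boolean isSorted(int[] a) {
        for (int i = 1; i < a.length; i++) if (a[i - 1] > a[i]) return false;
        return true;
    }

    public static void main(String[] args) {
        int[] data = new int[1000];
        Random rnd = new Random(42);
        for (int i = 0; i < data.length; i++) data[i] = rnd.nextInt(10_000);
        int[] sorted = new ForkJoinPool(2).invoke(new MergeSortTask(data));
        System.out.println(sorted.length == 1000 && isSorted(sorted)); // true
    }
}
```

Note the `left.fork()` before `right.compute()`: calling `join()` on a task that was never forked would block indefinitely, and computing one half directly in the current worker avoids a wasted fork.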
> >>
> >> Just ran it on my laptop:
> >> Windows 7 64-bit
> >> 1.7.0_03; Java HotSpot(TM) 64-Bit Server VM 22.1-b02
> >> Core 2 Duo==> 2 cores 2GHz
> >>
> >> 2M int[]:
> >> single-threaded: ~330ms
> >> ForkJoin (2 workers): ~390ms
> >>
> >> Would appreciate any feedback; hopefully the #s are somewhat
> >> helpful (and please no scoffing at my antiquated machine).
> >>
> >> -T
> >>
> >>
> >> _______________________________________________
> >> Concurrency-interest mailing list
> >> Concurrency-interest at cs.oswego.edu
> >> http://cs.oswego.edu/mailman/listinfo/concurrency-interest
> >>
>


