[concurrency-interest] Some interesting (confusing?) benchmark results

√iktor Ҡlang viktor.klang at gmail.com
Mon Feb 11 17:40:27 EST 2013


What's the state of the art on in-place mergesorts?
(It's been a while since I was researching that)

Cheers,
√


On Mon, Feb 11, 2013 at 11:35 PM, Stanimir Simeonoff
<stanimir at riflexo.com>wrote:

> Since the L2 cache of the CPU is relatively small (I suppose 2M on the
> T6600), you'd want to keep the small portions within the cache. Perhaps
> an in-place sort like smoothsort can help. The merge should be in-place as well.
>
>
>
> On Mon, Feb 11, 2013 at 11:49 PM, <thurston at nomagicsoftware.com> wrote:
>
>> Well, it's the algorithm that's dispositive in this case.
>> Insertion-sort is empirically better than mergesort up to arrays of
>> around 25 elements.  That's where I started, and I played around with
>> different thresholds, and 250 seemed to be the best.
>> Anything larger than 1000 and you start to see significant degradation.
>> I tried your formula (approximately) with N = 2 000 000 ==> 250K
>>
>> That brought my laptop to its knees (as expected): over 33K ms as opposed
>> to 390ms with 250.
>>
>> Following my own admonition (the algorithm matters), it occurs to me that
>> I should use my own serial mergesort as the comparison to the ForkJoin
>> version (as opposed to Arrays.sort).  I'll let y'all know the results of that.
>>
>> -T
>>
>>
>>
>>
>> On 2013-02-11 13:19, David Holmes wrote:
>>
>>> Your sequential threshold is far too small for only two worker threads.
>>> You would get near-optimal speedup by splitting the problem into just two.
>>>
>>> The heuristic for sequential threshold is:
>>>
>>> Threshold = N / (LoadBalanceFactor * NumCore)
>>>
>>> where N is the "size" of the problem.
>>>
>>> The LoadBalanceFactor in an ideal world would be 1, but real tasks
>>> encounter delays and execute at different speeds, so a value > 1 is
>>> needed. Heuristically, 8 is a good starting value.
>>>
>>> David Holmes
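
>>> To make the heuristic concrete, here is a small sketch (the method and
>>> variable names are illustrative only, not from any library):

```java
// Sketch of the sequential-threshold heuristic:
//   threshold = N / (LoadBalanceFactor * NumCore)
public class ThresholdHeuristic {

    // loadBalanceFactor: 1 in an ideal world; heuristically 8 is a good start.
    public static int sequentialThreshold(int n, int loadBalanceFactor, int numCores) {
        return Math.max(1, n / (loadBalanceFactor * numCores));
    }

    public static void main(String[] args) {
        // For the 2M-element sort on a 2-core machine with factor 8:
        System.out.println(sequentialThreshold(2_000_000, 8, 2)); // 125000
    }
}
```

With factor 8 and 2 cores this gives 125000 — a far coarser split than the
threshold of 250 used in the original post.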
>>>
>>>  -----Original Message-----
>>>> From: concurrency-interest-bounces at cs.oswego.edu
>>>> [mailto:concurrency-interest-bounces at cs.oswego.edu] On Behalf Of
>>>> thurston at nomagicsoftware.com
>>>> Sent: Tuesday, 12 February 2013 6:52 AM
>>>> To: concurrency-interest at cs.oswego.edu
>>>> Subject: [concurrency-interest] Some interesting (confusing?) benchmark
>>>> Subject: [concurrency-interest] Some interesting (confusing?) benchmark
>>>> results
>>>>
>>>>
>>>> Hello,
>>>>
>>>> I made some initial attempts at using ForkJoin framework for some
>>>> rather obvious recursively parallel use-cases, namely summing the
>>>> elements of an int[] and also sorting an int[] (randomly filled).
>>>>
>>>> The problem-sizes I used ranged from 1M - 10M.
>>>>
>>>> I really enjoy using the framework and find it relatively easy to
>>>> reason about my programs (if not necessarily the internals of the
>>>> framework).
>>>> But I was disappointed with the results: in both cases they were slower
>>>> than the corresponding Java serial implementation.
>>>>
>>>> I completely understand the principle of YMMV, and I wasn't expecting a
>>>> 2x speedup, but, especially in the case of the sort, I was expecting
>>>> that ForkJoin would do at least a little better than the single-threaded
>>>> version.  I guess what I'm asking is: are these results surprising?
>>>> Does it mean I'm doing something wrong?
>>>>
>>>> I'll give a brief description of my implementation:
>>>> single-threaded:  Arrays.sort(int[])
>>>>
>>>> ForkJoin: I'm using a merge-sort, with THRESHOLD = 250 (arrived at by
>>>> trial-and-error)
>>>>
>>>> int[] compute()
>>>> {
>>>>     if (array.length < THRESHOLD)
>>>>         return insertionSort(array);
>>>>     left = new MergeSort(array /* first half */)
>>>>     right = new MergeSort(array /* other half */)
>>>>     left.fork()  // must fork before join, or left.join() never completes
>>>>     return merge(right.compute(), left.join())
>>>> }
>>>>
>>>> The insertionSort() and merge() methods are just standard
>>>> implementations; there is a bit of an apples-to-oranges comparison since
>>>> Arrays.sort() uses an optimized quicksort, but we're still talking
>>>> O(n log n).
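
>>>> For reference, a self-contained version of this scheme might look as
>>>> follows (the RecursiveTask structure and Arrays.sort as the sequential
>>>> base case are choices of this sketch, not the exact code benchmarked):

```java
import java.util.Arrays;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Fork/join mergesort with a sequential cutoff, as described above.
public class MergeSortTask extends RecursiveTask<int[]> {
    static final int THRESHOLD = 250;   // sequential cutoff from the post
    private final int[] a;

    MergeSortTask(int[] a) { this.a = a; }

    @Override
    protected int[] compute() {
        if (a.length < THRESHOLD) {
            Arrays.sort(a);             // sequential base case
            return a;
        }
        int mid = a.length / 2;
        MergeSortTask left  = new MergeSortTask(Arrays.copyOfRange(a, 0, mid));
        MergeSortTask right = new MergeSortTask(Arrays.copyOfRange(a, mid, a.length));
        left.fork();                    // left half runs asynchronously
        int[] r = right.compute();      // right half runs in this thread
        return merge(left.join(), r);
    }

    // Standard two-way merge of two sorted arrays.
    static int[] merge(int[] l, int[] r) {
        int[] out = new int[l.length + r.length];
        int i = 0, j = 0, k = 0;
        while (i < l.length && j < r.length)
            out[k++] = (l[i] <= r[j]) ? l[i++] : r[j++];
        while (i < l.length) out[k++] = l[i++];
        while (j < r.length) out[k++] = r[j++];
        return out;
    }

    public static void main(String[] args) {
        int[] data = new java.util.Random(42).ints(2_000_000).toArray();
        int[] sorted = new ForkJoinPool().invoke(new MergeSortTask(data));
        boolean ok = true;
        for (int i = 1; i < sorted.length; i++) ok &= sorted[i - 1] <= sorted[i];
        System.out.println(ok ? "sorted" : "not sorted");
    }
}
```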
>>>>
>>>> Just ran it on my laptop:
>>>> Windows 7 64-bit
>>>> 1.7.0_03; Java HotSpot(TM) 64-Bit Server VM 22.1-b02
>>>> Core 2 Duo==> 2 cores 2GHz
>>>>
>>>> 2M int[]:
>>>> single-threaded: ~330ms
>>>> ForkJoin (2 workers): ~390ms
>>>>
>>>> Would appreciate any feedback, and hopefully the #s are somewhat helpful
>>>> (and please no scoffing at my antiquated machine).
>>>>
>>>> -T
>>>>
>>>>
>>>> _______________________________________________
>>>> Concurrency-interest mailing list
>>>> Concurrency-interest at cs.oswego.edu
>>>> http://cs.oswego.edu/mailman/listinfo/concurrency-interest
>>>>
>>>>
>>
>
>
>
>


-- 
Viktor Klang
Director of Engineering

Typesafe <http://www.typesafe.com/> - The software stack for applications
that scale
Twitter: @viktorklang