[concurrency-interest] Some interesting (confusing?) benchmark results

Stanimir Simeonoff stanimir at riflexo.com
Mon Feb 11 17:35:44 EST 2013


Since the L2 cache of the CPU is relatively small (I suppose 2 MB on the
T6600), you'd want to keep the small partitions within the cache. Perhaps
an in-place sort like smoothsort can help; the merge should be in-place as well.


On Mon, Feb 11, 2013 at 11:49 PM, <thurston at nomagicsoftware.com> wrote:

> Well, it's the algorithm that's dispositive in this case.
> Insertion-sort is empirically better than mergesort for arrays up to around
> 25 elements.  That's where I started, and I played around with different
> thresholds; 250 seemed to be the best.
> With anything larger than 1000 you start to see significant degradation.
> I tried your formula (approximately) with N = 2 000 000 ==> 250K
>
> That brought my laptop to its knees (as expected): over 33K ms as opposed
> to 390ms with 250.
>
> Following my own admonition (the algorithm matters), it occurs to me that
> I should use my own serial mergesort as the comparison to the ForkJoin
> version (as opposed to Arrays.sort).  I'll let y'all know the results of that.
>
> -T
>
>
>
>
> On 2013-02-11 13:19, David Holmes wrote:
>
>> Your sequential threshold is far too small for only two worker threads.
>> You would get near-optimal speedup just by splitting the problem in two.
>>
>> The heuristic for sequential threshold is:
>>
>> Threshold = N / (LoadBalanceFactor * NumCore)
>>
>> where N is the "size" of the problem.
>>
>> The LoadBalanceFactor in an ideal world would be 1, but real tasks
>> encounter delays and execute differently, so a value > 1 is needed.
>> Heuristically, a value of 8 is a good starting point.
>>
>> David Holmes
>>
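Put concretely, that heuristic can be sketched in Java. This is only an illustration of the formula above; the class and method names are made up for this sketch, not from any library:

```java
// A sketch of the sequential-threshold heuristic from the thread:
//   Threshold = N / (LoadBalanceFactor * NumCore)
// LOAD_BALANCE_FACTOR = 8 is the suggested starting value; tune empirically.
public class ThresholdHeuristic {
    static final int LOAD_BALANCE_FACTOR = 8;

    static int sequentialThreshold(int problemSize, int cores) {
        // Guard against a zero threshold for tiny problem sizes.
        return Math.max(1, problemSize / (LOAD_BALANCE_FACTOR * cores));
    }

    public static void main(String[] args) {
        // The 2M-element sort on the 2-core laptop from the thread:
        System.out.println(sequentialThreshold(2_000_000, 2)); // prints 125000
    }
}
```

For the 2M/2-core case this gives 125,000, in the same ballpark as the ~250K figure tried below.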
>>  -----Original Message-----
>>> From: concurrency-interest-bounces at cs.oswego.edu
>>> [mailto:concurrency-interest-bounces at cs.oswego.edu] On Behalf Of
>>> thurston at nomagicsoftware.com
>>> Sent: Tuesday, 12 February 2013 6:52 AM
>>> To: concurrency-interest at cs.oswego.edu
>>> Subject: [concurrency-interest] Some interesting (confusing?) benchmark
>>> results
>>>
>>>
>>> Hello,
>>>
>>> I made some initial attempts at using ForkJoin framework for some
>>> rather obvious recursively parallel use-cases, namely summing the
>>> elements of an int[] and also sorting an int[] (randomly filled).
>>>
>>> The problem-sizes I used ranged from 1M - 10M.
>>>
>>> I really enjoy using the framework and find it relatively easy to
>>> reason about my programs (if not necessarily the internals of the
>>> framework).
>>> But I was disappointed with the results: in both cases they were slower
>>> than the corresponding Java serial implementation.
>>>
>>> I completely understand the principle of YMMV, and I wasn't expecting a
>>> 2x speedup, especially in the case of the sort, but I was expecting
>>> that ForkJoin would do at least a little better than the single-threaded
>>> version.  I guess what I'm asking is: are these results surprising?
>>> Does it mean I'm doing something wrong?
>>>
>>> I'll give a brief description of my implementation:
>>> single-threaded:  Arrays.sort(int[])
>>>
>>> ForkJoin: I'm using a merge-sort, with THRESHOLD = 250 (arrived at by
>>> trial-and-error)
>>>
>>> int[] compute()
>>> {
>>>     if (array.length < THRESHOLD)
>>>         return insertionSort(array);
>>>     left = new MergeSort(array /* first half */);
>>>     right = new MergeSort(array /* second half */);
>>>     left.fork();
>>>     return merge(right.compute(), left.join());
>>> }
>>>
>>> The insertionSort() and merge() methods are just standard
>>> implementations; there is a bit of an apples-to-oranges comparison, since
>>> Arrays.sort() uses an optimized quicksort, but we're still talking
>>> O(n log n).
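Fleshed out, that sketch might look like the following runnable RecursiveTask (THRESHOLD and the merge/insertion-sort split are from the pseudocode above; Arrays.sort stands in for insertionSort, and the left.fork() is what actually hands the left half to another worker):

```java
import java.util.Arrays;
import java.util.Random;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Illustrative fork/join mergesort; THRESHOLD = 250 as in the original post.
// Note left.fork() before computing the right half directly in this thread --
// without the fork(), the left half would never be submitted for stealing.
public class MergeSortTask extends RecursiveTask<int[]> {
    static final int THRESHOLD = 250;
    private final int[] array;

    MergeSortTask(int[] array) { this.array = array; }

    @Override
    protected int[] compute() {
        if (array.length < THRESHOLD) {
            Arrays.sort(array);  // stand-in for insertionSort()
            return array;
        }
        int mid = array.length / 2;
        MergeSortTask left = new MergeSortTask(Arrays.copyOfRange(array, 0, mid));
        MergeSortTask right = new MergeSortTask(Arrays.copyOfRange(array, mid, array.length));
        left.fork();                               // left half runs asynchronously
        return merge(right.compute(), left.join()); // right half runs in this thread
    }

    // Standard two-way merge of two sorted arrays into a new array.
    static int[] merge(int[] a, int[] b) {
        int[] out = new int[a.length + b.length];
        int i = 0, j = 0, k = 0;
        while (i < a.length && j < b.length)
            out[k++] = (a[i] <= b[j]) ? a[i++] : b[j++];
        while (i < a.length) out[k++] = a[i++];
        while (j < b.length) out[k++] = b[j++];
        return out;
    }

    public static void main(String[] args) {
        int[] data = new int[2_000_000];
        Random rnd = new Random(42);
        for (int i = 0; i < data.length; i++) data[i] = rnd.nextInt();
        int[] sorted = new ForkJoinPool().invoke(new MergeSortTask(data));
        System.out.println(sorted.length);
    }
}
```

The copyOfRange calls allocate fresh halves at every level, which is simpler than index arithmetic on a shared array but adds allocation pressure that the serial Arrays.sort baseline doesn't pay.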
>>>
>>> Just ran it on my laptop:
>>> Windows 7 64-bit
>>> 1.7.0_03; Java HotSpot(TM) 64-Bit Server VM 22.1-b02
>>> Core 2 Duo ==> 2 cores @ 2 GHz
>>>
>>> 2M int[]:
>>> single-threaded: ~330ms
>>> ForkJoin (2 workers): ~390ms
>>>
>>> Would appreciate any feedback; hopefully the #s are somewhat helpful
>>> (and please no scoffing at my antiquated machine).
>>>
>>> -T
>>>
>>>
>>> _______________________________________________
>>> Concurrency-interest mailing list
>>> Concurrency-interest at cs.oswego.edu
>>> http://cs.oswego.edu/mailman/listinfo/concurrency-interest
>>>
>>>
>