[concurrency-interest] Some interesting (confusing?) benchmark results

thurston at nomagicsoftware.com thurston at nomagicsoftware.com
Wed Feb 13 10:27:21 EST 2013


Could you publish your machine specs?
And what was the size of your sample array?  2M or something else?
If it is 2M (and if I'm doing my math right), it's odd that my times
are lower:
basically for 2M/250, I get around 370ms +- 25ms for the MergeSorter
and around 330ms for Java's Arrays.sort
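For context, timings in this range are sensitive to JIT warm-up and to the random input, so a measurement loop along these lines is assumed (a sketch only; the class and method names are made up, and the warm-up/run counts are arbitrary):

```java
import java.util.Arrays;
import java.util.Random;

public class SortTiming {
    // Times Arrays.sort on fresh random data, after a few warm-up runs.
    // Returns the elapsed nanos of each measured run.
    static long[] timeSort(int size, int warmups, int runs) {
        Random rnd = new Random();
        for (int w = 0; w < warmups; w++) {
            Arrays.sort(randomArray(rnd, size));   // let the JIT settle
        }
        long[] elapsed = new long[runs];
        for (int r = 0; r < runs; r++) {
            int[] data = randomArray(rnd, size);   // fresh data per run
            long start = System.nanoTime();
            Arrays.sort(data);
            elapsed[r] = System.nanoTime() - start;
        }
        return elapsed;
    }

    static int[] randomArray(Random rnd, int size) {
        int[] a = new int[size];
        for (int i = 0; i < size; i++) a[i] = rnd.nextInt();
        return a;
    }

    public static void main(String[] args) {
        for (long nanos : timeSort(2_000_000, 5, 3)) {
            System.out.println("Arrays.sort: " + nanos / 1_000_000 + " ms");
        }
    }
}
```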

Thanks

On 2013-02-13 06:40, Peter Levart wrote:
> On 02/12/2013 08:20 PM, thurston at nomagicsoftware.com wrote:
>> Here is all of the relevant code in a single blob.  I think it's 
>> self-explanatory; the sample test is at the end.
>>
>> http://pastebin.com/WrfBHYSG
>
> Hi Thurston,
>
> You're doing too much copying. I modified your MergeSorter a little:
>
>
> import java.util.Arrays;
> import java.util.concurrent.RecursiveTask;
>
> public class MergeSorter2 extends RecursiveTask<int[]> {
>     static final int THRESHOLD = 250;
>     final int[] array, work;
>     final int offset, length;
>
>     public MergeSorter2(int[] array) {
>         this.array = array;
>         this.work = new int[array.length];
>         offset = 0;
>         length = array.length;
>     }
>
>     private MergeSorter2(int[] array, int[] work, int offset, int length) {
>         this.array = array;
>         this.work = work;
>         this.offset = offset;
>         this.length = length;
>     }
>
>     @Override
>     protected int[] compute() {
>         if (length < MergeSorter2.THRESHOLD) {
>             Arrays.sort(array, offset, offset + length);
>             return this.array;
>         }
>
>         int halfLength = length >> 1;
>         MergeSorter2 left = new MergeSorter2(array, work, offset, halfLength);
>         left.fork();
>
>         int mid = offset + halfLength;
>
>         MergeSorter2 right = new MergeSorter2(array, work, mid, length - halfLength);
>         right.compute();
>         left.join();
>
>         // copy 1st half from array to work
>         System.arraycopy(array, offset, work, offset, halfLength);
>
>         // merge 1st half of work and 2nd half of array into array
>         int end = offset + length;
>         int i = offset;
>         int j = mid;
>         int k = offset;
>         int x = work[i];
>         int y = array[j];
>         while (true) {
>             if (j >= end || x <= y) {
>                 array[k++] = x;
>                 i++;
>             // until we drain the 1st half (the rest of array is already in place)
>                 if (i >= mid) break;
>                 x = work[i];
>             } else { // j < end
>                 array[k++] = y;
>                 j++;
>                 if (j < end) y = array[j];
>             }
>         }
>
>         return this.array;
>     }
> }
>
>
> ... it uses a single parallel "work" array for merging, which is
> shared among the tasks, and it tries to minimize the copying. Here are
> the results I get on a FJPool with parallelism=2:
>
>  MergeSorter: 1557638090 nanos
>  MergeSorter: 1545600414 nanos
>  MergeSorter: 1478185502 nanos
> MergeSorter2: 1058750062 nanos
> MergeSorter2:  978295370 nanos
> MergeSorter2:  976838429 nanos
>  Arrays.sort: 1494256525 nanos
>  Arrays.sort: 1505639161 nanos
>  Arrays.sort: 1483640342 nanos
>
>
> I get even better results with a greater THRESHOLD (but not more than
> 10% better). Your MergeSorter gets worse with a greater THRESHOLD
> because you are using insertion sort for the leaf tasks. Arrays.sort()
> delegates to insertion sort only when N < 47; otherwise it uses
> quicksort...
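For reference, the merge step in MergeSorter2 can be exercised on its own. A minimal standalone sketch (the mergeBack name and the test framing are mine; the loop mirrors the one in compute() above):

```java
import java.util.Arrays;

public class MergeStep {
    // Merges the sorted run array[offset..mid) with the sorted run
    // array[mid..end) back into array[offset..end), using work as the
    // scratch buffer, exactly as MergeSorter2.compute() does.
    // Assumes offset < mid < end, which holds for every call in compute().
    static void mergeBack(int[] array, int[] work, int offset, int mid, int end) {
        // copy 1st half from array to work
        System.arraycopy(array, offset, work, offset, mid - offset);

        int i = offset, j = mid, k = offset;
        int x = work[i];
        int y = array[j];
        while (true) {
            if (j >= end || x <= y) {
                array[k++] = x;
                i++;
                // until we drain the 1st half (the rest of array is already in place)
                if (i >= mid) break;
                x = work[i];
            } else { // j < end
                array[k++] = y;
                j++;
                if (j < end) y = array[j];
            }
        }
    }

    public static void main(String[] args) {
        int[] a = {2, 5, 9, 1, 4, 8};            // two sorted halves
        mergeBack(a, new int[a.length], 0, 3, 6);
        System.out.println(Arrays.toString(a));  // [1, 2, 4, 5, 8, 9]
    }
}
```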
>
> Regards, Peter
>
>
>
>>
>>
>>
>> On 2013-02-12 01:55, thurston at nomagicsoftware.com wrote:
>>> Yes, I will push the code and my measurements some time this week.
>>> I'm really interested in seeing what others' results would be. What
>>> I can say is that the timings are surprisingly stable (a range of
>>> 30-40ms), and the input data is randomly generated for each test
>>> run, which can certainly affect merge(int[], int[])'s performance.
>>>
>>> On 2013-02-12 01:50, Kirk Pepperdine wrote:
>>>> Hi,
>>>> On 2013-02-11, at 9:52 PM, thurston at nomagicsoftware.com wrote:
>>>> Hi T,
>>>>
>>>> Can you publish the raw data from your runs? Averages tend to hide
>>>> effects, and it's difficult to say anything with only that measure.
>>>>
>>>> Regards,
>>>> Kirk
>>>>
>>>>
>>>>> Hello,
>>>>>
>>>>> I made some initial attempts at using the ForkJoin framework for
>>>>> some rather obvious recursively parallel use-cases, namely summing
>>>>> the elements of an int[] and also sorting an int[] (randomly filled).
>>>>>
>>>>> The problem-sizes I used ranged from 1M - 10M.
>>>>>
>>>>> I really enjoy using the framework and find it relatively easy to 
>>>>> reason about my programs (if not necessarily the internals of the 
>>>>> framework).
>>>>> But I was disappointed with the results: in both cases they were 
>>>>> slower than the corresponding Java serial implementation.
>>>>>
>>>>> I completely understand the principle of YMMV, and I wasn't 
>>>>> expecting a 2x speedup, but especially in the case of the sort I 
>>>>> was expecting ForkJoin to do at least a little better than the 
>>>>> single-threaded version.  I guess what I'm asking is: are these 
>>>>> results surprising?  Does it mean I'm doing something wrong?
>>>>>
>>>>> I'll give a brief description of my implementation:
>>>>> single-threaded:  Arrays.sort(int[])
>>>>>
>>>>> ForkJoin: I'm using a merge-sort, with THRESHOLD = 250 (arrived 
>>>>> at by trial-and-error)
>>>>>
>>>>> int[] compute()
>>>>> {
>>>>>     if (array.length < THRESHOLD)
>>>>>         return insertionSort(array);
>>>>>     left = new MergeSort(array /* split in half */);
>>>>>     left.fork();
>>>>>     right = new MergeSort(array /* other half */);
>>>>>     return merge(right.compute(), left.join());
>>>>> }
>>>>>
>>>>> The insertionSort() and merge() methods are just standard 
>>>>> implementations; there is a bit of an apples-to-oranges comparison, 
>>>>> since Arrays.sort() uses an optimized quicksort, but we're still 
>>>>> talking O(n log n).
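Fleshed out, that pseudocode corresponds roughly to the following (a sketch reconstructed from the description, not the actual posted code; it copies both halves into fresh arrays at every level, which is the extra copying Peter points out that MergeSorter2 avoids):

```java
import java.util.Arrays;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

public class MergeSort extends RecursiveTask<int[]> {
    static final int THRESHOLD = 250;
    final int[] array;

    public MergeSort(int[] array) { this.array = array; }

    @Override
    protected int[] compute() {
        if (array.length < THRESHOLD) {
            return insertionSort(array);
        }
        int half = array.length / 2;
        // Each level allocates and copies both halves -- the overhead
        // that the shared-work-array version eliminates.
        MergeSort left = new MergeSort(Arrays.copyOfRange(array, 0, half));
        left.fork();
        MergeSort right = new MergeSort(Arrays.copyOfRange(array, half, array.length));
        return merge(right.compute(), left.join());
    }

    static int[] insertionSort(int[] a) {
        for (int i = 1; i < a.length; i++) {
            int v = a[i], j = i - 1;
            while (j >= 0 && a[j] > v) { a[j + 1] = a[j]; j--; }
            a[j + 1] = v;
        }
        return a;
    }

    // Standard two-way merge of two sorted arrays into a new array.
    static int[] merge(int[] a, int[] b) {
        int[] out = new int[a.length + b.length];
        int i = 0, j = 0, k = 0;
        while (i < a.length && j < b.length) {
            out[k++] = a[i] <= b[j] ? a[i++] : b[j++];
        }
        while (i < a.length) out[k++] = a[i++];
        while (j < b.length) out[k++] = b[j++];
        return out;
    }

    public static void main(String[] args) {
        java.util.Random rnd = new java.util.Random(1);
        int[] data = new int[10_000];
        for (int i = 0; i < data.length; i++) data[i] = rnd.nextInt();

        int[] expected = data.clone();
        Arrays.sort(expected);

        ForkJoinPool pool = new ForkJoinPool(2);
        int[] sorted = pool.invoke(new MergeSort(data));
        System.out.println(Arrays.equals(sorted, expected)); // true
    }
}
```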
>>>>>
>>>>> Just ran it on my laptop:
>>>>> Windows 7 64-bit
>>>>> 1.7.0_03; Java HotSpot(TM) 64-Bit Server VM 22.1-b02
>>>>> Core 2 Duo ==> 2 cores @ 2GHz
>>>>>
>>>>> 2M int[]:
>>>>> single-threaded: ~330ms
>>>>> ForkJoin (2 workers): ~390ms
>>>>>
>>>>> Would appreciate any feedback, and hopefully the #s are somewhat 
>>>>> helpful (and please no scoffing at my antiquated machine)
>>>>>
>>>>> -T
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Concurrency-interest mailing list
>>>>> Concurrency-interest at cs.oswego.edu
>>>>> http://cs.oswego.edu/mailman/listinfo/concurrency-interest
>>


