[concurrency-interest] Some interesting (confusing?) benchmark results

Peter Levart peter.levart at gmail.com
Wed Feb 13 10:34:16 EST 2013


On 02/13/2013 04:27 PM, thurston at nomagicsoftware.com wrote:
> Could you publish your machine specs?
> And what was the size of your sample array?  2M or something else?
> If it is 2M (and if I'm doing my math right), it's weird that my times 
> are less:
> basically for 2M/250, I get around 370ms +- 25ms for the MergeSorter
> and around 330ms for java's Array.sort

Oh, sorry. My machine is an i7 Linux PC running a recent JDK 8 build. I 
had to increase N to 20M to actually make it sweat a little...

Your unchanged MergeSorter is better than Arrays.sort with 
parallelism = 4, for example:

MergeSorter: 1134740049 nanos
MergeSorter: 1082783496 nanos
MergeSorter: 1033780556 nanos
MergeSorter2: 720092808 nanos
MergeSorter2: 649416922 nanos
MergeSorter2: 605587079 nanos
Arrays.sort: 1454093373 nanos
Arrays.sort: 1466476368 nanos
Arrays.sort: 1461604292 nanos

...but with only 2 threads it is on-par with Arrays.sort.
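For anyone who wants to reproduce the comparison, a minimal timing harness along these lines should do (class and method names here are mine, not from the thread; the trivial stand-in task is a placeholder for the MergeSorter/MergeSorter2 tasks being benchmarked):

```java
import java.util.Arrays;
import java.util.Random;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

public class SortBench {
    // Trivial stand-in so the harness is self-contained; substitute the
    // MergeSorter / MergeSorter2 tasks from the thread to reproduce the runs.
    static class SortTask extends RecursiveTask<int[]> {
        final int[] a;
        SortTask(int[] a) { this.a = a; }
        @Override protected int[] compute() { Arrays.sort(a); return a; }
    }

    // Clone the input so every run sorts identical, unsorted data.
    static long timeSort(ForkJoinPool pool, int[] data) {
        int[] copy = data.clone();
        long t0 = System.nanoTime();
        pool.invoke(new SortTask(copy));
        return System.nanoTime() - t0;
    }

    public static void main(String[] args) {
        int[] data = new Random(42).ints(2_000_000).toArray();
        ForkJoinPool pool = new ForkJoinPool(4); // parallelism = 4, as above
        for (int run = 0; run < 3; run++) {      // repeated runs let the JIT warm up
            System.out.println("SortTask: " + timeSort(pool, data) + " nanos");
        }
        pool.shutdown();
    }
}
```

Timing each sorter three times in the same JVM, as in the numbers quoted above, matters: the first run typically includes JIT compilation.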

Regards, Peter

>
> Thanks
>
> On 2013-02-13 06:40, Peter Levart wrote:
>> On 02/12/2013 08:20 PM, thurston at nomagicsoftware.com wrote:
>>> Here is all of the relevant code in a single blob.  I think it's 
>>> self-explanatory; sample test is at the end.
>>>
>>> http://pastebin.com/WrfBHYSG
>>
>> Hi Thurston,
>>
>> You're doing too much copying. I modified your MergeSorter a little:
>>
>>
>> import java.util.Arrays;
>> import java.util.concurrent.RecursiveTask;
>>
>> public class MergeSorter2 extends RecursiveTask<int[]> {
>>     static final int THRESHOLD = 250;
>>     final int[] array, work;
>>     final int offset, length;
>>
>>     public MergeSorter2(int[] array) {
>>         this.array = array;
>>         this.work = new int[array.length];
>>         offset = 0;
>>         length = array.length;
>>     }
>>
>>     private MergeSorter2(int[] array, int[] work, int offset, int length) {
>>         this.array = array;
>>         this.work = work;
>>         this.offset = offset;
>>         this.length = length;
>>     }
>>
>>     @Override
>>     protected int[] compute() {
>>         if (length < MergeSorter2.THRESHOLD) {
>>             Arrays.sort(array, offset, offset + length);
>>             return this.array;
>>         }
>>
>>         int halfLength = length >> 1;
>>         MergeSorter2 left = new MergeSorter2(array, work, offset, halfLength);
>>         left.fork();
>>
>>         int mid = offset + halfLength;
>>
>>         MergeSorter2 right = new MergeSorter2(array, work, mid, length - halfLength);
>>         right.compute();
>>         left.join();
>>
>>         // copy 1st half from array to work
>>         System.arraycopy(array, offset, work, offset, halfLength);
>>
>>         // merge 1st half of work and 2nd half of array into array
>>         int end = offset + length;
>>         int i = offset;
>>         int j = mid;
>>         int k = offset;
>>         int x = work[i];
>>         int y = array[j];
>>         while (true) {
>>             if (j >= end || x <= y) {
>>                 array[k++] = x;
>>                 i++;
>>             // until we drain the 1st half (the rest of array is already in place)
>>                 if (i >= mid) break;
>>                 x = work[i];
>>             } else { // j < end
>>                 array[k++] = y;
>>                 j++;
>>                 if (j < end) y = array[j];
>>             }
>>         }
>>
>>         return this.array;
>>     }
>> }
>>
>>
>> ... it uses a single "work" array for merging, shared among the
>> subtasks, and it tries to minimize copying. Here are the results I
>> get on a FJPool with parallelism = 2:
>>
>>  MergeSorter: 1557638090 nanos
>>  MergeSorter: 1545600414 nanos
>>  MergeSorter: 1478185502 nanos
>> MergeSorter2: 1058750062 nanos
>> MergeSorter2:  978295370 nanos
>> MergeSorter2:  976838429 nanos
>>  Arrays.sort: 1494256525 nanos
>>  Arrays.sort: 1505639161 nanos
>>  Arrays.sort: 1483640342 nanos
>>
>>
>> I get even better results with a greater THRESHOLD (but not more than
>> 10% better). Your MergeSorter gets worse with a greater THRESHOLD
>> because you are using insertion sort for the leaf tasks. Arrays.sort()
>> delegates to insertion sort only when N < 47; otherwise it uses
>> quicksort...
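For reference, the kind of leaf-level insertion sort being discussed would look roughly like this (a textbook sketch with hypothetical names, not the code from the pastebin):

```java
public class LeafSort {
    // Textbook insertion sort of the kind the original MergeSorter presumably
    // uses at its leaves. It performs O(n^2) comparisons, which is why raising
    // THRESHOLD hurts MergeSorter but helps MergeSorter2, whose leaves call
    // the O(n log n) Arrays.sort().
    static void insertionSort(int[] a, int from, int to) {
        for (int i = from + 1; i < to; i++) {
            int v = a[i];
            int j = i - 1;
            // shift elements greater than v one slot to the right
            while (j >= from && a[j] > v) {
                a[j + 1] = a[j];
                j--;
            }
            a[j + 1] = v;
        }
    }
}
```

At THRESHOLD = 250, each such leaf does on the order of 250^2/4 comparisons, versus roughly 250*log2(250) for a quicksort leaf; that gap is what the 10% figure above reflects.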
>>
>> Regards, Peter
>>
>>
>>
>>>
>>>
>>>
>>> On 2013-02-12 01:55, thurston at nomagicsoftware.com wrote:
>>>> Yes, I will push the code and my measurements some time this week.
>>>> I'm really interested in seeing what others' results would be. What I
>>>> can say is that the timings are surprisingly stable (range 30-40ms),
>>>> and the input data is randomly generated for each test run, which can
>>>> certainly affect merge(int[], int[])'s performance.
>>>>
>>>> On 2013-02-12 01:50, Kirk Pepperdine wrote:
>>>>> Hi,
>>>>> On 2013-02-11, at 9:52 PM, thurston at nomagicsoftware.com wrote:
>>>>> Hi T,
>>>>>
>>>>> Can you pub the raw data from your runs? Average tends to hide
>>>>> effects and it's difficult to say anything with only that measure.
>>>>>
>>>>> Regards,
>>>>> Kirk
>>>>>
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> I made some initial attempts at using ForkJoin framework for some 
>>>>>> rather obvious recursively parallel use-cases, namely summing the 
>>>>>> elements of an int[] and also sorting an int[] (randomly filled).
>>>>>>
>>>>>> The problem-sizes I used ranged from 1M - 10M.
>>>>>>
>>>>>> I really enjoy using the framework and find it relatively easy to 
>>>>>> reason about my programs (if not necessarily the internals of the 
>>>>>> framework).
>>>>>> But I was disappointed with the results: in both cases they were 
>>>>>> slower than the corresponding Java serial implementation.
>>>>>>
>>>>>> I completely understand the principle of YMMV, and I wasn't 
>>>>>> expecting 2x speedup, but especially in the case of the sort, but 
>>>>>> I was expecting that ForkJoin would do at least a little better 
>>>>>> than the single-threaded version.  I guess what I'm asking is: 
>>>>>> are these results surprising?  Does it mean I'm doing something 
>>>>>> wrong?
>>>>>>
>>>>>> I'll give a brief description of my implementation:
>>>>>> single-threaded:  Arrays.sort(int[])
>>>>>>
>>>>>> ForkJoin: I'm using a merge-sort, with THRESHOLD = 250 (arrived 
>>>>>> at by trial-and-error)
>>>>>>
>>>>>> int[] compute()
>>>>>> {
>>>>>>     if (array.length < THRESHOLD)
>>>>>>         return insertionSort(array);
>>>>>>     left = new MergeSort(/* first half */);
>>>>>>     left.fork();
>>>>>>     right = new MergeSort(/* other half */);
>>>>>>     return merge(right.compute(), left.join());
>>>>>> }
>>>>>>
>>>>>> The insertionSort() and merge() methods are just standard 
>>>>>> implementations; there is a bit of an apples-to-oranges comparison, 
>>>>>> since Arrays.sort() uses an optimized quicksort, but we're still 
>>>>>> talking O(n log n)
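A standard merge(int[], int[]) of the sort this pseudocode refers to might look like the sketch below (the actual implementation is in the pastebin; this is only an illustration). Note the fresh output array allocated on every call, which is exactly the copying that Peter's MergeSorter2 later avoids:

```java
public class Merge {
    // Standard two-way merge of two already-sorted arrays into a new array.
    static int[] merge(int[] left, int[] right) {
        int[] out = new int[left.length + right.length];
        int i = 0, j = 0, k = 0;
        // take the smaller head element until one side is exhausted
        while (i < left.length && j < right.length) {
            out[k++] = (left[i] <= right[j]) ? left[i++] : right[j++];
        }
        // drain whichever side still has elements
        while (i < left.length) out[k++] = left[i++];
        while (j < right.length) out[k++] = right[j++];
        return out;
    }
}
```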
>>>>>>
>>>>>> Just ran it on my laptop:
>>>>>> Windows 7 64-bit
>>>>>> 1.7.0_03; Java HotSpot(TM) 64-Bit Server VM 22.1-b02
>>>>>> Core 2 Duo==> 2 cores 2GHz
>>>>>>
>>>>>> 2M int[]:
>>>>>> single-threaded: ~330ms
>>>>>> ForkJoin (2 workers): ~390ms
>>>>>>
>>>>>> Would appreciate any feedback and hopefully the #s are somewhat 
>>>>>> helpful (and please no scoffing at my antiquated machine)
>>>>>>
>>>>>> -T
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> Concurrency-interest mailing list
>>>>>> Concurrency-interest at cs.oswego.edu
>>>>>> http://cs.oswego.edu/mailman/listinfo/concurrency-interest
>>>
>


