[concurrency-interest] Some interesting (confusing?) benchmark results

Peter Levart peter.levart at gmail.com
Wed Feb 13 11:30:22 EST 2013


On 02/13/2013 04:59 PM, thurston at nomagicsoftware.com wrote:
> so you have 4 cores?
> Makes sense that those results correspond to 20M as opposed to 2M
> And given that you changed insertSort() -> Arrays.sort, if you change 
> threshold to input.length / ForkJoinPool.threads (presumably 4 in your 
> case), does that perform better?
It does (N=20M, THRESHOLD=5M, parallelism=4):

Arrays.sort: 1529550064 nanos
Arrays.sort: 1478430550 nanos
Arrays.sort: 1475643365 nanos
Arrays.sort: 1481308646 nanos
Arrays.sort: 1477199836 nanos
MergeSorter2: 500316422 nanos
MergeSorter2: 486144409 nanos
MergeSorter2: 468044083 nanos
MergeSorter2: 624446626 nanos
MergeSorter2: 635152854 nanos


>
> Many people were suggesting that the # of tasks created should be 
> bounded by the # of threads in the fjpool which should == # of 
> hardware threads. (you can accomplish this by setting the threshold to 
> above formula).

But as David Holmes has noted, the THRESHOLD should be further divided 
by 8 as a starting point, to accommodate for unequal execution of 
threads/tasks, so that there is no waiting at the end for final long 
tasks to complete while other threads in pool are IDLE-ing...

For example (N=20M, THRESHOLD=500K, parallelism=4):

MergeSorter2: 531712994 nanos
MergeSorter2: 539048356 nanos
MergeSorter2: 586208993 nanos
MergeSorter2: 544239837 nanos
MergeSorter2: 497440246 nanos

And (N=20M, THRESHOLD=50K, parallelism=4):

MergeSorter2: 548634308 nanos
MergeSorter2: 579119444 nanos
MergeSorter2: 555252699 nanos
MergeSorter2: 505902467 nanos
MergeSorter2: 503907973 nanos

And (N=20M, THRESHOLD=5K, parallelism=4):

MergeSorter2: 547342635 nanos
MergeSorter2: 575145627 nanos
MergeSorter2: 559160837 nanos
MergeSorter2: 530709697 nanos
MergeSorter2: 562272911 nanos

...it does perform a little slower, but in average with lower THRESHOLD, 
the performance should be more stable...


Regards, Peter

>
> And with 20M/250 (threshold), you would be creating what 100Ks?, 1Ms? 
> (you can comment in the AtomicInteger.increment()) in the constructor 
> to save on the math
>
> And of course *I* understand the performance characteristics of 
> insertionSort, look at the implementation in SortUtil (which is not 
> parallel).  What I did makes perfect sense if you understand 
> insertSort vs mergeSort
>
> On 2013-02-13 07:34, Peter Levart wrote:
>> On 02/13/2013 04:27 PM, thurston at nomagicsoftware.com wrote:
>>> Could you publish your machine specs?
>>> And what was the size of your sample array?  2M or something else?
>>> If it is 2M (and if I'm doing my math right), it's weird that my 
>>> times are less:
>>> basically for 2M/250, I get around 370ms +- 25ms for the MergeSorter
>>> and around 330ms for java's Array.sort
>>
>> Oh, sorry. My machine is i7 Linux PC, using recent JDK8 build. I had
>> to increase the N to 20M to actually make it sweat a little...
>>
>> Your unchanged MergeSorter is better than Arrays.sort with
>> parallelism = 4 for example:
>>
>> MergeSorter: 1134740049 nanos
>> MergeSorter: 1082783496 nanos
>> MergeSorter: 1033780556 nanos
>> MergeSorter2: 720092808 nanos
>> MergeSorter2: 649416922 nanos
>> MergeSorter2: 605587079 nanos
>> Arrays.sort: 1454093373 nanos
>> Arrays.sort: 1466476368 nanos
>> Arrays.sort: 1461604292 nanos
>>
>> ...but with only 2 threads it is on-par with Arrays.sort.
>>
>> Regards, Peter
>>
>>>
>>> Thanks
>>>
>>> On 2013-02-13 06:40, Peter Levart wrote:
>>>> On 02/12/2013 08:20 PM, thurston at nomagicsoftware.com wrote:
>>>>> Here is all of the relevant code in a single blob.  I think it's 
>>>>> selef-explanatory; sample test is at the end.
>>>>>
>>>>> http://pastebin.com/WrfBHYSG
>>>>
>>>> Hi Thurston,
>>>>
>>>> You're doing too much copying. I modified tour MergeSorter a little:
>>>>
>>>>
>>>> public class MergeSorter2 extends RecursiveTask<int[]> {
>>>>     static final int THRESHOLD = 250;
>>>>     final int[] array, work;
>>>>     final int offset, length;
>>>>
>>>>     public MergeSorter2(int[] array) {
>>>>         this.array = array;
>>>>         this.work = new int[array.length];
>>>>         offset = 0;
>>>>         length = array.length;
>>>>     }
>>>>
>>>>     private MergeSorter2(int[] array, int[] work, int offset, int 
>>>> length) {
>>>>         this.array = array;
>>>>         this.work = work;
>>>>         this.offset = offset;
>>>>         this.length = length;
>>>>     }
>>>>
>>>>     @Override
>>>>     protected int[] compute() {
>>>>         if (length < MergeSorter2.THRESHOLD) {
>>>>             Arrays.sort(array, offset, offset + length);
>>>>             return this.array;
>>>>         }
>>>>
>>>>         int halfLength = length >> 1;
>>>>         MergeSorter2 left = new MergeSorter2(array, work, offset, 
>>>> halfLength);
>>>>         left.fork();
>>>>
>>>>         int mid = offset + halfLength;
>>>>
>>>>         MergeSorter2 right = new MergeSorter2(array, work, mid,
>>>> length - halfLength);
>>>>         right.compute();
>>>>         left.join();
>>>>
>>>>         // copy 1st half from array to work
>>>>         System.arraycopy(array, offset, work, offset, halfLength);
>>>>
>>>>         // merge 1st half of work and 2nd half of array into array
>>>>         int end = offset + length;
>>>>         int i = offset;
>>>>         int j = mid;
>>>>         int k = offset;
>>>>         int x = work[i];
>>>>         int y = array[j];
>>>>         while (true) {
>>>>             if (j >= end || x <= y) {
>>>>                 array[k++] = x;
>>>>                 i++;
>>>>                 // until we drain the 1st half (the rest of array is
>>>> already in place)
>>>>                 if (i >= mid) break;
>>>>                 x = work[i];
>>>>             } else { // j < end
>>>>                 array[k++] = y;
>>>>                 j++;
>>>>                 if (j < end) y = array[j];
>>>>             }
>>>>         }
>>>>
>>>>         return this.array;
>>>>     }
>>>> }
>>>>
>>>>
>>>> ... it uses a single parallel "work" array for merging, which is
>>>> shared among tasks. It tries to minimize the copying. Here's the
>>>> results I get on a FJPool with parallelism=2:
>>>>
>>>>  MergeSorter: 1557638090 nanos
>>>>  MergeSorter: 1545600414 nanos
>>>>  MergeSorter: 1478185502 nanos
>>>> MergeSorter2: 1058750062 nanos
>>>> MergeSorter2:  978295370 nanos
>>>> MergeSorter2:  976838429 nanos
>>>>  Arrays.sort: 1494256525 nanos
>>>>  Arrays.sort: 1505639161 nanos
>>>>  Arrays.sort: 1483640342 nanos
>>>>
>>>>
>>>> I get even better results with greater THRESHOLD (but not more than
>>>> 10% better). Your MergeSorter is worse with greater THRESHOLD because
>>>> you are using insert-sort for final tasks. The Arrays.sort() delegates
>>>> to insert-sort only when N < 47, else it uses quick-sort...
>>>>
>>>> Regards, Peter
>>>>
>>>>
>>>>
>>>>>
>>>>>
>>>>>
>>>>> On 2013-02-12 01:55, thurston at nomagicsoftware.com wrote:
>>>>>> Yes, I will push the code and my measurements some time this week.
>>>>>> I'm really interested in seeing what others' results would be. 
>>>>>> What I
>>>>>> can say is that the timings are surprisingly stable (range 30-40ms)
>>>>>> and the input data is randomly generated for each test run which can
>>>>>> certainly affect merge(int[], int[])'s performance
>>>>>>
>>>>>> On 2013-02-12 01:50, Kirk Pepperdine wrote:
>>>>>>> Hi,
>>>>>>> On 2013-02-11, at 9:52 PM, thurston at nomagicsoftware.com wrote:
>>>>>>> Hi T,
>>>>>>>
>>>>>>> Can you pub the raw data from your runs? Average tends to hide
>>>>>>> effects and it's difficult to say anything with only that measure.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Kirk
>>>>>>>
>>>>>>>
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> I made some initial attempts at using ForkJoin framework for 
>>>>>>>> some rather obvious recursively parallel use-cases, namely 
>>>>>>>> summing the elements of an int[] and also sorting an int[] 
>>>>>>>> (randomly filled).
>>>>>>>>
>>>>>>>> The problem-sizes I used ranged from 1M - 10M.
>>>>>>>>
>>>>>>>> I really enjoy using the framework and find it relatively easy 
>>>>>>>> to reason about my programs (if not necessarily the internals 
>>>>>>>> of the framework).
>>>>>>>> But I was disappointed with the results: in both cases they 
>>>>>>>> were slower than the corresponding Java serial implementation.
>>>>>>>>
>>>>>>>> I completely understand the principle of YMMV, and I wasn't 
>>>>>>>> expecting 2x speedup, but especially in the case of the sort, 
>>>>>>>> but I was expecting that ForkJoin would do at least a little 
>>>>>>>> better than the single-threaded version.  I guess what I'm 
>>>>>>>> asking is: are these results surprising?  Does it mean I'm 
>>>>>>>> doing something wrong?
>>>>>>>>
>>>>>>>> I'll give a brief description of my implementation:
>>>>>>>> single-threaded:  Arrays.sort(int[])
>>>>>>>>
>>>>>>>> ForkJoin: I'm using a merge-sort, with THRESHOLD = 250 (arrived 
>>>>>>>> at by trial-and-error)
>>>>>>>>
>>>>>>>> int[] compute()
>>>>>>>> {
>>>>>>>> if int[].length < THRESHOLD
>>>>>>>>    return insertionSort(int[])
>>>>>>>> left = new MergeSort(int[] //split in half)
>>>>>>>> right = new MergeSort(int[] //other half)
>>>>>>>> return merge(right.compute(), left.join())
>>>>>>>> }
>>>>>>>>
>>>>>>>> The insertionSort() and merge() methods are just standard 
>>>>>>>> implementations; there is a bit of apples to oranges comparison 
>>>>>>>> since Arrays.sort() uses an optimized quicksort, but we're 
>>>>>>>> still talking O(nlog(n))
>>>>>>>>
>>>>>>>> Just ran it on my laptop:
>>>>>>>> Windows 7 64-bit
>>>>>>>> 1.7.0_03; Java HotSpot(TM) 64-Bit Server VM 22.1-b02
>>>>>>>> Core 2 Duo==> 2 cores 2GHz
>>>>>>>>
>>>>>>>> 2M int[]:
>>>>>>>> single-threaded: ~330ms
>>>>>>>> ForkJoin (2 workers): ~390ms
>>>>>>>>
>>>>>>>> Would appreciate any feedback and hopefully the #s are somewhat 
>>>>>>>> helpful (and please no scoffawing at my antiquated machine)
>>>>>>>>
>>>>>>>> -T
>>>>>>>>
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> Concurrency-interest mailing list
>>>>>>>> Concurrency-interest at cs.oswego.edu
>>>>>>>> http://cs.oswego.edu/mailman/listinfo/concurrency-interest
>>>>>
>>>>> _______________________________________________
>>>>> Concurrency-interest mailing list
>>>>> Concurrency-interest at cs.oswego.edu
>>>>> http://cs.oswego.edu/mailman/listinfo/concurrency-interest
>>>
>



More information about the Concurrency-interest mailing list