[concurrency-interest] Some interesting (confusing?) benchmark results
Peter Levart
peter.levart at gmail.com
Wed Feb 13 10:34:16 EST 2013
On 02/13/2013 04:27 PM, thurston at nomagicsoftware.com wrote:
> Could you publish your machine specs?
> And what was the size of your sample array? 2M or something else?
> If it is 2M (and if I'm doing my math right), it's weird that my times
> are lower: for 2M/250, I get around 370ms ± 25ms for the MergeSorter
> and around 330ms for Java's Arrays.sort.
Oh, sorry. My machine is an i7 Linux PC, running a recent JDK 8 build. I
had to increase N to 20M to actually make it sweat a little...
With parallelism = 4, for example, even your unchanged MergeSorter is
faster than Arrays.sort:
MergeSorter: 1134740049 nanos
MergeSorter: 1082783496 nanos
MergeSorter: 1033780556 nanos
MergeSorter2: 720092808 nanos
MergeSorter2: 649416922 nanos
MergeSorter2: 605587079 nanos
Arrays.sort: 1454093373 nanos
Arrays.sort: 1466476368 nanos
Arrays.sort: 1461604292 nanos
...but with only 2 threads it is on par with Arrays.sort.
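For anyone who wants to reproduce numbers in this style, here is a minimal timing-harness sketch. It uses JDK 8's Arrays.parallelSort as a stand-in for the custom sorter; the array size, seed, and run count are arbitrary choices, not the exact setup used above:

```java
import java.util.Arrays;
import java.util.Random;

public class SortBench {
    public static void main(String[] args) {
        // 1M elements keeps a quick run short; bump this toward 20M to make it sweat
        int[] data = new Random(42).ints(1 << 20).toArray();

        for (int run = 0; run < 3; run++) {
            int[] copy = data.clone();
            long t0 = System.nanoTime();
            Arrays.parallelSort(copy);   // stand-in for the parallel sorter under test
            System.out.println("parallelSort: " + (System.nanoTime() - t0) + " nanos");
        }

        // sanity check: the parallel result must match the sequential baseline
        int[] parallel = data.clone();
        int[] serial = data.clone();
        Arrays.parallelSort(parallel);
        Arrays.sort(serial);
        System.out.println("matches=" + Arrays.equals(parallel, serial));
    }
}
```

Each workload is sorted on a fresh clone so every run sees the same unsorted input, and the first iterations double as JIT warmup.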
Regards, Peter
>
> Thanks
>
> On 2013-02-13 06:40, Peter Levart wrote:
>> On 02/12/2013 08:20 PM, thurston at nomagicsoftware.com wrote:
>>> Here is all of the relevant code in a single blob. I think it's
>>> self-explanatory; a sample test is at the end.
>>>
>>> http://pastebin.com/WrfBHYSG
>>
>> Hi Thurston,
>>
>> You're doing too much copying. I modified your MergeSorter a little:
>>
>>
>> public class MergeSorter2 extends RecursiveTask<int[]> {
>>     static final int THRESHOLD = 250;
>>     final int[] array, work;
>>     final int offset, length;
>>
>>     public MergeSorter2(int[] array) {
>>         this.array = array;
>>         this.work = new int[array.length];
>>         offset = 0;
>>         length = array.length;
>>     }
>>
>>     private MergeSorter2(int[] array, int[] work, int offset, int length) {
>>         this.array = array;
>>         this.work = work;
>>         this.offset = offset;
>>         this.length = length;
>>     }
>>
>>     @Override
>>     protected int[] compute() {
>>         if (length < MergeSorter2.THRESHOLD) {
>>             Arrays.sort(array, offset, offset + length);
>>             return this.array;
>>         }
>>
>>         int halfLength = length >> 1;
>>         MergeSorter2 left = new MergeSorter2(array, work, offset, halfLength);
>>         left.fork();
>>
>>         int mid = offset + halfLength;
>>
>>         MergeSorter2 right = new MergeSorter2(array, work, mid, length - halfLength);
>>         right.compute();
>>         left.join();
>>
>>         // copy 1st half from array to work
>>         System.arraycopy(array, offset, work, offset, halfLength);
>>
>>         // merge 1st half of work and 2nd half of array into array
>>         int end = offset + length;
>>         int i = offset;
>>         int j = mid;
>>         int k = offset;
>>         int x = work[i];
>>         int y = array[j];
>>         while (true) {
>>             if (j >= end || x <= y) {
>>                 array[k++] = x;
>>                 i++;
>>                 // until we drain the 1st half (the rest of array is already in place)
>>                 if (i >= mid) break;
>>                 x = work[i];
>>             } else { // j < end
>>                 array[k++] = y;
>>                 j++;
>>                 if (j < end) y = array[j];
>>             }
>>         }
>>
>>         return this.array;
>>     }
>> }
>>
>>
>> ... it uses a single "work" array for merging, which is shared among
>> tasks, and tries to minimize the copying. Here are the results I get
>> on a FJPool with parallelism=2:
>>
>> MergeSorter: 1557638090 nanos
>> MergeSorter: 1545600414 nanos
>> MergeSorter: 1478185502 nanos
>> MergeSorter2: 1058750062 nanos
>> MergeSorter2: 978295370 nanos
>> MergeSorter2: 976838429 nanos
>> Arrays.sort: 1494256525 nanos
>> Arrays.sort: 1505639161 nanos
>> Arrays.sort: 1483640342 nanos
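(For completeness, a self-contained driver for timing MergeSorter2 this way might look like the sketch below. It repeats the class above in compact form so it compiles standalone, and checks the result against Arrays.sort; the pool parallelism of 2 matches this run, while the array size and seed are arbitrary.)

```java
import java.util.Arrays;
import java.util.Random;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

public class MergeSorter2Bench {
    // compact copy of the MergeSorter2 class above
    static class MergeSorter2 extends RecursiveTask<int[]> {
        static final int THRESHOLD = 250;
        final int[] array, work;
        final int offset, length;

        MergeSorter2(int[] array) {
            this(array, new int[array.length], 0, array.length);
        }

        MergeSorter2(int[] array, int[] work, int offset, int length) {
            this.array = array; this.work = work;
            this.offset = offset; this.length = length;
        }

        @Override
        protected int[] compute() {
            if (length < THRESHOLD) {
                Arrays.sort(array, offset, offset + length);
                return array;
            }
            int halfLength = length >> 1;
            MergeSorter2 left = new MergeSorter2(array, work, offset, halfLength);
            left.fork();
            int mid = offset + halfLength;
            new MergeSorter2(array, work, mid, length - halfLength).compute();
            left.join();
            // copy 1st half into work, then merge work[offset..mid) with array[mid..end)
            System.arraycopy(array, offset, work, offset, halfLength);
            int end = offset + length, i = offset, j = mid, k = offset;
            int x = work[i], y = array[j];
            while (true) {
                if (j >= end || x <= y) {
                    array[k++] = x;
                    if (++i >= mid) break;   // rest of array is already in place
                    x = work[i];
                } else {
                    array[k++] = y;
                    if (++j < end) y = array[j];
                }
            }
            return array;
        }
    }

    public static void main(String[] args) {
        int n = 1 << 20;   // raise toward 20M to reproduce the numbers above
        int[] data = new Random(1).ints(n).toArray();
        ForkJoinPool pool = new ForkJoinPool(2);   // parallelism = 2

        int[] a = data.clone();
        long t0 = System.nanoTime();
        pool.invoke(new MergeSorter2(a));
        System.out.println("MergeSorter2: " + (System.nanoTime() - t0) + " nanos");

        int[] expected = data.clone();
        Arrays.sort(expected);
        System.out.println("matches=" + Arrays.equals(a, expected));
    }
}
```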
>>
>>
>> I get even better results with a greater THRESHOLD (but not more than
>> 10% better). Your MergeSorter gets worse with a greater THRESHOLD
>> because you are using insertion sort for the leaf tasks. Arrays.sort()
>> delegates to insertion sort only when N < 47; otherwise it uses
>> quicksort...
>>
>> Regards, Peter
>>
>>
>>
>>>
>>>
>>>
>>> On 2013-02-12 01:55, thurston at nomagicsoftware.com wrote:
>>>> Yes, I will push the code and my measurements some time this week.
>>>> I'm really interested in seeing what others' results would be. What I
>>>> can say is that the timings are surprisingly stable (range 30-40ms),
>>>> and the input data is randomly generated for each test run, which can
>>>> certainly affect merge(int[], int[])'s performance.
>>>>
>>>> On 2013-02-12 01:50, Kirk Pepperdine wrote:
>>>>> Hi,
>>>>> On 2013-02-11, at 9:52 PM, thurston at nomagicsoftware.com wrote:
>>>>> Hi T,
>>>>>
>>>>> Can you publish the raw data from your runs? Averages tend to hide
>>>>> effects, and it's difficult to say anything with only that measure.
>>>>>
>>>>> Regards,
>>>>> Kirk
>>>>>
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> I made some initial attempts at using ForkJoin framework for some
>>>>>> rather obvious recursively parallel use-cases, namely summing the
>>>>>> elements of an int[] and also sorting an int[] (randomly filled).
>>>>>>
>>>>>> The problem-sizes I used ranged from 1M - 10M.
>>>>>>
>>>>>> I really enjoy using the framework and find it relatively easy to
>>>>>> reason about my programs (if not necessarily the internals of the
>>>>>> framework).
>>>>>> But I was disappointed with the results: in both cases they were
>>>>>> slower than the corresponding Java serial implementation.
>>>>>>
>>>>>> I completely understand the principle of YMMV, and I wasn't
>>>>>> expecting a 2x speedup, but especially in the case of the sort
>>>>>> I was expecting that ForkJoin would do at least a little better
>>>>>> than the single-threaded version. I guess what I'm asking is:
>>>>>> are these results surprising? Does it mean I'm doing something
>>>>>> wrong?
>>>>>>
>>>>>> I'll give a brief description of my implementation:
>>>>>> single-threaded: Arrays.sort(int[])
>>>>>>
>>>>>> ForkJoin: I'm using a merge-sort, with THRESHOLD = 250 (arrived
>>>>>> at by trial-and-error)
>>>>>>
>>>>>> int[] compute()
>>>>>> {
>>>>>>     if (array.length < THRESHOLD)
>>>>>>         return insertionSort(array);
>>>>>>     left = new MergeSort(array /* first half */);
>>>>>>     left.fork();   // run the left half asynchronously so join() has work to wait on
>>>>>>     right = new MergeSort(array /* other half */);
>>>>>>     return merge(right.compute(), left.join());
>>>>>> }
>>>>>>
>>>>>> The insertionSort() and merge() methods are just standard
>>>>>> implementations; there is a bit of an apples-to-oranges comparison,
>>>>>> since Arrays.sort() uses an optimized quicksort, but we're still
>>>>>> talking O(n log n).
>>>>>>
>>>>>> Just ran it on my laptop:
>>>>>> Windows 7 64-bit
>>>>>> 1.7.0_03; Java HotSpot(TM) 64-Bit Server VM 22.1-b02
>>>>>> Core 2 Duo==> 2 cores 2GHz
>>>>>>
>>>>>> 2M int[]:
>>>>>> single-threaded: ~330ms
>>>>>> ForkJoin (2 workers): ~390ms
>>>>>>
>>>>>> Would appreciate any feedback; hopefully the #s are somewhat
>>>>>> helpful (and please, no scoffing at my antiquated machine)
>>>>>>
>>>>>> -T
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> Concurrency-interest mailing list
>>>>>> Concurrency-interest at cs.oswego.edu
>>>>>> http://cs.oswego.edu/mailman/listinfo/concurrency-interest
>>>
>