[concurrency-interest] Some interesting (confusing?) benchmark results

thurston at nomagicsoftware.com thurston at nomagicsoftware.com
Tue Feb 12 00:13:53 EST 2013



On 2013-02-11 14:24, Viktor Klang wrote:
> On Mon, Feb 11, 2013 at 11:18 PM, David Holmes 
> <davidcholmes at aapt.net.au> wrote:
>
>> That doesn't make sense to me. If you had a threshold of 250 then 
>> you are
>> creating thousands of tasks and that overhead should potentially 
>> bring your
>> laptop to its knees. With 250K threshold you have far fewer tasks.
>
> I'm not sure it makes sense to create more sort-tasks than there is
> Threads in the pool.

Really, Viktor?  That surprises me coming from Akka.  Does that mean 
you guys only allocate as many message-processing "workers" as there are 
hardware threads?

To be clear: with the 2M-element array and a threshold of 250 elements for 
the (serial) insertion sort, 16K+ sort tasks are constructed (as D. Holmes 
mentioned).  This is significantly *faster* than doing what either of you 
suggested.  If I simply split the problem in two, run a serial mergesort on 
each half, and then merge the two halves (three tasks in total), it's over 
50% slower.
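
(Back-of-the-envelope: halving 2M elements until a piece drops under 250 
gives roughly 2,000,000 / 250 = 8,000 leaves, i.e. about 8K leaf tasks plus 
roughly as many internal split tasks, which is where the 16K+ figure comes 
from.)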

Let me repeat that.  16K+ tasks is much, much faster than 3 tasks.

This is in line with my expectation: how much work-stealing could happen 
with two tasks (each merge-sorting a 1M-element array)?  With 16K tasks, 
the work-stealing can really strut its stuff (and, at least in my case, can 
overwhelm the overhead associated with scheduling the tasks).
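
For concreteness, here is roughly the shape of the task I'm benchmarking (a 
minimal sketch, not my exact code; the class and helper names are just 
illustrative):

import java.util.Arrays;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Fine-grained approach: keep splitting until a piece is below THRESHOLD,
// so the pool has thousands of small tasks available for stealing.
class MergeSortTask extends RecursiveTask<int[]> {
    static final int THRESHOLD = 250;
    private final int[] data;

    MergeSortTask(int[] data) { this.data = data; }

    @Override
    protected int[] compute() {
        if (data.length < THRESHOLD) {
            return insertionSort(data);                  // serial base case
        }
        int mid = data.length / 2;
        MergeSortTask left  = new MergeSortTask(Arrays.copyOfRange(data, 0, mid));
        MergeSortTask right = new MergeSortTask(Arrays.copyOfRange(data, mid, data.length));
        left.fork();                                     // hand one half to the pool
        return merge(right.compute(), left.join());      // sort the other half here, then merge
    }

    private static int[] insertionSort(int[] a) {
        for (int i = 1; i < a.length; i++) {
            int key = a[i], j = i - 1;
            while (j >= 0 && a[j] > key) { a[j + 1] = a[j]; j--; }
            a[j + 1] = key;
        }
        return a;
    }

    private static int[] merge(int[] a, int[] b) {
        int[] out = new int[a.length + b.length];
        int i = 0, j = 0, k = 0;
        while (i < a.length && j < b.length) out[k++] = (a[i] <= b[j]) ? a[i++] : b[j++];
        while (i < a.length) out[k++] = a[i++];
        while (j < b.length) out[k++] = b[j++];
        return out;
    }
}

It gets kicked off with something like 
new ForkJoinPool().invoke(new MergeSortTask(array)).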

If you only create as many tasks as there are hardware threads -- that 
seems a lot like a standard thread-pool implementation.  (Of course, if 
there are many more cores, the opportunity for work-stealing would 
increase even with the 1-1 ratio).

When I first used fork/join, that's the approach I took (see the sketch 
after this list):
1.  split the problem into (hardware threads * 1 or 2) tasks
2.  have each task do its work serially (i.e. don't fork further)
3.  combine the (sub)totals/results
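
For comparison, roughly what that coarse-grained version looks like for the 
summing use-case (again just a sketch, not my actual code; names are 
illustrative):

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Coarse-grained approach: one task per hardware thread, each summing its
// slice serially, with the partial sums combined at the end.
class ChunkedSum {
    static long sum(final int[] data) {
        ForkJoinPool pool = new ForkJoinPool();
        int chunks = Runtime.getRuntime().availableProcessors();    // hardware threads * 1
        int chunkSize = (data.length + chunks - 1) / chunks;
        List<RecursiveTask<Long>> tasks = new ArrayList<RecursiveTask<Long>>();
        for (int start = 0; start < data.length; start += chunkSize) {
            final int from = start;
            final int to = Math.min(start + chunkSize, data.length);
            RecursiveTask<Long> task = new RecursiveTask<Long>() {
                @Override protected Long compute() {                 // serial work, no further forking
                    long s = 0;
                    for (int i = from; i < to; i++) s += data[i];
                    return s;
                }
            };
            tasks.add(task);
            pool.execute(task);
        }
        long total = 0;
        for (RecursiveTask<Long> t : tasks) total += t.join();       // combine the (sub)totals
        return total;
    }
}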

Not only is that less appealing from a programming point of view, it also 
performs worse; that is exactly the promise, nay the very purpose, of the 
fork/join framework, at least when it comes to implementing naturally 
recursive algorithms.

-T


>
>> That said I don't know what the memory requirements are for your 
>> algorithm.
>> If you are allocating temporary arrays then a  large threshold could 
>> well
>> cause a problem.
>>
>> The parallel array sort going into JDK 8 is a merge sort but uses a 
>> working
>> array equal in size to the original array to be sorted.
>
> No in-place merge sort? ;-)
>  
>
>>> -----Original Message-----
>> > From: concurrency-interest-bounces at cs.oswego.edu
>> > [mailto:concurrency-interest-bounces at cs.oswego.edu]On Behalf Of
>> > thurston at nomagicsoftware.com
>>
>>> Sent: Tuesday, 12 February 2013 7:49 AM
>> > To: concurrency-interest at cs.oswego.edu
>> > Subject: Re: [concurrency-interest] Some interesting (confusing?)
>> > benchmark results
>> >
>> >
>> > Well, it's the algorithm that's dispositive in this case.
>> > Insertion-sort is empirically better than mergesort up to around
>> > 25-sized arrays.  That's where I started, and I played around with
>> > different thresholds and 250 seemed to be the best.
>> > Anything larger than 1000 you start to see significant 
>> degradation.
>> > I tried your formula (approximately) with N = 2 000 000 ==> 250K
>> >
>> > That brought my laptop to its knees (as expected): over 33K ms as
>> > opposed to 390ms with 250.
>> >
>> > Following my own admonition (the algorithm matters), it occurs to 
>> me
>> > that I just use my own serial mergesort as comparison to the 
>> Forkjoin
>> > (as opposed to Arrays.sort).  I'll let y'all know the results of 
>> that
>> >
>> > -T
>> >
>> >
>> >
>> > On 2013-02-11 13:19, David Holmes wrote:
>> > > Your sequential threshold is far too small for only two worker
>> > > threads. You
>> > > would get near optimum speedup only splitting the problem into 
>> two.
>> > >
>> > > The heuristic for sequential threshold is:
>> > >
>> > > Threshold = N / (LoadBalanceFactor * NumCore)
>> > >
>> > > where N is the "size" of the problem.
>> > >
>> > > The LoadBalanceFactor in an ideal world would be 1 but real 
>> tasks
>> > > encounter
>> > > delays and execute differently so a value > 1 is needed.
>> > > Heuristically a
>> > > value of 8 is a good starting value.
>> > >
>> > > David Holmes
>> > >
>> > >> -----Original Message-----
>> > >> From: concurrency-interest-bounces at cs.oswego.edu
>> > >> [mailto:concurrency-interest-bounces at cs.oswego.edu]On Behalf Of
>> > >> thurston at nomagicsoftware.com
>> > >> Sent: Tuesday, 12 February 2013 6:52 AM
>> > >> To: concurrency-interest at cs.oswego.edu
>> > >> Subject: [concurrency-interest] Some interesting (confusing?)
>> > >> benchmark
>> > >> results
>> > >>
>> > >>
>> > >> Hello,
>> > >>
>> > >> I made some initial attempts at using ForkJoin framework for 
>> some
>> > >> rather obvious recursively parallel use-cases, namely summing 
>> the
>> > >> elements of an int[] and also sorting an int[] (randomly 
>> filled).
>> > >>
>> > >> The problem-sizes I used ranged from 1M - 10M.
>> > >>
>> > >> I really enjoy using the framework and find it relatively easy 
>> to
>> > >> reason about my programs (if not necessarily the internals of 
>> the
>> > >> framework).
>> > >> But I was disappointed with the results: in both cases they 
>> were
>> > >> slower
>> > >> than the corresponding Java serial implementation.
>> > >>
>> > >> I completely understand the principle of YMMV, and I wasn't
>> > >> expecting
>> > >> 2x speedup, but especially in the case of the sort, but I was
>> > >> expecting
>> > >> that ForkJoin would do at least a little better than the
>> > >> single-threaded
>> > >> version.  I guess what I'm asking is: are these results 
>> surprising?
>> > >> Does it mean I'm doing something wrong?
>> > >>
>> > >> I'll give a brief description of my implementation:
>> > >> single-threaded:  Arrays.sort(int[])
>> > >>
>> > >> ForkJoin: I'm using a merge-sort, with THRESHOLD = 250 (arrived 
>> at
>> > >> by
>> > >> trial-and-error)
>> > >>
>> > >> int[] compute()
>> > >> {
>> > >>     if (int[].length < THRESHOLD)
>> > >>         return insertionSort(int[])
>> > >>     left = new MergeSort(int[] //split in half)
>> > >>     right = new MergeSort(int[] //other half)
>> > >>     left.fork()
>> > >>     return merge(right.compute(), left.join())
>> > >> }
>> > >>
>> > >> The insertionSort() and merge() methods are just standard
>> > >> implementations; there is a bit of apples to oranges comparison
>> > >> since
>> > >> Arrays.sort() uses an optimized quicksort, but we're still 
>> talking
>> > >> O(nlog(n))
>> > >>
>> > >> Just ran it on my laptop:
>> > >> Windows 7 64-bit
>> > >> 1.7.0_03; Java HotSpot(TM) 64-Bit Server VM 22.1-b02
>> > >> Core 2 Duo==> 2 cores 2GHz
>> > >>
>> > >> 2M int[]:
>> > >> single-threaded: ~330ms
>> > >> ForkJoin (2 workers): ~390ms
>> > >>
>> > >> Would appreciate any feedback and hopefully the #s are somewhat
>> > >> helpful
>> > >> (and please no scoffawing at my antiquated machine)
>> > >>
>> > >> -T
>> > >>
>> > >>
>> > >> _______________________________________________
>> > >> Concurrency-interest mailing list
>> > >> Concurrency-interest at cs.oswego.edu
>> > >> http://cs.oswego.edu/mailman/listinfo/concurrency-interest [1]
>> > >>
>> >
>> > _______________________________________________
>> > Concurrency-interest mailing list
>> > Concurrency-interest at cs.oswego.edu
>> > http://cs.oswego.edu/mailman/listinfo/concurrency-interest [1]
>> >
>>
>> _______________________________________________
>> Concurrency-interest mailing list
>> Concurrency-interest at cs.oswego.edu
>> http://cs.oswego.edu/mailman/listinfo/concurrency-interest [1]
>>
>> --
> 
> VIKTOR KLANG
>  _Director of Engineering_
>
>  Typesafe [2] - The software stack for applications that scale
>  Twitter: @viktorklang
>
> Links:
> ------
> [1] http://cs.oswego.edu/mailman/listinfo/concurrency-interest
> [2] http://www.typesafe.com/


