[concurrency-interest] Some interesting (confusing?) benchmark results

√iktor Ҡlang viktor.klang at gmail.com
Tue Feb 12 02:47:32 EST 2013


On Tue, Feb 12, 2013 at 6:13 AM, <thurston at nomagicsoftware.com> wrote:

>
>
> On 2013-02-11 14:24, √iktor Ҡlang wrote:
>
>> On Mon, Feb 11, 2013 at 11:18 PM, David Holmes <davidcholmes at aapt.net.au>
>> wrote:
>>
>>> That doesn't make sense to me. If you had a threshold of 250 then you
>>> are creating thousands of tasks and that overhead should potentially
>>> bring your laptop to its knees. With a 250K threshold you have far
>>> fewer tasks.
>>>
>>
>> I'm not sure it makes sense to create more sort-tasks than there are
>> Threads in the pool.
>>
>
> Really, Viktor?  That surprises me coming from Akka.


I have found that when I am surprised by someone, I get a much better
understanding by asking for clarification before I assume too much.


>  Does that mean you guys only allocate as many message-processing
> "workers" as there are hardware threads?


What's your definition of "worker"?

Consider this:

Do you agree that thread pool sizing depends on the type of work? (IO-bound
vs CPU-bound, bursty vs steady, etc.)
Do you agree that a JVM Thread is not a unit of parallelism?
Do you agree that having more JVM Threads than hardware threads is bad for
CPU-bound workloads? (See the sketch below.)
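
A minimal sketch of that last point (illustrative only; the class name, pool
names, and the IO multiplier are placeholder assumptions, not measurements):

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class PoolSizing {
        public static void main(String[] args) {
            int cores = Runtime.getRuntime().availableProcessors();

            // CPU-bound, steady work: threads beyond the hardware threads
            // only add context-switching and cache pressure, so size the
            // pool to the cores.
            ExecutorService cpuPool = Executors.newFixedThreadPool(cores);

            // IO-bound or bursty work: threads spend most of their time
            // blocked, so oversubscribe. The multiplier (8 here) is a
            // placeholder that has to be tuned per workload.
            ExecutorService ioPool = Executors.newFixedThreadPool(cores * 8);

            cpuPool.shutdown();
            ioPool.shutdown();
        }
    }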


>
> To be clear: with the 2M-element array and a threshold of 250 elements
> for the (serial) insertion sort, 16K+ sort tasks are constructed (as
> D. Holmes mentioned).  This is significantly *faster* than doing what
> either of you suggested.  If I simply split the problem in two, and just
> run a serial mergesort on each half and then merge the two halves (that's
> three total tasks), it's over 50% slower.
>
> Let me repeat that.  16K+ tasks is much, much faster than 3 tasks.
>
> This is in line with my expectation; what amount of work-stealing could
> happen with two tasks (each merge-sorting a 1M item array)?  With 16K
> tasks, the work-stealing can really strut its stuff (and at least in my
> case, the benefit outweighs the overhead associated with scheduling the
> tasks).
>
> If you only create as many tasks as there are hardware threads -- that
> seems a lot like a standard thread-pool implementation.  (Of course, if
> there are many more cores, the opportunity for work-stealing would increase
> even with the 1-1 ratio).
>
> When I first used fork/join, that's the approach I took.
> 1.  split the problem into (hardware threads * 1 or 2) chunks
> 2.  have each task do its work serially (i.e. don't fork)
> 3.  combine the (sub)totals/results
>
> Not only is that less appealing from a programming point of view, it also
> performs worse; that's the promise, nay the very purpose, of the fork/join
> framework, at least when it comes to implementing naturally recursive
> algorithms.
>

What I meant was to create as many tasks as there are workers, have each
task process one chunk, and at the end of each task fork its continuation
(see it as continuation-passing style over a pool of workers). That way
there are never more tasks in flight than there are workers.
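
A minimal sketch of that scheme (illustrative only; the class name, chunk
size, and summing workload are placeholder assumptions, and this is not
Akka's actual code): each task sums one chunk, then forks its continuation,
so each seeded chain has at most one live task at a time.

    import java.util.Arrays;
    import java.util.concurrent.CountDownLatch;
    import java.util.concurrent.ForkJoinPool;
    import java.util.concurrent.RecursiveAction;
    import java.util.concurrent.atomic.AtomicLong;

    public class ChunkedSum extends RecursiveAction {
        static final int CHUNK = 1 << 14;
        final int[] data; final int from; final int stride;
        final AtomicLong total; final CountDownLatch done;

        ChunkedSum(int[] data, int from, int stride,
                   AtomicLong total, CountDownLatch done) {
            this.data = data; this.from = from; this.stride = stride;
            this.total = total; this.done = done;
        }

        @Override protected void compute() {
            int to = Math.min(from + CHUNK, data.length);
            long sum = 0;
            for (int i = from; i < to; i++) sum += data[i];
            total.addAndGet(sum);
            if (from + stride < data.length)
                // continuation-passing: re-fork for this chain's next chunk
                new ChunkedSum(data, from + stride, stride, total, done).fork();
            else
                done.countDown();  // this chain has no more chunks
        }

        public static void main(String[] args) throws InterruptedException {
            int[] data = new int[2_000_000];
            Arrays.fill(data, 1);
            ForkJoinPool pool = new ForkJoinPool();
            int workers = pool.getParallelism();
            AtomicLong total = new AtomicLong();
            CountDownLatch done = new CountDownLatch(workers);
            for (int w = 0; w < workers; w++)   // one seed task per worker
                pool.execute(new ChunkedSum(data, w * CHUNK,
                                            workers * CHUNK, total, done));
            done.await();
            System.out.println(total.get());    // prints 2000000
        }
    }

The trade-off versus fine-grained splitting is exactly the question in this
thread: fewer tasks means less scheduling overhead, but also less material
for work-stealing to balance.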

I hope I cleared things up.

Cheers,
√


>
> -T
>
>
>
>>> That said I don't know what the memory requirements are for your
>>> algorithm. If you are allocating temporary arrays then a large
>>> threshold could well cause a problem.
>>>
>>> The parallel array sort going into JDK 8 is a merge sort but uses a
>>> working array equal in size to the original array to be sorted.
>>>
>>
>> No in-place merge sort? ;-)
>>
>>
>>> > -----Original Message-----
>>> > From: concurrency-interest-bounces at cs.oswego.edu
>>> > [mailto:concurrency-interest-bounces at cs.oswego.edu] On Behalf Of
>>> > thurston at nomagicsoftware.com
>>> > Sent: Tuesday, 12 February 2013 7:49 AM
>>> > To: concurrency-interest at cs.oswego.edu
>>> > Subject: Re: [concurrency-interest] Some interesting (confusing?)
>>> > benchmark results
>>> >
>>> >
>>> > Well, it's the algorithm that's dispositive in this case.
>>> > Insertion-sort is empirically better than mergesort for arrays up to
>>> > around 25 elements.  That's where I started, and I played around with
>>> > different thresholds and 250 seemed to be the best.
>>> > Anything larger than 1000 and you start to see significant degradation.
>>> > I tried your formula (approximately) with N = 2 000 000 ==> 250K
>>> >
>>> > That brought my laptop to its knees (as expected): over 33K ms as
>>> > opposed to 390ms with 250.
>>> >
>>> > Following my own admonition (the algorithm matters), it occurs to me
>>> > that I should just use my own serial mergesort as the comparison to
>>> > the ForkJoin version (as opposed to Arrays.sort).  I'll let y'all know
>>> > the results of that.
>>> >
>>> > -T
>>> >
>>> >
>>> >
>>> > On 2013-02-11 13:19, David Holmes wrote:
>>> > > Your sequential threshold is far too small for only two worker
>>> > > threads.  You would get near-optimum speedup splitting the problem
>>> > > into just two.
>>> > >
>>> > > The heuristic for sequential threshold is:
>>> > >
>>> > > Threshold = N / (LoadBalanceFactor * NumCore)
>>> > >
>>> > > where N is the "size" of the problem.
>>> > >
>>> > > The LoadBalanceFactor in an ideal world would be 1, but real tasks
>>> > > encounter delays and execute differently, so a value > 1 is needed.
>>> > > Heuristically, a value of 8 is a good starting value.
>>> > >
>>> > > David Holmes
>>> > >
>>> > >> -----Original Message-----
>>> > >> From: concurrency-interest-bounces at cs.oswego.edu
>>> > >> [mailto:concurrency-interest-bounces at cs.oswego.edu] On Behalf Of
>>> > >> thurston at nomagicsoftware.com
>>> > >> Sent: Tuesday, 12 February 2013 6:52 AM
>>> > >> To: concurrency-interest at cs.oswego.edu
>>> > >> Subject: [concurrency-interest] Some interesting (confusing?)
>>> > >> benchmark results
>>> > >>
>>> > >>
>>> > >> Hello,
>>> > >>
>>> > >> I made some initial attempts at using the ForkJoin framework for
>>> > >> some rather obvious recursively parallel use-cases, namely summing
>>> > >> the elements of an int[] and also sorting an int[] (randomly filled).
>>> > >>
>>> > >> The problem-sizes I used ranged from 1M to 10M elements.
>>> > >>
>>> > >> I really enjoy using the framework and find it relatively easy to
>>> > >> reason about my programs (if not necessarily the internals of the
>>> > >> framework).
>>> > >> But I was disappointed with the results: in both cases they were
>>> > >> slower
>>> > >> than the corresponding Java serial implementation.
>>> > >>
>>> > >> I completely understand the principle of YMMV, and I wasn't
>>> > >> expecting a 2x speedup; but, especially in the case of the sort, I
>>> > >> was expecting that ForkJoin would do at least a little better than
>>> > >> the single-threaded version.  I guess what I'm asking is: are these
>>> > >> results surprising?  Does it mean I'm doing something wrong?
>>> > >>
>>> > >> I'll give a brief description of my implementation:
>>> > >> single-threaded:  Arrays.sort(int[])
>>> > >>
>>> > >> ForkJoin: I'm using a merge-sort, with THRESHOLD = 250 (arrived at
>>> > >> by trial-and-error)
>>> > >>
>>> > >> protected int[] compute()
>>> > >> {
>>> > >>     if (array.length < THRESHOLD)
>>> > >>         return insertionSort(array);
>>> > >>     MergeSort left = new MergeSort(/* first half of array */);
>>> > >>     MergeSort right = new MergeSort(/* other half */);
>>> > >>     left.fork();   // schedule the left half for stealing
>>> > >>     return merge(right.compute(), left.join());
>>> > >> }
>>> > >>
>>> > >> The insertionSort() and merge() methods are just standard
>>> > >> implementations; there is a bit of an apples-to-oranges comparison
>>> > >> since Arrays.sort() uses an optimized quicksort, but we're still
>>> > >> talking O(n log n).
>>> > >>
>>> > >> Just ran it on my laptop:
>>> > >> Windows 7 64-bit
>>> > >> 1.7.0_03; Java HotSpot(TM) 64-Bit Server VM 22.1-b02
>>> > >> Core 2 Duo ==> 2 cores @ 2 GHz
>>> > >>
>>> > >> 2M int[]:
>>> > >> single-threaded: ~330ms
>>> > >> ForkJoin (2 workers): ~390ms
>>> > >>
>>> > >> Would appreciate any feedback and hopefully the #s are somewhat
>>> > >> helpful (and please no scoffing at my antiquated machine)
>>> > >>
>>> > >> -T
>>> > >>
>>> > >>
>>>
>> --
>> VIKTOR KLANG
>> Director of Engineering
>>
>> Typesafe <http://www.typesafe.com/> - The software stack for applications
>> that scale
>> Twitter: @viktorklang
>
>


-- 
*Viktor Klang*
*Director of Engineering*
Typesafe <http://www.typesafe.com/> - The software stack for applications
that scale
Twitter: @viktorklang

