[concurrency-interest] Strange behaviour when sorting an uniform array

Ben Manes ben_manes at yahoo.com
Tue May 25 13:31:13 EDT 2010


A common approach to resolve quicksort's worse case scenario is to shuffle the input prior to sorting (an O(n) operation). So in practice you might want to do that, though if you knew your data set was so uniform you'd probably optimize with a different sorting strategy (e.g. counting sort).




________________________________
From: Lukas Krecan <lukas.krecan at gmail.com>
To: concurrency-interest at cs.oswego.edu
Sent: Tue, May 25, 2010 7:24:42 AM
Subject: Re: [concurrency-interest] Strange behaviour when sorting an uniform array

Thanks for your comment. I did not want to claim that I have done a 
benchmark. I just wanted to illustrate that non-parallel implementation 
does not exhibit the n*n behaviour. That's all.

On 05/25/2010 04:05 PM, Michael Spiegel wrote:
> I think the benchmark would benefit from a warmup phase.  It would
> ensure the methods in question are JIT compiled. Stackoverflow has a
> reasonable set of links on how to write a Java microbenchmark:
> http://stackoverflow.com/questions/504103/how-do-i-write-a-correct-micro-benchmark-in-java.
>
> --Michael
>
> On Tue, May 25, 2010 at 7:00 AM, Lukas Krecan<lukas.krecan at gmail.com>  wrote:
>    
>> Sure, the synchronized statement is just a leftover from previous
>> experiments.
>>
>> I have tried the same test with standard java.util.Arrays.sort and I am
>> getting
>>
>> 1) Uniform array (full of 1L) ~ 200ms
>> 2) rand.nextInt(20) ~ 1400ms
>> 3) rand.nextLong() ~ 7700ms
>>
>> The reason might be that a modified quicksort algorithm is used there. Would
>> it be possible to use the same algorithm for ParallelArray? I am afraid that
>> most programmers have already forgotten the details of sorting algorithms
>> and would be as surprised as I was.
>>       Cheers
>>            Lukas
>>
>> On 05/25/2010 11:44 AM, Kasper Nielsen wrote:
>>      
>>> Ahh, got it. You still want to remove that synchronized statement for
>>> serious usage though.
>>>
>>> What you are seing is the worst case behavior for quicksort which
>>> ParallelArray is using under the hood. All equal elements will kill most
>>> quicksort implementations. Instead of the average running time of n*logn you
>>> are seing worst case behaviour of n*n.
>>>
>>> Even large datasets with just a few different values will slow most
>>> implementations down significantly.
>>> I tried replacing
>>>   return 1L;//rand.nextLong();
>>> with
>>>   return rand.nextInt(20)
>>> and PA was still around 70 times slower compared to using rand.nextLong().
>>>
>>> We might want to switch to the dual pivot implementation introduced in
>>> OpenJDK 7. Don't know about licensing issues though.
>>>
>>> - Kasper
>>>
>>>
>>> On 25/5/2010 15:10, Lukas Krecan wrote:
>>>        
>>>> Hi,
>>>>     I am quite sure that it has nothing to do the synchronization. The
>>>> problem remains even if synchronized keyword is removed. Besides, the
>>>> method was used only for generating values, which works well. Problem is
>>>> with sorting. It works well on random values, but runs for ages on
>>>> uniform array.
>>>>
>>>>
>>>> On Tue, May 25, 2010 at 10:00 AM, Kasper Nielsen<kasper at kav.dk
>>>> <mailto:kasper at kav.dk>>  wrote:
>>>>
>>>>     Hi Lukas,
>>>>
>>>>     try removing the synchronized keyword in your op.
>>>>
>>>>     Taken from the extra166y.Ops javadoc:
>>>>
>>>>     In addition to stated signatures, implementations of these
>>>>     interfaces must work safely in parallel. In general, this means
>>>>     methods should operate only on their arguments, and should not rely
>>>>     on ThreadLocals, unsafely published globals, or other unsafe
>>>>     constructions. Additionally, they should not block waiting for
>>>>     synchronization.
>>>>
>>>>     If you want to return random numbers you should use the
>>>>     ThreadLocalRandom.
>>>>
>>>>
>>>>     - Kasper
>>>>
>>>>
>>>>     On 25/5/2010 07:9, Lukas Krecan wrote:
>>>>
>>>>         Hi,
>>>>             I have been playing with extra166y library and found that
>>>>         when I try
>>>>         to sort an uniform array (full of ones) it takes ages. With older
>>>>         version of the library it even threw StackOverflowError. After
>>>>         update it
>>>>         runs for minutes (I always kill the process after while so I do
>>>>         not know
>>>>         if it ends successfully).
>>>>
>>>>         Probably it's nothing new for you, maybe it's expected
>>>>         behaviour. But
>>>>         for me it's confusing and I just wanted to let you know about
>>>>         the issue
>>>>         in case you were not aware of it.
>>>>                Best regards
>>>>                  Lukas Krecan
>>>>
>>>>         -------------------------------------
>>>>         package net.krecan.forkjoin;
>>>>
>>>>         import java.util.Random;
>>>>
>>>>         import jsr166y.ForkJoinPool;
>>>>         import extra166y.Ops;
>>>>         import extra166y.ParallelLongArray;
>>>>
>>>>
>>>>         public class SortTest {
>>>>              private static final int THREADS = 2;
>>>>              private static final int SIZE = 40000000;
>>>>
>>>>              public static void testSort()
>>>>              {
>>>>                  ForkJoinPool fjPool = new ForkJoinPool(THREADS);
>>>>                  ParallelLongArray pa =
>>>> ParallelLongArray.createEmpty(SIZE,
>>>>         fjPool);
>>>>                  createData(pa);
>>>>                  System.out.println("Sorting "+pa.summary());
>>>>                  long start = System.currentTimeMillis();
>>>>                  pa.sort();
>>>>                  System.out.println("Done in
>>>>         "+(System.currentTimeMillis()-start)+"ms.");
>>>>                  System.out.println(pa.summary());
>>>>
>>>>
>>>>              }
>>>>              private static void createData(ParallelLongArray pa) {
>>>>                  long start = System.currentTimeMillis();
>>>>                  System.out.println("Creating data using "+THREADS+"
>>>>         threads.");
>>>>                  pa.setLimit(SIZE);
>>>>                  final Random rand = new Random();
>>>>                  pa.replaceWithGeneratedValue(new Ops.LongGenerator(){
>>>>                      public synchronized long op() {
>>>>                          return 1L;//rand.nextLong();
>>>>                      }
>>>>                  });
>>>>                  System.out.println("Done in
>>>>         "+(System.currentTimeMillis()-start)+"ms.");
>>>>              }
>>>>
>>>>              public static void main(String[] args) {
>>>>                  testSort();
>>>>              }
>>>>
>>>>         }
>>>>
>>>>
>>>>
>>>>         _______________________________________________
>>>>         Concurrency-interest mailing list
>>>>        Concurrency-interest at cs.oswego.edu
>>>> <mailto:Concurrency-interest at cs.oswego.edu>
>>>>        http://cs.oswego.edu/mailman/listinfo/concurrency-interest
>>>>
>>>>
>>>>     _______________________________________________
>>>>     Concurrency-interest mailing list
>>>>    Concurrency-interest at cs.oswego.edu
>>>> <mailto:Concurrency-interest at cs.oswego.edu>
>>>>    http://cs.oswego.edu/mailman/listinfo/concurrency-interest
>>>>
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> Concurrency-interest mailing list
>>>> Concurrency-interest at cs.oswego.edu
>>>> http://cs.oswego.edu/mailman/listinfo/concurrency-interest
>>>>          
>>> _______________________________________________
>>> Concurrency-interest mailing list
>>> Concurrency-interest at cs.oswego.edu
>>> http://cs.oswego.edu/mailman/listinfo/concurrency-interest
>>>        
>> _______________________________________________
>> Concurrency-interest mailing list
>> Concurrency-interest at cs.oswego.edu
>> http://cs.oswego.edu/mailman/listinfo/concurrency-interest
>>
>>      
> _______________________________________________
> Concurrency-interest mailing list
> Concurrency-interest at cs.oswego.edu
> http://cs.oswego.edu/mailman/listinfo/concurrency-interest
>    

_______________________________________________
Concurrency-interest mailing list
Concurrency-interest at cs.oswego.edu
http://cs.oswego.edu/mailman/listinfo/concurrency-interest



      
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://cs.oswego.edu/pipermail/concurrency-interest/attachments/20100525/eec25171/attachment.html>


More information about the Concurrency-interest mailing list