[concurrency-interest] ForkJoinPool seems lead to a worse latency than traditional ExecutorServices

Min Zhou coderplay at gmail.com
Tue Apr 17 21:49:52 EDT 2012


Hi, Viktor,

This is a throughput benchmark. Do you have any numbers on latency?
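
A minimal sketch of one way to collect latency figures for a pool, as opposed to throughput: record the delay between submitting a task and the moment a worker starts running it, then read percentiles from the sorted delays. The class and method names (QueueLatencyProbe, measure, percentile) are made up for illustration, and the probe only captures queueing delay inside the executor, not end-to-end RPC response time.

```java
import java.util.Arrays;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;

// Hypothetical helper, not part of nfs-rpc or jsr166y.
class QueueLatencyProbe {

    // Submits `tasks` trivial tasks and returns the sorted
    // submit-to-start delays in nanoseconds.
    static long[] measure(ExecutorService pool, int tasks) {
        final long[] delays = new long[tasks];
        final CountDownLatch done = new CountDownLatch(tasks);
        for (int i = 0; i < tasks; i++) {
            final int idx = i;
            final long submitted = System.nanoTime();
            pool.execute(new Runnable() {
                public void run() {
                    delays[idx] = System.nanoTime() - submitted;
                    done.countDown(); // publishes the write above to the waiter
                }
            });
        }
        try {
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        Arrays.sort(delays);
        return delays;
    }

    // Nearest-rank percentile over an already sorted array, p in (0, 100].
    static long percentile(long[] sorted, double p) {
        int idx = (int) Math.ceil(p / 100.0 * sorted.length) - 1;
        return sorted[Math.min(sorted.length - 1, Math.max(0, idx))];
    }
}
```

Running the same probe against a ForkJoinPool and a ThreadPoolExecutor would give comparable p50/p99 figures, although a real measurement would also need warm-up runs and a realistic task body.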

Thanks,
Min

On Tue, Apr 17, 2012 at 9:22 PM, √iktor Ҡlang <viktor.klang at gmail.com> wrote:

>
>
> On Tue, Apr 17, 2012 at 2:51 PM, David Holmes <davidcholmes at aapt.net.au> wrote:
>
>> Sorry that was somewhat terse.
>>
>> ForkJoinPool is not a drop-in replacement as an arbitrary
>> ExecutorService. It is specifically designed to efficiently execute tasks
>> that implement fork/join parallelism. If your tasks don't perform fork/join
>> parallelism but are plain old Runnables/Callables that do blocking I/O and
>> other "regular" programming operations then they will not likely see any
>> benefit from using a ForkJoinPool.
>>
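
To make the distinction concrete, here is a minimal sketch of a task that does use fork/join parallelism: a recursive array sum that splits itself with fork() and join(). It assumes java.util.concurrent from JDK 7 rather than the repackaged jsr166y classes used in the diff, and the names SumTask and THRESHOLD are made up for illustration.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical example: sums a long[] by recursive splitting.
class SumTask extends RecursiveTask<Long> {
    static final int THRESHOLD = 1000; // arbitrary cutoff, tune per workload

    final long[] data;
    final int lo;
    final int hi;

    SumTask(long[] data, int lo, int hi) {
        this.data = data;
        this.lo = lo;
        this.hi = hi;
    }

    @Override
    protected Long compute() {
        if (hi - lo <= THRESHOLD) {
            // Small enough: sum sequentially.
            long sum = 0;
            for (int i = lo; i < hi; i++) {
                sum += data[i];
            }
            return sum;
        }
        int mid = (lo + hi) >>> 1;
        SumTask left = new SumTask(data, lo, mid);
        SumTask right = new SumTask(data, mid, hi);
        left.fork();                    // make the left half available for stealing
        long rightSum = right.compute(); // work on the right half in this thread
        return left.join() + rightSum;   // wait for (or help finish) the left half
    }

    static long sum(long[] data) {
        ForkJoinPool pool = new ForkJoinPool();
        try {
            return pool.invoke(new SumTask(data, 0, data.length));
        } finally {
            pool.shutdown();
        }
    }
}
```

A plain Runnable that handles one RPC request never calls fork() or join(), so it gives the pool's work-stealing queues nothing to split or steal beyond the top-level submissions.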
>
> I disagree:
>
>
> http://letitcrash.com/post/20397701710/50-million-messages-per-second-on-a-single-machine
>
> Cheers,
>>
>
>>
>> David
>>
>> -----Original Message-----
>> *From:* concurrency-interest-bounces at cs.oswego.edu [mailto:
>> concurrency-interest-bounces at cs.oswego.edu]*On Behalf Of *David Holmes
>> *Sent:* Tuesday, 17 April 2012 10:14 PM
>> *To:* Min Zhou; concurrency-interest at cs.oswego.edu
>> *Subject:* Re: [concurrency-interest] ForkJoinPool seems lead to a
>> worse latency than traditional ExecutorServices
>>
>> What makes your RPC project suitable for Fork/Join parallelism?
>>
>> David Holmes
>>
>> -----Original Message-----
>> *From:* concurrency-interest-bounces at cs.oswego.edu [mailto:
>> concurrency-interest-bounces at cs.oswego.edu]*On Behalf Of *Min Zhou
>> *Sent:* Tuesday, 17 April 2012 8:30 PM
>> *To:* concurrency-interest at cs.oswego.edu
>> *Subject:* [concurrency-interest] ForkJoinPool seems lead to a worse
>> latency than traditional ExecutorServices
>>
>> Hi, all,
>>
>> I tried using the newest version of ForkJoinPool from the jsr166y CVS
>> repository to replace the old ExecutorService in our open-source RPC
>> project at http://code.google.com/p/nfs-rpc/ .
>>
>> The modification is quite small. Here is the diff:
>>
>> Index: nfs-rpc-common/src/main/java/code/google/nfs/rpc/NamedForkJoinThreadFactory.java
>> ===================================================================
>> --- nfs-rpc-common/src/main/java/code/google/nfs/rpc/NamedForkJoinThreadFactory.java (revision 0)
>> +++ nfs-rpc-common/src/main/java/code/google/nfs/rpc/NamedForkJoinThreadFactory.java (revision 0)
>> @@ -0,0 +1,48 @@
>> +package code.google.nfs.rpc;
>> +/**
>> + * nfs-rpc
>> + *   Apache License
>> + *
>> + *   http://code.google.com/p/nfs-rpc (c) 2011
>> + */
>> +import java.util.concurrent.atomic.AtomicInteger;
>> +
>> +import code.google.nfs.rpc.jsr166y.ForkJoinPool;
>> +import code.google.nfs.rpc.jsr166y.ForkJoinPool.ForkJoinWorkerThreadFactory;
>> +import code.google.nfs.rpc.jsr166y.ForkJoinWorkerThread;
>> +
>> +/**
>> + * Helper class that names worker threads so users can monitor them.
>> + *
>> + * @author <a href="mailto:coderplay at gmail.com">coderplay</a>
>> + */
>> +public class NamedForkJoinThreadFactory implements ForkJoinWorkerThreadFactory {
>> +
>> + static final AtomicInteger poolNumber = new AtomicInteger(1);
>> +
>> +    final AtomicInteger threadNumber = new AtomicInteger(1);
>> +    final String namePrefix;
>> +    final boolean isDaemon;
>> +
>> +    public NamedForkJoinThreadFactory() {
>> +        this("pool");
>> +    }
>> +    public NamedForkJoinThreadFactory(String name) {
>> +        this(name, false);
>> +    }
>> +    public NamedForkJoinThreadFactory(String prefix, boolean daemon) {
>> +        namePrefix = prefix + "-" + poolNumber.getAndIncrement() + "-thread-";
>> +        isDaemon = daemon;
>> +    }
>> +
>> +    @Override
>> +    public ForkJoinWorkerThread newThread(ForkJoinPool pool) {
>> +        ForkJoinWorkerThread t =
>> +            ForkJoinPool.defaultForkJoinWorkerThreadFactory.newThread(pool);
>> +        t.setName(namePrefix + threadNumber.getAndIncrement());
>> +        t.setDaemon(isDaemon);
>> +        return t;
>> +    }
>> +
>> +}
>> +
>> Index: nfs-rpc-common/src/main/java/code/google/nfs/rpc/benchmark/AbstractBenchmarkServer.java
>> ===================================================================
>> --- nfs-rpc-common/src/main/java/code/google/nfs/rpc/benchmark/AbstractBenchmarkServer.java (revision 120)
>> +++ nfs-rpc-common/src/main/java/code/google/nfs/rpc/benchmark/AbstractBenchmarkServer.java (working copy)
>> @@ -8,12 +8,10 @@
>>  import java.text.SimpleDateFormat;
>>  import java.util.Date;
>>  import java.util.concurrent.ExecutorService;
>> -import java.util.concurrent.SynchronousQueue;
>> -import java.util.concurrent.ThreadFactory;
>> -import java.util.concurrent.ThreadPoolExecutor;
>> -import java.util.concurrent.TimeUnit;
>>
>> -import code.google.nfs.rpc.NamedThreadFactory;
>> +import code.google.nfs.rpc.NamedForkJoinThreadFactory;
>> +import code.google.nfs.rpc.jsr166y.ForkJoinPool;
>> +import code.google.nfs.rpc.jsr166y.ForkJoinPool.ForkJoinWorkerThreadFactory;
>>  import code.google.nfs.rpc.protocol.PBDecoder;
>>  import code.google.nfs.rpc.protocol.RPCProtocol;
>>  import code.google.nfs.rpc.protocol.SimpleProcessorProtocol;
>> @@ -66,9 +64,13 @@
>>   });
>>   server.registerProcessor(RPCProtocol.TYPE, "testservice", new BenchmarkTestServiceImpl(responseSize));
>>   server.registerProcessor(RPCProtocol.TYPE, "testservicepb", new PBBenchmarkTestServiceImpl(responseSize));
>> - ThreadFactory tf = new NamedThreadFactory("BUSINESSTHREADPOOL");
>> - ExecutorService threadPool = new ThreadPoolExecutor(20, maxThreads,
>> - 300, TimeUnit.SECONDS, new SynchronousQueue<Runnable>(), tf);
>> + ForkJoinWorkerThreadFactory tf = new NamedForkJoinThreadFactory("BUSINESSTHREADPOOL");
>> + ExecutorService threadPool = new ForkJoinPool(maxThreads, tf,
>> +          new Thread.UncaughtExceptionHandler() {
>> +              public void uncaughtException(Thread t, Throwable e){
>> +                // do nothing;
>> +              };
>> +          }, true);
>>   server.start(listenPort, threadPool);
>>   }
>>
>>
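
For readers who want to try the two configurations outside nfs-rpc, here is a minimal standalone sketch of both pools. It assumes java.util.concurrent.ForkJoinPool from JDK 7 behaves like the repackaged jsr166y class in the diff, drops the naming factories, and uses made-up names (PoolSetups, handoffPool, asyncForkJoinPool, runAndShutdown). The constructor arguments mirror the diff: a SynchronousQueue hands each task directly to an idle or newly created thread, while the ForkJoinPool is created in async (FIFO) mode with parallelism equal to maxThreads.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.Future;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of nfs-rpc.
class PoolSetups {

    // Old setup: direct handoff, 20 core threads, up to maxThreads in total.
    // maxThreads must be >= 20, and with the default rejection policy a task
    // is rejected when all maxThreads threads are busy.
    static ExecutorService handoffPool(int maxThreads) {
        return new ThreadPoolExecutor(20, maxThreads, 300, TimeUnit.SECONDS,
                new SynchronousQueue<Runnable>());
    }

    // New setup: parallelism = maxThreads, no exception handler, asyncMode = true.
    static ExecutorService asyncForkJoinPool(int maxThreads) {
        return new ForkJoinPool(maxThreads,
                ForkJoinPool.defaultForkJoinWorkerThreadFactory, null, true);
    }

    // Runs `tasks` trivial jobs, returns how many completed, then shuts the pool down.
    static int runAndShutdown(ExecutorService pool, int tasks) {
        List<Callable<Integer>> jobs = new ArrayList<Callable<Integer>>();
        for (int i = 0; i < tasks; i++) {
            jobs.add(new Callable<Integer>() {
                public Integer call() {
                    return 1;
                }
            });
        }
        int completed = 0;
        try {
            for (Future<Integer> f : pool.invokeAll(jobs)) {
                completed += f.get();
            }
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        return completed;
    }
}
```

One difference worth keeping in mind when comparing results: the handoff pool may grow to maxThreads runnable threads, whereas the ForkJoinPool targets maxThreads as its parallelism level and queues everything else.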
>> I ran a benchmark (see
>> http://code.google.com/p/nfs-rpc/wiki/HowToRunBenchmark ) hoping for a
>> significant TPS improvement, but got the opposite result. ForkJoinPool
>> (avg response time 12 ms) seems to lead to worse latency than the
>> traditional ExecutorService (avg response time 3 ms).
>>
>> With ForkJoinPool:
>>
>>  ----------Benchmark Statistics--------------
>>  Concurrents: 500
>>  CodecType: 3
>>  ClientNums: 1
>>  RequestSize: 100 bytes
>>  Runtime: 120 seconds
>>  Benchmark Time: 81
>>  Requests: 3740311 Success: 99% (3739274) Error: 0% (1037)
>>  Avg TPS: 41374 Max TPS: 62881 Min TPS: 3333
>>  Avg RT: 12ms
>>  RT <= 0: 0% 1829/3740311
>>  RT (0,1]: 1% 59989/3740311
>>  RT (1,5]: 47% 1778386/3740311
>>  RT (5,10]: 17% 655377/3740311
>>  RT (10,50]: 32% 1204205/3740311
>>  RT (50,100]: 0% 31479/3740311
>>  RT (100,500]: 0% 546/3740311
>>  RT (500,1000]: 0% 7463/3740311
>>  RT > 1000: 0% 1037/3740311
>>
>>
>> With traditional thread pool:
>>  ----------Benchmark Statistics--------------
>>  Concurrents: 500
>>  CodecType: 3
>>  ClientNums: 1
>>  RequestSize: 100 bytes
>>  Runtime: 120 seconds
>>  Benchmark Time: 81
>>  Requests: 12957281 Success: 100% (12957281) Error: 0% (0)
>>  Avg TPS: 144261 Max TPS: 183390 Min TPS: 81526
>>  Avg RT: 3ms
>>  RT <= 0: 0% 3997/12957281
>>  RT (0,1]: 4% 592905/12957281
>>  RT (1,5]: 95% 12312500/12957281
>>  RT (5,10]: 0% 19280/12957281
>>  RT (10,50]: 0% 92/12957281
>>  RT (50,100]: 0% 507/12957281
>>  RT (100,500]: 0% 26500/12957281
>>  RT (500,1000]: 0% 1500/12957281
>>  RT > 1000: 0% 0/12957281
>>
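
As a sanity check on the two reports above, the average response times line up with Little's law for a closed-loop benchmark: average RT is roughly concurrency divided by average TPS. This assumes that "Concurrents: 500" means 500 callers that each keep exactly one request outstanding, which is a guess about how the benchmark client works. The class name LittlesLaw is made up for illustration.

```java
// Hypothetical helper: relates concurrency, throughput and response time.
class LittlesLaw {

    // Average response time in milliseconds for `concurrency` outstanding
    // requests served at `tps` requests per second.
    static double avgResponseMillis(int concurrency, double tps) {
        return concurrency / tps * 1000.0;
    }

    public static void main(String[] args) {
        // ForkJoinPool run: 500 / 41374 s, about 12.08 ms (reported: 12 ms)
        System.out.println(avgResponseMillis(500, 41374));
        // Thread pool run: 500 / 144261 s, about 3.47 ms (reported: 3 ms)
        System.out.println(avgResponseMillis(500, 144261));
    }
}
```

Under that assumption, throughput and average latency in this benchmark are two views of the same number, so the roughly 3.5x drop in TPS and the rise in average RT are one effect rather than two.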
>>
>> I ran this benchmark on two 16-core Westmere machines (Xeon E5620, 8
>> cores with HT), using the same configuration, listed below, for both tests.
>>
>> 1. JDK version: Oracle 1.7.0_03 (hotspot)
>>
>> 2. client side JVM options:
>> -Xms4g -Xmx4g -Xmn1g -XX:+PrintGCDetails -XX:+PrintGCDateStamps
>> -Xloggc:gc.log -Dwrite.statistics=true -XX:+UseParallelGC
>> -XX:+UseCondCardMark -XX:-UseBiasedLocking
>> -Djava.ext.dirs=/home/min/nfs-rpc
>> code.google.nfs.rpc.netty.benchmark.NettySimpleBenchmarkClient 10.232.98.96
>> 8888 500 1000 3 100 120 1
>>
>> 3. server side JVM options:
>> -Xms2g -Xmx2g -Xmn500m -XX:+UseParallelGC -XX:+PrintGCDetails
>> -XX:+PrintGCDateStamps -Xloggc:gc.log -XX:+UseCondCardMark
>> -XX:-UseBiasedLocking -Djava.ext.dirs=/home/min/nfs-rpc
>> code.google.nfs.rpc.netty.benchmark.NettyBenchmarkServer 8888 100 100
>>
>> I also observed far fewer context switches with ForkJoinPool, about 8000
>> per second, compared with about 150000 per second with the old thread pool.
>> I also ran the benchmarks under Oracle JDK 1.6, with similar results.
>>
>> Could anyone kindly explain the reason for the results described above?
>>
>> Thanks,
>> Min
>>
>> --
>> My research interests are distributed systems, parallel computing and
>> bytecode based virtual machine.
>>
>> My profile:
>> http://www.linkedin.com/in/coderplay
>> My blog:
>> http://coderplay.javaeye.com
>>
>>
>> _______________________________________________
>> Concurrency-interest mailing list
>> Concurrency-interest at cs.oswego.edu
>> http://cs.oswego.edu/mailman/listinfo/concurrency-interest
>>
>>
>
>
> --
> Viktor Klang
>
> Akka Tech Lead
> Typesafe <http://www.typesafe.com/> - The software stack for applications
> that scale
>
> Twitter: @viktorklang
>
>

