[concurrency-interest] ForkJoinPool seems lead to a worse latency than traditional ExecutorServices

David Holmes davidcholmes at aapt.net.au
Tue Apr 17 08:51:20 EDT 2012


Sorry, that was somewhat terse.

ForkJoinPool is not a drop-in replacement for an arbitrary ExecutorService.
It is specifically designed to efficiently execute tasks that implement
fork/join parallelism. If your tasks don't perform fork/join parallelism but
are plain old Runnables/Callables that do blocking I/O and other "regular"
programming operations, then they are unlikely to see any benefit from using
a ForkJoinPool.
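
As a rough illustration of what "fork/join parallelism" means here, the
sketch below sums an array by recursively splitting the work. It is not taken
from the nfs-rpc code: the names ForkJoinSum and SumTask and the threshold
value are made up for the example, and it assumes the JDK 7
java.util.concurrent classes rather than the jsr166y package.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

public class ForkJoinSum {

    // A fork/join task: it splits itself into two subtasks, forks one,
    // computes the other in the current worker, then joins the forked half.
    static final class SumTask extends RecursiveTask<Long> {
        // Illustrative cutoff below which we sum sequentially (not tuned).
        private static final int THRESHOLD = 10_000;
        private final long[] data;
        private final int lo;
        private final int hi;

        SumTask(long[] data, int lo, int hi) {
            this.data = data;
            this.lo = lo;
            this.hi = hi;
        }

        @Override
        protected Long compute() {
            if (hi - lo <= THRESHOLD) {
                long sum = 0;
                for (int i = lo; i < hi; i++) {
                    sum += data[i];
                }
                return sum;
            }
            int mid = (lo + hi) >>> 1;
            SumTask left = new SumTask(data, lo, mid);
            SumTask right = new SumTask(data, mid, hi);
            left.fork();                      // queue the left half for any idle worker
            long rightSum = right.compute();  // do the right half ourselves
            return left.join() + rightSum;    // wait for (or help run) the left half
        }
    }

    public static long parallelSum(long[] data) {
        ForkJoinPool pool = new ForkJoinPool();
        try {
            return pool.invoke(new SumTask(data, 0, data.length));
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        long[] data = new long[1_000_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i;
        }
        System.out.println(parallelSum(data));
    }
}
```

A request handler submitted as a plain Runnable never calls fork() or join(),
so it exercises none of this machinery.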

David
  -----Original Message-----
  From: concurrency-interest-bounces at cs.oswego.edu
[mailto:concurrency-interest-bounces at cs.oswego.edu]On Behalf Of David Holmes
  Sent: Tuesday, 17 April 2012 10:14 PM
  To: Min Zhou; concurrency-interest at cs.oswego.edu
  Subject: Re: [concurrency-interest] ForkJoinPool seems lead to a
worse latency than traditional ExecutorServices


  What makes your RPC project suitable for Fork/Join parallelism?

  David Holmes
    -----Original Message-----
    From: concurrency-interest-bounces at cs.oswego.edu
[mailto:concurrency-interest-bounces at cs.oswego.edu]On Behalf Of Min Zhou
    Sent: Tuesday, 17 April 2012 8:30 PM
    To: concurrency-interest at cs.oswego.edu
    Subject: [concurrency-interest] ForkJoinPool seems lead to a worse
latency than traditional ExecutorServices


    Hi, all,


    I tried to use the newest version of ForkJoinPool from the CVS
    repository of jsr166y to replace the old ExecutorService in our RPC
    project, which is open source at http://code.google.com/p/nfs-rpc/ .


    The modification is quite small. Here is the diff:


    Index: nfs-rpc-common/src/main/java/code/google/nfs/rpc/NamedForkJoinThreadFactory.java
    ===================================================================
    --- nfs-rpc-common/src/main/java/code/google/nfs/rpc/NamedForkJoinThreadFactory.java (revision 0)
    +++ nfs-rpc-common/src/main/java/code/google/nfs/rpc/NamedForkJoinThreadFactory.java (revision 0)
    @@ -0,0 +1,48 @@
    +package code.google.nfs.rpc;
    +/**
    + * nfs-rpc
    + *   Apache License
    + *
    + *   http://code.google.com/p/nfs-rpc (c) 2011
    + */
    +import java.util.concurrent.atomic.AtomicInteger;
    +
    +import code.google.nfs.rpc.jsr166y.ForkJoinPool;
    +import code.google.nfs.rpc.jsr166y.ForkJoinPool.ForkJoinWorkerThreadFactory;
    +import code.google.nfs.rpc.jsr166y.ForkJoinWorkerThread;
    +
    +/**
    + * Helper class that lets users monitor worker threads.
    + *
    + * @author <a href="mailto:coderplay at gmail.com">coderplay</a>
    + */
    +public class NamedForkJoinThreadFactory implements ForkJoinWorkerThreadFactory {
    +
    + static final AtomicInteger poolNumber = new AtomicInteger(1);
    +
    +    final AtomicInteger threadNumber = new AtomicInteger(1);
    +    final String namePrefix;
    +    final boolean isDaemon;
    +
    +    public NamedForkJoinThreadFactory() {
    +        this("pool");
    +    }
    +    public NamedForkJoinThreadFactory(String name) {
    +        this(name, false);
    +    }
    +    public NamedForkJoinThreadFactory(String prefix, boolean daemon) {
    +        namePrefix = prefix + "-" + poolNumber.getAndIncrement() + "-thread-";
    +        isDaemon = daemon;
    +    }
    +
    +    @Override
    +    public ForkJoinWorkerThread newThread(ForkJoinPool pool) {
    +        ForkJoinWorkerThread t =
    +            ForkJoinPool.defaultForkJoinWorkerThreadFactory.newThread(pool);
    +        t.setName(namePrefix + threadNumber.getAndIncrement());
    +        t.setDaemon(isDaemon);
    +        return t;
    +    }
    +
    +}
    +
    Index: nfs-rpc-common/src/main/java/code/google/nfs/rpc/benchmark/AbstractBenchmarkServer.java
    ===================================================================
    --- nfs-rpc-common/src/main/java/code/google/nfs/rpc/benchmark/AbstractBenchmarkServer.java (revision 120)
    +++ nfs-rpc-common/src/main/java/code/google/nfs/rpc/benchmark/AbstractBenchmarkServer.java (working copy)
    @@ -8,12 +8,10 @@
     import java.text.SimpleDateFormat;
     import java.util.Date;
     import java.util.concurrent.ExecutorService;
    -import java.util.concurrent.SynchronousQueue;
    -import java.util.concurrent.ThreadFactory;
    -import java.util.concurrent.ThreadPoolExecutor;
    -import java.util.concurrent.TimeUnit;

    -import code.google.nfs.rpc.NamedThreadFactory;
    +import code.google.nfs.rpc.NamedForkJoinThreadFactory;
    +import code.google.nfs.rpc.jsr166y.ForkJoinPool;
    +import code.google.nfs.rpc.jsr166y.ForkJoinPool.ForkJoinWorkerThreadFactory;
     import code.google.nfs.rpc.protocol.PBDecoder;
     import code.google.nfs.rpc.protocol.RPCProtocol;
     import code.google.nfs.rpc.protocol.SimpleProcessorProtocol;
    @@ -66,9 +64,13 @@
      });
      server.registerProcessor(RPCProtocol.TYPE, "testservice", new BenchmarkTestServiceImpl(responseSize));
      server.registerProcessor(RPCProtocol.TYPE, "testservicepb", new PBBenchmarkTestServiceImpl(responseSize));
    - ThreadFactory tf = new NamedThreadFactory("BUSINESSTHREADPOOL");
    - ExecutorService threadPool = new ThreadPoolExecutor(20, maxThreads,
    - 300, TimeUnit.SECONDS, new SynchronousQueue<Runnable>(), tf);
    + ForkJoinWorkerThreadFactory tf = new NamedForkJoinThreadFactory("BUSINESSTHREADPOOL");
    + ExecutorService threadPool = new ForkJoinPool(maxThreads, tf,
    +          new Thread.UncaughtExceptionHandler() {
    +              public void uncaughtException(Thread t, Throwable e){
    +                // do nothing;
    +              };
    +          }, true);
      server.start(listenPort, threadPool);
      }




    I ran a benchmark (see
    http://code.google.com/p/nfs-rpc/wiki/HowToRunBenchmark ) hoping for a
    significant TPS improvement, but got the opposite result. ForkJoinPool
    (avg response time 12 ms) seems to lead to worse latency than the
    traditional ExecutorService (avg response time 3 ms).


    With ForkJoinPool:


    ----------Benchmark Statistics--------------
     Concurrents: 500
     CodecType: 3
     ClientNums: 1
     RequestSize: 100 bytes
     Runtime: 120 seconds
     Benchmark Time: 81
     Requests: 3740311 Success: 99% (3739274) Error: 0% (1037)
     Avg TPS: 41374 Max TPS: 62881 Min TPS: 3333
     Avg RT: 12ms
     RT <= 0: 0% 1829/3740311
     RT (0,1]: 1% 59989/3740311
     RT (1,5]: 47% 1778386/3740311
     RT (5,10]: 17% 655377/3740311
     RT (10,50]: 32% 1204205/3740311
     RT (50,100]: 0% 31479/3740311
     RT (100,500]: 0% 546/3740311
     RT (500,1000]: 0% 7463/3740311
     RT > 1000: 0% 1037/3740311




    With traditional thread pool:
    ----------Benchmark Statistics--------------
     Concurrents: 500
     CodecType: 3
     ClientNums: 1
     RequestSize: 100 bytes
     Runtime: 120 seconds
     Benchmark Time: 81
     Requests: 12957281 Success: 100% (12957281) Error: 0% (0)
     Avg TPS: 144261 Max TPS: 183390 Min TPS: 81526
     Avg RT: 3ms
     RT <= 0: 0% 3997/12957281
     RT (0,1]: 4% 592905/12957281
     RT (1,5]: 95% 12312500/12957281
     RT (5,10]: 0% 19280/12957281
     RT (10,50]: 0% 92/12957281
     RT (50,100]: 0% 507/12957281
     RT (100,500]: 0% 26500/12957281
     RT (500,1000]: 0% 1500/12957281
     RT > 1000: 0% 0/12957281
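
    A quick consistency check on the two runs above, sketched under the
    assumption that "Concurrents: 500" means the client keeps 500 requests
    outstanding at all times (a closed loop). Under that assumption Little's
    law gives average latency = concurrency / throughput. The class and
    method names below are made up for the example.

```java
public class LittlesLawCheck {

    // Little's law for a closed loop: concurrency = throughput * latency,
    // so latency in ms = concurrency / (requests per second) * 1000.
    public static double expectedLatencyMillis(int concurrentRequests,
                                               double requestsPerSecond) {
        return concurrentRequests / requestsPerSecond * 1000.0;
    }

    public static void main(String[] args) {
        int concurrents = 500; // "Concurrents" from the benchmark output

        // Avg TPS figures copied from the two benchmark reports.
        double forkJoinTps = 41374;
        double threadPoolTps = 144261;

        System.out.println("ForkJoinPool latency (ms): "
                + expectedLatencyMillis(concurrents, forkJoinTps));
        System.out.println("ThreadPoolExecutor latency (ms): "
                + expectedLatencyMillis(concurrents, threadPoolTps));
    }
}
```

    This works out to roughly 12 ms and 3.5 ms, in line with the reported
    Avg RT of 12 ms and 3 ms. If the closed-loop assumption holds, the higher
    latency and the lower TPS are two views of the same slowdown.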




    I ran this benchmark on two 16-core Westmere machines (Xeon E5620, 8
    cores with HT), using the same configuration, listed below, for both tests.


    1. JDK version: Oracle 1.7.0_03 (hotspot)


    2. client side JVM options:
    -Xms4g -Xmx4g -Xmn1g -XX:+PrintGCDetails -XX:+PrintGCDateStamps
    -Xloggc:gc.log -Dwrite.statistics=true -XX:+UseParallelGC
    -XX:+UseCondCardMark -XX:-UseBiasedLocking
    -Djava.ext.dirs=/home/min/nfs-rpc
    code.google.nfs.rpc.netty.benchmark.NettySimpleBenchmarkClient 10.232.98.96
    8888 500 1000 3 100 120 1


    3. server side JVM options:
    -Xms2g -Xmx2g -Xmn500m -XX:+UseParallelGC -XX:+PrintGCDetails
    -XX:+PrintGCDateStamps -Xloggc:gc.log -XX:+UseCondCardMark
    -XX:-UseBiasedLocking -Djava.ext.dirs=/home/min/nfs-rpc
    code.google.nfs.rpc.netty.benchmark.NettyBenchmarkServer 8888 100 100


    I also observed far fewer context switches with ForkJoinPool: about 8000
    per second, versus about 150000 per second with the old thread pool.
    I also ran the benchmarks under Oracle JDK 1.6, with similar results.


    Could anyone kindly explain what causes the results described above?


    Thanks,
    Min


    --
    My research interests are distributed systems, parallel computing and
    bytecode-based virtual machines.

    My profile:
    http://www.linkedin.com/in/coderplay
    My blog:
    http://coderplay.javaeye.com