[concurrency-interest] ForkJoinPool seems to lead to worse latency than traditional ExecutorServices

David Holmes davidcholmes at aapt.net.au
Tue Apr 17 08:13:56 EDT 2012


What makes your RPC project suitable for Fork/Join parallelism?
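
For context: fork/join parallelism targets tasks that recursively split into
subtasks and join their results, whereas independent RPC requests are single
tasks submitted from outside the pool. A minimal sketch of the intended style
(illustrative only, not from the nfs-rpc code):

    import java.util.concurrent.RecursiveTask;

    class SumTask extends RecursiveTask<Long> {
        static final int THRESHOLD = 1000;  // sequential cutoff, chosen arbitrarily
        final long[] data;
        final int lo, hi;

        SumTask(long[] data, int lo, int hi) {
            this.data = data; this.lo = lo; this.hi = hi;
        }

        protected Long compute() {
            if (hi - lo <= THRESHOLD) {     // small enough: sum sequentially
                long sum = 0;
                for (int i = lo; i < hi; i++) sum += data[i];
                return sum;
            }
            int mid = (lo + hi) >>> 1;
            SumTask left = new SumTask(data, lo, mid);
            left.fork();                    // schedule left half asynchronously
            long rightSum = new SumTask(data, mid, hi).compute();
            return rightSum + left.join();  // wait for the forked half
        }
    }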

David Holmes
  -----Original Message-----
  From: concurrency-interest-bounces at cs.oswego.edu
[mailto:concurrency-interest-bounces at cs.oswego.edu] On Behalf Of Min Zhou
  Sent: Tuesday, 17 April 2012 8:30 PM
  To: concurrency-interest at cs.oswego.edu
  Subject: [concurrency-interest] ForkJoinPool seems to lead to worse latency
than traditional ExecutorServices


  Hi, all,


  I tried using the newest version of ForkJoinPool from the jsr166y CVS
repository to replace the old ExecutorService in our open-source RPC project
at http://code.google.com/p/nfs-rpc/ .


  The modification is quite small. Here is the diff:


  Index: nfs-rpc-common/src/main/java/code/google/nfs/rpc/NamedForkJoinThreadFactory.java
  ===================================================================
  --- nfs-rpc-common/src/main/java/code/google/nfs/rpc/NamedForkJoinThreadFactory.java (revision 0)
  +++ nfs-rpc-common/src/main/java/code/google/nfs/rpc/NamedForkJoinThreadFactory.java (revision 0)
  @@ -0,0 +1,48 @@
  +package code.google.nfs.rpc;
  +/**
  + * nfs-rpc
  + *   Apache License
  + *
  + *   http://code.google.com/p/nfs-rpc (c) 2011
  + */
  +import java.util.concurrent.atomic.AtomicInteger;
  +
  +import code.google.nfs.rpc.jsr166y.ForkJoinPool;
  +import code.google.nfs.rpc.jsr166y.ForkJoinPool.ForkJoinWorkerThreadFactory;
  +import code.google.nfs.rpc.jsr166y.ForkJoinWorkerThread;
  +
  +/**
  + * Helper class to let users monitor worker threads.
  + *
  + * @author <a href="mailto:coderplay at gmail.com">coderplay</a>
  + */
  +public class NamedForkJoinThreadFactory implements ForkJoinWorkerThreadFactory {
  +
  +    static final AtomicInteger poolNumber = new AtomicInteger(1);
  +
  +    final AtomicInteger threadNumber = new AtomicInteger(1);
  +    final String namePrefix;
  +    final boolean isDaemon;
  +
  +    public NamedForkJoinThreadFactory() {
  +        this("pool");
  +    }
  +    public NamedForkJoinThreadFactory(String name) {
  +        this(name, false);
  +    }
  +    public NamedForkJoinThreadFactory(String prefix, boolean daemon) {
  +        namePrefix = prefix + "-" + poolNumber.getAndIncrement() + "-thread-";
  +        isDaemon = daemon;
  +    }
  +
  +    @Override
  +    public ForkJoinWorkerThread newThread(ForkJoinPool pool) {
  +        ForkJoinWorkerThread t =
  +            ForkJoinPool.defaultForkJoinWorkerThreadFactory.newThread(pool);
  +        t.setName(namePrefix + threadNumber.getAndIncrement());
  +        t.setDaemon(isDaemon);
  +        return t;
  +    }
  +
  +}
  +
  Index: nfs-rpc-common/src/main/java/code/google/nfs/rpc/benchmark/AbstractBenchmarkServer.java
  ===================================================================
  --- nfs-rpc-common/src/main/java/code/google/nfs/rpc/benchmark/AbstractBenchmarkServer.java (revision 120)
  +++ nfs-rpc-common/src/main/java/code/google/nfs/rpc/benchmark/AbstractBenchmarkServer.java (working copy)
  @@ -8,12 +8,10 @@
   import java.text.SimpleDateFormat;
   import java.util.Date;
   import java.util.concurrent.ExecutorService;
  -import java.util.concurrent.SynchronousQueue;
  -import java.util.concurrent.ThreadFactory;
  -import java.util.concurrent.ThreadPoolExecutor;
  -import java.util.concurrent.TimeUnit;

  -import code.google.nfs.rpc.NamedThreadFactory;
  +import code.google.nfs.rpc.NamedForkJoinThreadFactory;
  +import code.google.nfs.rpc.jsr166y.ForkJoinPool;
  +import code.google.nfs.rpc.jsr166y.ForkJoinPool.ForkJoinWorkerThreadFactory;
   import code.google.nfs.rpc.protocol.PBDecoder;
   import code.google.nfs.rpc.protocol.RPCProtocol;
   import code.google.nfs.rpc.protocol.SimpleProcessorProtocol;
  @@ -66,9 +64,13 @@
    });
    server.registerProcessor(RPCProtocol.TYPE, "testservice", new
BenchmarkTestServiceImpl(responseSize));
    server.registerProcessor(RPCProtocol.TYPE, "testservicepb", new
PBBenchmarkTestServiceImpl(responseSize));
  - ThreadFactory tf = new NamedThreadFactory("BUSINESSTHREADPOOL");
  - ExecutorService threadPool = new ThreadPoolExecutor(20, maxThreads,
  - 300, TimeUnit.SECONDS, new SynchronousQueue<Runnable>(), tf);
  + ForkJoinWorkerThreadFactory tf = new NamedForkJoinThreadFactory("BUSINESSTHREADPOOL");
  + ExecutorService threadPool = new ForkJoinPool(maxThreads, tf,
  +          new Thread.UncaughtExceptionHandler() {
  +              public void uncaughtException(Thread t, Throwable e) {
  +                  // do nothing
  +              }
  +          }, true);
    server.start(listenPort, threadPool);
    }
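
  For reference, the replacement above just uses a ForkJoinPool through the
plain ExecutorService interface, with the last constructor argument selecting
async (FIFO) mode. A minimal standalone sketch against the equivalent JDK 7
java.util.concurrent API (the class name and parallelism value are
illustrative):

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.ForkJoinPool;

    public class AsyncPoolSketch {
        public static void main(String[] args) {
            // parallelism, worker thread factory, uncaught-exception handler, asyncMode
            ExecutorService pool = new ForkJoinPool(
                    16,                                              // parallelism (assumed)
                    ForkJoinPool.defaultForkJoinWorkerThreadFactory, // default worker factory
                    null,                                            // no UncaughtExceptionHandler
                    true);                                           // asyncMode: FIFO order
            pool.execute(new Runnable() {
                public void run() {
                    // handle one externally submitted request here
                }
            });
            pool.shutdown();
        }
    }

  Per the ForkJoinPool javadoc, asyncMode establishes FIFO scheduling for
forked tasks that are never joined, the mode aimed at event-style usage.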




  I ran a benchmark (see http://code.google.com/p/nfs-rpc/wiki/HowToRunBenchmark )
hoping for a significant TPS improvement, but got the opposite result:
ForkJoinPool (avg response time 12 ms) seems to lead to worse latency than the
traditional ExecutorService did (avg response time 3 ms).


  With ForkJoinPool:


  ----------Benchmark Statistics--------------
   Concurrents: 500
   CodecType: 3
   ClientNums: 1
   RequestSize: 100 bytes
   Runtime: 120 seconds
   Benchmark Time: 81
   Requests: 3740311 Success: 99% (3739274) Error: 0% (1037)
   Avg TPS: 41374 Max TPS: 62881 Min TPS: 3333
   Avg RT: 12ms
   RT <= 0: 0% 1829/3740311
   RT (0,1]: 1% 59989/3740311
   RT (1,5]: 47% 1778386/3740311
   RT (5,10]: 17% 655377/3740311
   RT (10,50]: 32% 1204205/3740311
   RT (50,100]: 0% 31479/3740311
   RT (100,500]: 0% 546/3740311
   RT (500,1000]: 0% 7463/3740311
   RT > 1000: 0% 1037/3740311




  With traditional thread pool:
  ----------Benchmark Statistics--------------
   Concurrents: 500
   CodecType: 3
   ClientNums: 1
   RequestSize: 100 bytes
   Runtime: 120 seconds
   Benchmark Time: 81
   Requests: 12957281 Success: 100% (12957281) Error: 0% (0)
   Avg TPS: 144261 Max TPS: 183390 Min TPS: 81526
   Avg RT: 3ms
   RT <= 0: 0% 3997/12957281
   RT (0,1]: 4% 592905/12957281
   RT (1,5]: 95% 12312500/12957281
   RT (5,10]: 0% 19280/12957281
   RT (10,50]: 0% 92/12957281
   RT (50,100]: 0% 507/12957281
   RT (100,500]: 0% 26500/12957281
   RT (500,1000]: 0% 1500/12957281
   RT > 1000: 0% 0/12957281




  I ran this benchmark on two 16-core Westmere machines (Xeon E5620, 8 cores
with HT), using the same configuration below for both tests.


  1. JDK version: Oracle 1.7.0_03 (HotSpot)


  2. Client-side JVM options:
  -Xms4g -Xmx4g -Xmn1g -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:gc.log
  -Dwrite.statistics=true -XX:+UseParallelGC -XX:+UseCondCardMark
  -XX:-UseBiasedLocking -Djava.ext.dirs=/home/min/nfs-rpc
  code.google.nfs.rpc.netty.benchmark.NettySimpleBenchmarkClient 10.232.98.96 8888 500 1000 3 100 120 1


  3. Server-side JVM options:
  -Xms2g -Xmx2g -Xmn500m -XX:+UseParallelGC -XX:+PrintGCDetails
  -XX:+PrintGCDateStamps -Xloggc:gc.log -XX:+UseCondCardMark
  -XX:-UseBiasedLocking -Djava.ext.dirs=/home/min/nfs-rpc
  code.google.nfs.rpc.netty.benchmark.NettyBenchmarkServer 8888 100 100


  A much lower context-switch rate, about 8,000 per second, is also observed
with ForkJoinPool, versus about 150,000 per second with the old thread pool.
  I also ran the benchmarks under Oracle JDK 1.6, with similar results.
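
  To dig further, I can sample thread states from inside the server JVM to see
how many of the "BUSINESSTHREADPOOL" workers are actually runnable under load.
A minimal sketch (class name illustrative):

    import java.lang.management.ManagementFactory;
    import java.lang.management.ThreadInfo;
    import java.lang.management.ThreadMXBean;

    public final class PoolThreadStates {
        // Call periodically from inside the server JVM to print worker states.
        public static void dump() {
            ThreadMXBean mx = ManagementFactory.getThreadMXBean();
            // Dump all live threads, without monitor/synchronizer details
            for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
                if (info.getThreadName().startsWith("BUSINESSTHREADPOOL")) {
                    System.out.println(info.getThreadName() + ": " + info.getThreadState());
                }
            }
        }
    }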


  Could anyone kindly explain the reason for the results described above?


  Thanks,
  Min


  --
  My research interests are distributed systems, parallel computing, and
bytecode-based virtual machines.

  My profile:
  http://www.linkedin.com/in/coderplay
  My blog:
  http://coderplay.javaeye.com

