[concurrency-interest] ForkJoinPool seems to lead to worse latency than traditional ExecutorServices

Min Zhou coderplay at gmail.com
Tue Apr 17 06:30:17 EDT 2012


Hi, all,

I tried to use the newest version of ForkJoinPool from the CVS repository
of jsr166y to replace the old ExecutorService in our open-source RPC
project at http://code.google.com/p/nfs-rpc/ .

The modification is quite small. Here is the diff:

Index: nfs-rpc-common/src/main/java/code/google/nfs/rpc/NamedForkJoinThreadFactory.java
===================================================================
--- nfs-rpc-common/src/main/java/code/google/nfs/rpc/NamedForkJoinThreadFactory.java (revision 0)
+++ nfs-rpc-common/src/main/java/code/google/nfs/rpc/NamedForkJoinThreadFactory.java (revision 0)
@@ -0,0 +1,48 @@
+package code.google.nfs.rpc;
+/**
+ * nfs-rpc
+ *   Apache License
+ *
+ *   http://code.google.com/p/nfs-rpc (c) 2011
+ */
+import java.util.concurrent.atomic.AtomicInteger;
+
+import code.google.nfs.rpc.jsr166y.ForkJoinPool;
+import code.google.nfs.rpc.jsr166y.ForkJoinPool.ForkJoinWorkerThreadFactory;
+import code.google.nfs.rpc.jsr166y.ForkJoinWorkerThread;
+
+/**
+ * Helper class that lets users monitor the worker threads.
+ *
+ * @author <a href="mailto:coderplay at gmail.com">coderplay</a>
+ */
+public class NamedForkJoinThreadFactory implements ForkJoinWorkerThreadFactory {
+
+    static final AtomicInteger poolNumber = new AtomicInteger(1);
+
+    final AtomicInteger threadNumber = new AtomicInteger(1);
+    final String namePrefix;
+    final boolean isDaemon;
+
+    public NamedForkJoinThreadFactory() {
+        this("pool");
+    }
+    public NamedForkJoinThreadFactory(String name) {
+        this(name, false);
+    }
+    public NamedForkJoinThreadFactory(String prefix, boolean daemon) {
+        namePrefix = prefix + "-" + poolNumber.getAndIncrement() + "-thread-";
+        isDaemon = daemon;
+    }
+
+    @Override
+    public ForkJoinWorkerThread newThread(ForkJoinPool pool) {
+        ForkJoinWorkerThread t =
+                ForkJoinPool.defaultForkJoinWorkerThreadFactory.newThread(pool);
+        t.setName(namePrefix + threadNumber.getAndIncrement());
+        t.setDaemon(isDaemon);
+        return t;
+    }
+
+}
+
Index: nfs-rpc-common/src/main/java/code/google/nfs/rpc/benchmark/AbstractBenchmarkServer.java
===================================================================
--- nfs-rpc-common/src/main/java/code/google/nfs/rpc/benchmark/AbstractBenchmarkServer.java (revision 120)
+++ nfs-rpc-common/src/main/java/code/google/nfs/rpc/benchmark/AbstractBenchmarkServer.java (working copy)
@@ -8,12 +8,10 @@
 import java.text.SimpleDateFormat;
 import java.util.Date;
 import java.util.concurrent.ExecutorService;
-import java.util.concurrent.SynchronousQueue;
-import java.util.concurrent.ThreadFactory;
-import java.util.concurrent.ThreadPoolExecutor;
-import java.util.concurrent.TimeUnit;
 
-import code.google.nfs.rpc.NamedThreadFactory;
+import code.google.nfs.rpc.NamedForkJoinThreadFactory;
+import code.google.nfs.rpc.jsr166y.ForkJoinPool;
+import code.google.nfs.rpc.jsr166y.ForkJoinPool.ForkJoinWorkerThreadFactory;
 import code.google.nfs.rpc.protocol.PBDecoder;
 import code.google.nfs.rpc.protocol.RPCProtocol;
 import code.google.nfs.rpc.protocol.SimpleProcessorProtocol;
@@ -66,9 +64,13 @@
 	});
 	server.registerProcessor(RPCProtocol.TYPE, "testservice", new BenchmarkTestServiceImpl(responseSize));
 	server.registerProcessor(RPCProtocol.TYPE, "testservicepb", new PBBenchmarkTestServiceImpl(responseSize));
-	ThreadFactory tf = new NamedThreadFactory("BUSINESSTHREADPOOL");
-	ExecutorService threadPool = new ThreadPoolExecutor(20, maxThreads,
-			300, TimeUnit.SECONDS, new SynchronousQueue<Runnable>(), tf);
+	ForkJoinWorkerThreadFactory tf = new NamedForkJoinThreadFactory("BUSINESSTHREADPOOL");
+	ExecutorService threadPool = new ForkJoinPool(maxThreads, tf,
+			new Thread.UncaughtExceptionHandler() {
+				public void uncaughtException(Thread t, Throwable e) {
+					// do nothing
+				}
+			}, true);
 	server.start(listenPort, threadPool);
 	}
 
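
Setting aside the nfs-rpc wrappers, the change boils down to swapping the
following two constructions (a minimal sketch using the plain JDK 7
java.util.concurrent classes rather than the repackaged jsr166y ones; the
class name PoolSetup is just illustrative, and maxThreads stands for the
benchmark's configured pool size):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class PoolSetup {

    // Before: 20 core threads, growing up to maxThreads, direct hand-off
    // through a SynchronousQueue, idle threads time out after 300 seconds.
    static ExecutorService traditionalPool(int maxThreads) {
        return new ThreadPoolExecutor(20, maxThreads,
                300, TimeUnit.SECONDS, new SynchronousQueue<Runnable>());
    }

    // After: fixed parallelism of maxThreads, the default worker-thread
    // factory, no uncaught-exception handler, and asyncMode = true.
    static ExecutorService forkJoinPool(int maxThreads) {
        return new ForkJoinPool(maxThreads,
                ForkJoinPool.defaultForkJoinWorkerThreadFactory, null, true);
    }
}

Note that, per the javadoc, asyncMode = true only establishes local FIFO
scheduling for forked tasks that are never joined; Runnables submitted from
outside the pool still enter through its submission queue rather than via a
ThreadPoolExecutor-style direct hand-off.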


I ran a benchmark (see
http://code.google.com/p/nfs-rpc/wiki/HowToRunBenchmark ) hoping for a
significant TPS improvement, but got the opposite result: ForkJoinPool
(avg response time 12 ms) seems to lead to worse latency than the
traditional ExecutorService did (avg response time 3 ms).
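
For anyone who wants to poke at the effect without the whole RPC stack, a
crude standalone probe might look like the sketch below. It is not the
nfs-rpc benchmark: LatencyProbe is an invented name, the task and
concurrency counts are arbitrary, and CallerRunsPolicy is added only so
that the toy ThreadPoolExecutor never rejects a burst of submissions.

import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.Semaphore;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class LatencyProbe {

    static void probe(String label, ExecutorService pool) throws Exception {
        final int tasks = 200000;
        final Semaphore inFlight = new Semaphore(500);   // mimic 500 concurrent clients
        final CountDownLatch done = new CountDownLatch(tasks);
        final AtomicLong queuedNanos = new AtomicLong();
        for (int i = 0; i < tasks; i++) {
            inFlight.acquire();
            final long submitted = System.nanoTime();
            pool.execute(new Runnable() {
                public void run() {
                    // accumulate the delay between submission and start of execution
                    queuedNanos.addAndGet(System.nanoTime() - submitted);
                    inFlight.release();
                    done.countDown();
                }
            });
        }
        done.await();
        System.out.printf("%s: avg submit-to-run latency %d us%n",
                label, queuedNanos.get() / tasks / 1000L);
        pool.shutdown();
    }

    public static void main(String[] args) throws Exception {
        probe("ThreadPoolExecutor", new ThreadPoolExecutor(20, 100,
                300, TimeUnit.SECONDS, new SynchronousQueue<Runnable>(),
                Executors.defaultThreadFactory(),
                new ThreadPoolExecutor.CallerRunsPolicy()));
        probe("ForkJoinPool", new ForkJoinPool(100,
                ForkJoinPool.defaultForkJoinWorkerThreadFactory, null, true));
    }
}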

With ForkJoinPool:

----------Benchmark Statistics--------------
 Concurrents: 500
 CodecType: 3
 ClientNums: 1
 RequestSize: 100 bytes
 Runtime: 120 seconds
 Benchmark Time: 81
 Requests: 3740311 Success: 99% (3739274) Error: 0% (1037)
 Avg TPS: 41374 Max TPS: 62881 Min TPS: 3333
 Avg RT: 12ms
 RT <= 0: 0% 1829/3740311
 RT (0,1]: 1% 59989/3740311
 RT (1,5]: 47% 1778386/3740311
 RT (5,10]: 17% 655377/3740311
 RT (10,50]: 32% 1204205/3740311
 RT (50,100]: 0% 31479/3740311
 RT (100,500]: 0% 546/3740311
 RT (500,1000]: 0% 7463/3740311
 RT > 1000: 0% 1037/3740311


With traditional thread pool:
----------Benchmark Statistics--------------
 Concurrents: 500
 CodecType: 3
 ClientNums: 1
 RequestSize: 100 bytes
 Runtime: 120 seconds
 Benchmark Time: 81
 Requests: 12957281 Success: 100% (12957281) Error: 0% (0)
 Avg TPS: 144261 Max TPS: 183390 Min TPS: 81526
 Avg RT: 3ms
 RT <= 0: 0% 3997/12957281
 RT (0,1]: 4% 592905/12957281
 RT (1,5]: 95% 12312500/12957281
 RT (5,10]: 0% 19280/12957281
 RT (10,50]: 0% 92/12957281
 RT (50,100]: 0% 507/12957281
 RT (100,500]: 0% 26500/12957281
 RT (500,1000]: 0% 1500/12957281
 RT > 1000: 0% 0/12957281


I ran this benchmark on two 16-core Westmere machines (Xeon E5620, 8 cores
with HT), using the same configuration, given below, for both tests.

1. JDK version: Oracle 1.7.0_03 (HotSpot)

2. client-side JVM options:
-Xms4g -Xmx4g -Xmn1g -XX:+PrintGCDetails -XX:+PrintGCDateStamps
-Xloggc:gc.log -Dwrite.statistics=true -XX:+UseParallelGC
-XX:+UseCondCardMark -XX:-UseBiasedLocking
-Djava.ext.dirs=/home/min/nfs-rpc
code.google.nfs.rpc.netty.benchmark.NettySimpleBenchmarkClient 10.232.98.96
8888 500 1000 3 100 120 1

3. server-side JVM options:
-Xms2g -Xmx2g -Xmn500m -XX:+UseParallelGC -XX:+PrintGCDetails
-XX:+PrintGCDateStamps -Xloggc:gc.log -XX:+UseCondCardMark
-XX:-UseBiasedLocking -Djava.ext.dirs=/home/min/nfs-rpc
code.google.nfs.rpc.netty.benchmark.NettyBenchmarkServer 8888 100 100

A low context-switch rate, about 8,000 per second, is also observed with
ForkJoinPool, versus about 150,000 per second with the old thread pool.
I also ran the benchmarks under Oracle JDK 1.6, with similar results.
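
For reference, the system-wide context-switch rate can be sampled on Linux
by reading the cumulative "ctxt" counter from /proc/stat, which is the same
number vmstat's cs column is derived from. A small sketch (the class name
is just illustrative):

import java.io.BufferedReader;
import java.io.FileReader;

public class ContextSwitchSampler {

    // Returns the cumulative number of context switches since boot,
    // taken from the "ctxt" line of Linux's /proc/stat.
    static long readCtxt() throws Exception {
        BufferedReader r = new BufferedReader(new FileReader("/proc/stat"));
        try {
            String line;
            while ((line = r.readLine()) != null) {
                if (line.startsWith("ctxt ")) {
                    return Long.parseLong(line.substring(5).trim());
                }
            }
        } finally {
            r.close();
        }
        throw new IllegalStateException("no ctxt line in /proc/stat");
    }

    public static void main(String[] args) throws Exception {
        long prev = readCtxt();
        for (int i = 0; i < 30; i++) {   // sample once a second for 30 seconds
            Thread.sleep(1000);
            long now = readCtxt();
            System.out.println("context switches/sec: " + (now - prev));
            prev = now;
        }
    }
}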

Could anyone kindly explain the reason for the behavior described above?

Thanks,
Min

-- 
My research interests are distributed systems, parallel computing, and
bytecode-based virtual machines.

My profile:
http://www.linkedin.com/in/coderplay
My blog:
http://coderplay.javaeye.com