[concurrency-interest] ForkJoinPool seems to lead to a worse latency than traditional ExecutorServices

Min Zhou coderplay at gmail.com
Tue Apr 17 10:47:26 EDT 2012


Thanks David for your reply.
The business logic for each F/J task is quite simple: take a message handed off by the
front-end I/O threads and put it into a transfer queue, from which the I/O threads
consume it and eventually send the response back to the clients. So yes, each F/J task
performs one blocking operation, the put into the transfer queue. Is that okay?
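
In case it matters, here is a minimal sketch of how that blocking put could be wrapped
in a ForkJoinPool.ManagedBlocker, so the pool can compensate with an extra worker while
one thread is blocked. The queue and message names below are placeholders, not from our
code, and this assumes the java.util.concurrent (or jsr166y) ManagedBlocker API:

    import java.util.concurrent.ForkJoinPool;
    import java.util.concurrent.TransferQueue;

    // Sketch only: wraps a blocking queue hand-off in a ManagedBlocker so the
    // pool can add a compensating worker while this thread is blocked.
    final class QueuePutBlocker<T> implements ForkJoinPool.ManagedBlocker {
        private final TransferQueue<T> outQueue;
        private final T msg;
        private boolean done;

        QueuePutBlocker(TransferQueue<T> outQueue, T msg) {
            this.outQueue = outQueue;
            this.msg = msg;
        }

        public boolean isReleasable() {
            // Try a non-blocking hand-off first.
            return done || (done = outQueue.tryTransfer(msg));
        }

        public boolean block() throws InterruptedException {
            if (!done) {
                outQueue.transfer(msg); // the potentially blocking hand-off
                done = true;
            }
            return true; // no further blocking is needed
        }
    }

    // Inside the F/J task body (outQueue and msg are illustrative names):
    //     ForkJoinPool.managedBlock(new QueuePutBlocker<>(outQueue, msg));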

Thanks,
Min

On Tue, Apr 17, 2012 at 8:51 PM, David Holmes <davidcholmes at aapt.net.au> wrote:

>
> Sorry that was somewhat terse.
>
> ForkJoinPool is not a drop-in replacement for an arbitrary ExecutorService.
> It is specifically designed to efficiently execute tasks that implement
> fork/join parallelism. If your tasks don't perform fork/join parallelism,
> but are plain old Runnables/Callables that do blocking I/O and other
> "regular" programming operations, then they are unlikely to see any
> benefit from using a ForkJoinPool.
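>
> A minimal sketch of that divide-and-conquer style, with purely illustrative names:
>
>     import java.util.concurrent.RecursiveTask;
>
>     // CPU-bound divide-and-conquer: split until small, solve directly, combine.
>     class SumTask extends RecursiveTask<Long> {
>         private static final int THRESHOLD = 1000;
>         private final long[] data;
>         private final int lo, hi;
>
>         SumTask(long[] data, int lo, int hi) {
>             this.data = data; this.lo = lo; this.hi = hi;
>         }
>
>         @Override
>         protected Long compute() {
>             if (hi - lo <= THRESHOLD) {          // small enough: sum directly
>                 long sum = 0;
>                 for (int i = lo; i < hi; i++) sum += data[i];
>                 return sum;
>             }
>             int mid = (lo + hi) >>> 1;
>             SumTask left = new SumTask(data, lo, mid);
>             left.fork();                         // run the left half asynchronously
>             long rightSum = new SumTask(data, mid, hi).compute();
>             return rightSum + left.join();       // wait for the left half and combine
>         }
>     }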
>
> David
>
> -----Original Message-----
> *From:* concurrency-interest-bounces at cs.oswego.edu [mailto:
> concurrency-interest-bounces at cs.oswego.edu] *On Behalf Of* David Holmes
> *Sent:* Tuesday, 17 April 2012 10:14 PM
> *To:* Min Zhou; concurrency-interest at cs.oswego.edu
> *Subject:* Re: [concurrency-interest] ForkJoinPool seems to lead to a
> worse latency than traditional ExecutorServices
>
> What makes your RPC project suitable for Fork/Join parallelism?
>
> David Holmes
>
> -----Original Message-----
> *From:* concurrency-interest-bounces at cs.oswego.edu [mailto:
> concurrency-interest-bounces at cs.oswego.edu] *On Behalf Of* Min Zhou
> *Sent:* Tuesday, 17 April 2012 8:30 PM
> *To:* concurrency-interest at cs.oswego.edu
> *Subject:* [concurrency-interest] ForkJoinPool seems to lead to a worse
> latency than traditional ExecutorServices
>
> Hi, all,
>
> I tried to use the newest version of ForkJoinPool from the CVS
> repository of jsr166y to replace the old ExecutorService in our RPC
> project, open-sourced at http://code.google.com/p/nfs-rpc/ .
>
> The modification is quite small. Here is the diff:
>
> Index: nfs-rpc-common/src/main/java/code/google/nfs/rpc/NamedForkJoinThreadFactory.java
> ===================================================================
> --- nfs-rpc-common/src/main/java/code/google/nfs/rpc/NamedForkJoinThreadFactory.java (revision 0)
> +++ nfs-rpc-common/src/main/java/code/google/nfs/rpc/NamedForkJoinThreadFactory.java (revision 0)
> @@ -0,0 +1,48 @@
> +package code.google.nfs.rpc;
> +/**
> + * nfs-rpc
> + *   Apache License
> + *
> + *   http://code.google.com/p/nfs-rpc (c) 2011
> + */
> +import java.util.concurrent.atomic.AtomicInteger;
> +
> +import code.google.nfs.rpc.jsr166y.ForkJoinPool;
> +import code.google.nfs.rpc.jsr166y.ForkJoinPool.ForkJoinWorkerThreadFactory;
> +import code.google.nfs.rpc.jsr166y.ForkJoinWorkerThread;
> +
> +/**
> + * Helper class that lets users monitor the worker threads.
> + *
> + * @author <a href="mailto:coderplay at gmail.com">coderplay</a>
> + */
> +public class NamedForkJoinThreadFactory implements ForkJoinWorkerThreadFactory {
> +
> +    static final AtomicInteger poolNumber = new AtomicInteger(1);
> +
> +    final AtomicInteger threadNumber = new AtomicInteger(1);
> +    final String namePrefix;
> +    final boolean isDaemon;
> +
> +    public NamedForkJoinThreadFactory() {
> +        this("pool");
> +    }
> +    public NamedForkJoinThreadFactory(String name) {
> +        this(name, false);
> +    }
> +    public NamedForkJoinThreadFactory(String prefix, boolean daemon) {
> +        namePrefix = prefix + "-" + poolNumber.getAndIncrement() + "-thread-";
> +        isDaemon = daemon;
> +    }
> +
> +    @Override
> +    public ForkJoinWorkerThread newThread(ForkJoinPool pool) {
> +        ForkJoinWorkerThread t =
> +            ForkJoinPool.defaultForkJoinWorkerThreadFactory.newThread(pool);
> +        t.setName(namePrefix + threadNumber.getAndIncrement());
> +        t.setDaemon(isDaemon);
> +        return t;
> +    }
> +
> +}
> +
> Index: nfs-rpc-common/src/main/java/code/google/nfs/rpc/benchmark/AbstractBenchmarkServer.java
> ===================================================================
> --- nfs-rpc-common/src/main/java/code/google/nfs/rpc/benchmark/AbstractBenchmarkServer.java (revision 120)
> +++ nfs-rpc-common/src/main/java/code/google/nfs/rpc/benchmark/AbstractBenchmarkServer.java (working copy)
> @@ -8,12 +8,10 @@
>  import java.text.SimpleDateFormat;
>  import java.util.Date;
>  import java.util.concurrent.ExecutorService;
> -import java.util.concurrent.SynchronousQueue;
> -import java.util.concurrent.ThreadFactory;
> -import java.util.concurrent.ThreadPoolExecutor;
> -import java.util.concurrent.TimeUnit;
>  
> -import code.google.nfs.rpc.NamedThreadFactory;
> +import code.google.nfs.rpc.NamedForkJoinThreadFactory;
> +import code.google.nfs.rpc.jsr166y.ForkJoinPool;
> +import code.google.nfs.rpc.jsr166y.ForkJoinPool.ForkJoinWorkerThreadFactory;
>  import code.google.nfs.rpc.protocol.PBDecoder;
>  import code.google.nfs.rpc.protocol.RPCProtocol;
>  import code.google.nfs.rpc.protocol.SimpleProcessorProtocol;
> @@ -66,9 +64,13 @@
>    });
>    server.registerProcessor(RPCProtocol.TYPE, "testservice", new BenchmarkTestServiceImpl(responseSize));
>    server.registerProcessor(RPCProtocol.TYPE, "testservicepb", new PBBenchmarkTestServiceImpl(responseSize));
> -  ThreadFactory tf = new NamedThreadFactory("BUSINESSTHREADPOOL");
> -  ExecutorService threadPool = new ThreadPoolExecutor(20, maxThreads,
> -      300, TimeUnit.SECONDS, new SynchronousQueue<Runnable>(), tf);
> +  ForkJoinWorkerThreadFactory tf = new NamedForkJoinThreadFactory("BUSINESSTHREADPOOL");
> +  ExecutorService threadPool = new ForkJoinPool(maxThreads, tf,
> +      new Thread.UncaughtExceptionHandler() {
> +          public void uncaughtException(Thread t, Throwable e) {
> +              // do nothing
> +          }
> +      }, true);
>    server.start(listenPort, threadPool);
>    }
>
>
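> For reference, the four-argument ForkJoinPool constructor used above takes the
> parallelism level, a worker-thread factory, an uncaught-exception handler, and an
> asyncMode flag; asyncMode = true selects FIFO scheduling for tasks that are forked
> but never joined, the mode suggested for event-style processing. A standalone
> sketch (using the default factory and handler rather than the ones in the patch):
>
>     import java.util.concurrent.ExecutorService;
>     import java.util.concurrent.ForkJoinPool;
>
>     public class PoolSetup {
>         public static void main(String[] args) {
>             int maxThreads = Runtime.getRuntime().availableProcessors();
>             ExecutorService threadPool = new ForkJoinPool(
>                     maxThreads,                                      // parallelism level
>                     ForkJoinPool.defaultForkJoinWorkerThreadFactory, // worker-thread factory
>                     null,                                            // default uncaught-exception handling
>                     true);                                           // asyncMode: FIFO for never-joined tasks
>             threadPool.shutdown();
>         }
>     }
>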
> I ran the benchmark (see
> http://code.google.com/p/nfs-rpc/wiki/HowToRunBenchmark ) hoping for a
> significant TPS improvement, but got the opposite result: ForkJoinPool
> (avg response time 12 ms) seems to lead to worse latency than the
> traditional ExecutorService (avg response time 3 ms).
>
> With ForkJoinPool:
>
>  ----------Benchmark Statistics--------------
>  Concurrents: 500
>  CodecType: 3
>  ClientNums: 1
>  RequestSize: 100 bytes
>  Runtime: 120 seconds
>  Benchmark Time: 81
>  Requests: 3740311 Success: 99% (3739274) Error: 0% (1037)
>  Avg TPS: 41374 Max TPS: 62881 Min TPS: 3333
>  Avg RT: 12ms
>  RT <= 0: 0% 1829/3740311
>  RT (0,1]: 1% 59989/3740311
>  RT (1,5]: 47% 1778386/3740311
>  RT (5,10]: 17% 655377/3740311
>  RT (10,50]: 32% 1204205/3740311
>  RT (50,100]: 0% 31479/3740311
>  RT (100,500]: 0% 546/3740311
>  RT (500,1000]: 0% 7463/3740311
>  RT > 1000: 0% 1037/3740311
>
>
> With traditional thread pool:
>  ----------Benchmark Statistics--------------
>  Concurrents: 500
>  CodecType: 3
>  ClientNums: 1
>  RequestSize: 100 bytes
>  Runtime: 120 seconds
>  Benchmark Time: 81
>  Requests: 12957281 Success: 100% (12957281) Error: 0% (0)
>  Avg TPS: 144261 Max TPS: 183390 Min TPS: 81526
>  Avg RT: 3ms
>  RT <= 0: 0% 3997/12957281
>  RT (0,1]: 4% 592905/12957281
>  RT (1,5]: 95% 12312500/12957281
>  RT (5,10]: 0% 19280/12957281
>  RT (10,50]: 0% 92/12957281
>  RT (50,100]: 0% 507/12957281
>  RT (100,500]: 0% 26500/12957281
>  RT (500,1000]: 0% 1500/12957281
>  RT > 1000: 0% 0/12957281
>
>
> I ran this benchmark on two 16-core Westmere machines (Xeon E5620, 8 cores
> with HT), using the same configuration below for both tests.
>
> 1. JDK version: Oracle 1.7.0_03 (hotspot)
>
> 2. client side JVM options:
> -Xms4g -Xmx4g -Xmn1g -XX:+PrintGCDetails -XX:+PrintGCDateStamps
> -Xloggc:gc.log -Dwrite.statistics=true -XX:+UseParallelGC
> -XX:+UseCondCardMark -XX:-UseBiasedLocking
> -Djava.ext.dirs=/home/min/nfs-rpc
> code.google.nfs.rpc.netty.benchmark.NettySimpleBenchmarkClient 10.232.98.96
> 8888 500 1000 3 100 120 1
>
> 3. server side JVM options:
> -Xms2g -Xmx2g -Xmn500m -XX:+UseParallelGC -XX:+PrintGCDetails
> -XX:+PrintGCDateStamps -Xloggc:gc.log -XX:+UseCondCardMark
> -XX:-UseBiasedLocking -Djava.ext.dirs=/home/min/nfs-rpc
> code.google.nfs.rpc.netty.benchmark.NettyBenchmarkServer 8888 100 100
>
> A low context-switch rate, about 8,000 per second, is also observed with
> ForkJoinPool, versus about 150,000 per second with the old thread pool.
> I also ran the benchmarks under Oracle JDK 1.6, with similar results.
>
> Could anyone kindly explain the reason for the behaviour described above?
>
> Thanks,
> Min
>
> --
> My research interests are distributed systems, parallel computing and
> bytecode-based virtual machines.
>
> My profile:
> http://www.linkedin.com/in/coderplay
> My blog:
> http://coderplay.javaeye.com
>
>


-- 
My research interests are distributed systems, parallel computing and
bytecode-based virtual machines.

My profile:
http://www.linkedin.com/in/coderplay
My blog:
http://coderplay.javaeye.com

