[concurrency-interest] strange performance issue with ArrayBlockingQueue on Linux

Brian S O'Neill bronee at gmail.com
Thu Jul 20 00:50:42 EDT 2017

You also didn't mention CPU core count. I ran the test on two machines, 
with Java 8 and with Java 9. With Java 9 I explicitly selected the 
parallel GC, because the G1 collector in Java 9 is a CPU pig that slows 
everything down.

Machine 1: Windows 10, Intel Xeon E31245 3.3GHz, 8 logical cores
Machine 2: Linux 4.10, AMD Ryzen 7 1700 3.0Ghz, 16 logical cores

Machine 1, Java 8: 9.2 seconds
Machine 1, Java 9: 9.1 seconds
Machine 2, Java 8: 16.9 seconds
Machine 2, Java 9: 12.1 seconds

At first blush it looks like the problem is Linux, but I don't have a 
proper apples-to-apples comparison. I noticed that with Java 8, the 
number of context switches per second was about 2.5 million, and with 
Java 9 it was about 2.3 million. Either way, this is really high. On 
Windows, the context switch rate was about 1.9 million per second.

I also see a lot of variation between test runs, and when I run the code 
in a loop, the performance (on Windows) varies from 9 to 14 seconds per 
run. On Linux, I see 11 to 21 seconds when running in a loop. I also 
added an explicit System.gc between each test run.

When I run with -Xprof, I see the threads spending most of their time in 
Unsafe.unpark and Unsafe.park. I've never understood why unpark always 
comes up as a performance bottleneck. What does it do that's so expensive?

On 2017-07-19 09:11 PM, Matthews, Graham wrote:
> ​Ah yes, sorry I should have said we are working in Java 8.
> graham
> ------
> Graham MATTHEWS, Ph.D.
> Assistant Professor
> Department of Computer Science |  California Lutheran University
> 60 W. Olsen Rd.  |  Thousand Oaks, CA 91360
> gmatthew at callutheran.edu  | Office location: D19
> ------------------------------------------------------------------------
> *From:* Martin Buchholz <martinrb at google.com>
> *Sent:* Wednesday, July 19, 2017 8:57 PM
> *To:* Matthews, Graham
> *Cc:* concurrency-interest at cs.oswego.edu
> *Subject:* Re: [concurrency-interest] strange performance issue with 
> ArrayBlockingQueue on Linux
> One obvious variable is the version of ABQ you are using.  It was 
> modified in jdk9 and you should be doing all performance work with 
> latest jdk9 binaries e.g. from http://jdk.java.net/9/
> On Wed, Jul 19, 2017 at 8:41 PM, Matthews, Graham 
> <gmatthew at callutheran.edu <mailto:gmatthew at callutheran.edu>> wrote:
>     ​[apologies in advance if this is the wrong place to discuss this
>     issue. also apologies for the long setup -- I promise this is about
>     ArrayBlockingQueue, Java, and a performance issue with them]
>     My student and I have been benchmarking various ways to run cross
>     data source joins. The basic setup is as follows:
>                 J2
>                 /  \
>                /    \
>              R3  J1
>                    /  \
>                   /    \
>                 R2  R1
>     R1, R2 and R3 are called receive operators as they receive data from
>     the databases. J1 and J2 are inner join operators joining values
>     from R2 with values from R1 and then the result with values from R3.
>     One way to run these queries is the classic producers and consumers
>     approach -- so each operator is a Java thread, and there are
>     bounded queues (ArrayBlockingQueues) between the operators. So the
>     above example has 5 threads and 4 ArrayBlockingQueues. So here we
>     interleave I/O with computation.
>     Another way to run this is to first load R3 into J2 in parallel with
>     loading R2 into J1 (the loads build hash maps in J1 and J2), and
>     then have a thread pool with N workers (where N is the number of
>     cores), where each worker takes a value from R1 and joins it "up the
>     spine" of the tree -- so matches against the hash map in J1 and then
>     J2. This approach (the phased approach) therefore has an I/O phase,
>     followed by a computation phase.
>     On OS X El Capitan (10.11.6), the queueing approach and the phased
>     approach perform roughly equally (more variation in the run times
>     for the queueing approach, but performance within 10% of each
>     other). On Linux (Ubuntu 16.04 LTS)the queueing approach is almost
>     three times slower than the phased approach!
>     On top of that the queueing approach runs over twice as fast on OS X
>     as it is does on Linux, despite the fact that I am running OS X on
>     old hardware (mid 2012 Macbook Pro, 3rd Gen Intel Core i5 3210M) and
>     Linux on relatively new hardware (6th Gen Intel Core i7-5500U).
>     We have boiled the problem down to code that doesn't involve
>     databases -- just threads and ArrayBlockingQueues. Attached are
>     5 Java files which "approximate" the queueing approach by having
>     each receive thread simply output numbers (rather than get values
>     from a database) to an ArrayBlockingQueue, and
>     each Join thread simply read its two input queues, adds the values
>     together, and then output the sum to the output queue.
>     Onthe same hardware above this code runs in 1 minute 30 secs on OS
>     X, and 2 minutes 13 secs on Linux!
>     The code is very simple -- just a bunch of threads and a bunch of
>     ArrayBlockingQueues. So I am at a loss to explain the size-able
>     performance differences noted above.
>     Again apologies if this is not the right forum to discuss this issue.
>     graham
>     ​
>     ------
>     Graham MATTHEWS, Ph.D.
>     Assistant Professor
>     Department of Computer Science |  California Lutheran University
>     60 W. Olsen Rd.  |  Thousand Oaks, CA 91360
>     gmatthew at callutheran.edu <mailto:gmatthew at callutheran.edu>  | Office
>     location: D19
>     _______________________________________________
>     Concurrency-interest mailing list
>     Concurrency-interest at cs.oswego.edu
>     <mailto:Concurrency-interest at cs.oswego.edu>
>     http://cs.oswego.edu/mailman/listinfo/concurrency-interest
>     <http://cs.oswego.edu/mailman/listinfo/concurrency-interest>
> _______________________________________________
> Concurrency-interest mailing list
> Concurrency-interest at cs.oswego.edu
> http://cs.oswego.edu/mailman/listinfo/concurrency-interest

More information about the Concurrency-interest mailing list