[concurrency-interest] Suspecting a problem in recent jdk-9 builds
dl at cs.oswego.edu
Wed Dec 28 07:30:33 EST 2016
On 12/28/2016 04:30 AM, Antoine Tissier wrote:
> We have been running benchmarks for our in-memory analytics software
> ActivePivot on a M6.32 machine (Solaris Sparc, 8 TB RAM, 2304 logical
> cores (288 physical cores)).
> Our benchmarks involve high parallelism along with many queries divided
> in a high number of tasks (CountedCompleters) in the ForkJoinPool. With
> build 145 of jdk-9, some tasks are not executed, causing larger
> completion problems. However, with the earlier build 111, the problem
> does not occur.
> On a smaller Linux machine (Linux amd 64, 64 logical cores (32 physical
> cores), 512 GB RAM) but with a similar setup, the problem was not
> The problem seems to arise when a large number of completers (>20 000)
> are involved: forking tasks works well but when submitting tasks to a
> new pool, it seems that their compute method is sometimes not called.
> We indeed log every call to ForkJoinPool.submit, as well as every time a
> completer enters its compute method, and clearly see that once in a
> while, the task is never computed after having been submitted. We let
> the system run for an additional hour, and there was no more progress
> even though the system was idle. Thread dumps did not show any suspect
> activity (all worker threads were idle).
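The bookkeeping described above (record each submit, record each entry
into compute, then look for tasks that were submitted but never ran) can
be sketched roughly as follows. This is only an illustrative sketch, not
the ActivePivot code: the class SubmitTracker, the nested TrackedTask,
and the method runOnce are hypothetical names, and the 60-second wait is
an arbitrary choice.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountedCompleter;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.TimeUnit;

// Hypothetical harness: tracks tasks submitted to a fresh pool and
// reports how many never entered compute().
class SubmitTracker {
    // Ids of tasks that were submitted but have not yet entered compute().
    static final Set<Integer> pending = ConcurrentHashMap.newKeySet();

    static final class TrackedTask extends CountedCompleter<Void> {
        final int id;
        TrackedTask(int id) { this.id = id; }

        @Override
        public void compute() {
            pending.remove(id);  // record that compute() was entered
            tryComplete();       // no subtasks, so this completes the task
        }
    }

    // Submits `tasks` completers to a new pool and returns the number
    // that were submitted but never computed within the wait period.
    static int runOnce(int tasks, int parallelism) {
        pending.clear();
        ForkJoinPool pool = new ForkJoinPool(parallelism);
        for (int i = 0; i < tasks; i++) {
            pending.add(i);      // record the submit before handing off
            pool.submit(new TrackedTask(i));
        }
        pool.shutdown();         // already-submitted tasks still run
        try {
            pool.awaitTermination(60, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return pending.size();
    }

    public static void main(String[] args) {
        int lost = runOnce(20_000, Runtime.getRuntime().availableProcessors());
        System.out.println("submitted but never computed: " + lost);
    }
}
```

On a correctly working pool the reported count should be zero; a
nonzero count after the pool has gone idle would correspond to the
symptom reported here.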
> We tried to reproduce the problem with a similar but simpler test,
> but it was not successful.
> Are you aware of any concurrency/task completion problems in the more
> recent builds of jdk-9 ?
The only changes in any relevant j.u.c classes were to incorporate
VarHandles in June. I believe these were tested on Sparcs, but not
> Are there any additional tests that we could run in order to diagnose
> this issue ?
It's not easy to diagnose a problem that seems to be specific
to a machine and program we don't have.
Some initial checks would be to try different VM and GC settings,
especially -XX:+UseParallelGC (vs default UseG1GC) and
-XX:-UseBiasedLocking. Also, if using commonPool, try
for different values of n.
These would help rule out some kinds of problems.
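
As a rough illustration of trying different values of n, the sketch
below runs the same trivial workload on explicitly constructed pools of
varying parallelism and checks that every task ran. The class
ParallelismSweep and the method completedWith are hypothetical names
introduced for this example. For the common pool itself, the documented
system property java.util.concurrent.ForkJoinPool.common.parallelism
controls the parallelism level; that this property is the setting meant
above is an assumption.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sweep: same workload, different pool parallelism levels.
class ParallelismSweep {
    // Runs `tasks` trivial tasks on a pool with the given parallelism
    // and returns how many of them actually executed.
    static int completedWith(int parallelism, int tasks) {
        AtomicInteger done = new AtomicInteger();
        ForkJoinPool pool = new ForkJoinPool(parallelism);
        for (int i = 0; i < tasks; i++) {
            pool.execute(() -> done.incrementAndGet());
        }
        pool.shutdown();  // previously submitted tasks are still executed
        try {
            pool.awaitTermination(60, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return done.get();
    }

    public static void main(String[] args) {
        int tasks = 20_000;
        int max = Runtime.getRuntime().availableProcessors();
        // Double n each round, from 1 up to the number of available cores.
        for (int n = 1; n <= max; n *= 2) {
            int ran = completedWith(n, tasks);
            System.out.println("n=" + n + " ran=" + ran + " of " + tasks);
        }
    }
}
```

If tasks go missing only above some value of n, that would narrow the
search considerably.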