[concurrency-interest] ForkJoinPool with custom number of WorkQueues

Francesco Nigro nigro.fra at gmail.com
Mon Nov 12 09:19:03 EST 2018


Answers inline

> There are now only a few j.u.c classes that include
> paths allowing a non-trivial amount of spinning, and some of these might
> be subject to reconsideration in light of Loom, in which spinning is
> rarely a good idea


Good point: most busy-spin approaches (or Thread::onSpinWait patterns)
should be avoided for this reason alone, not to mention the recent issues
with Intel and the pause intrinsic, IIRC.
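
For reference, the kind of pattern I mean: a minimal spin-then-park sketch
(illustrative only, not FJP's actual code; the spin budget and park time
here are arbitrary):

    import java.util.concurrent.atomic.AtomicBoolean;
    import java.util.concurrent.locks.LockSupport;

    final class SpinThenPark {
        // Spin briefly hoping work arrives, then fall back to parking.
        static void awaitWork(AtomicBoolean hasWork) {
            int spins = 1 << 10;                   // arbitrary spin budget
            while (!hasWork.get()) {
                if (spins > 0) {
                    --spins;
                    Thread.onSpinWait();           // hint: busy-waiting
                } else {
                    LockSupport.parkNanos(1_000L); // give up the CPU
                }
            }
        }
    }

This is exactly the kind of code that stops making sense under Loom.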

> I'm not a big fan of forcing users to make such decisions, since they
> can/will encounter the same issues as algorithmic solutions do, as in
> Carl's situation below.


That's true, I believe, of any fine tuning aimed at a specific performance
issue.
That's why I propose not putting this tuning in the hands of the "average"
(to be defined) user, but instead defaulting to the park approach and
exposing the full-fat feature (i.e. a customizable IdleStrategy/WaitStrategy
and the like) only through a protected constructor accessible via
inheritance. Any defensive approach is welcome here :)
I perfectly understand that only a few devs in the world need this, but
given that I'm one of them I can't help proposing it, while respecting your
wish not to complicate life for the other 99.9999% of users; I'll understand
if it doesn't seem safe enough.
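
To make the proposal concrete, here is a rough sketch of what I have in
mind. The WaitStrategy name and its shape are hypothetical, not an existing
FJP API; all public constructors would keep today's parking behavior:

    import java.util.concurrent.locks.LockSupport;

    // Hypothetical SPI: decides what an idle worker does before re-scanning.
    interface WaitStrategy {
        // idleCount: consecutive scans that found no task. Returns once
        // the worker should look for work again.
        void idle(long idleCount);

        // Safe default, mirroring current behavior: just park for a while.
        WaitStrategy PARKING = idleCount -> LockSupport.parkNanos(100_000L);
    }

A subclass-only (protected) ForkJoinPool constructor could accept such a
strategy; everyone else gets PARKING and never sees the knob.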


On Sun, Nov 11, 2018 at 15:25 Doug Lea via Concurrency-interest <
concurrency-interest at cs.oswego.edu> wrote:

>
> I agree that something should be done to improve performance in cases
> where the optimal steady state is to engage only a subset of available
> worker threads. I'll explore other options (this has been on the
> todo-list for too long). None of those we've tried so far are very good:
>
>
> On 11/9/18 2:52 AM, Francesco Nigro via Concurrency-interest wrote:
> > In past implementations of the FJ pool there was a SPIN static
> > variable used for this purpose, which worked quite well (with
> > parallelism < available cores) at not immediately parking the executor
> > threads when there is nothing to do.
>
> ... with the disadvantage of unacceptably high CPU wastage on machines
> running near CPU saturation, as we discovered only after releasing, and
> so later removed. There are now only a few j.u.c classes that include
> paths allowing a non-trivial amount of spinning, and some of these might
> be subject to reconsideration in light of Loom, in which spinning is
> rarely a good idea.
>
> >
> > I have the same problem as Carl and I have found a similar "solution":
> > reduce the parallelism, hoping to keep the queues busier, but that
> > would incur higher contention on the offer side.
>
> When steady state has little variance in task execution rates, reducing
> parallelism *is* the right solution. But it has the disadvantage of
> poorer throughput when a lot of tasks suddenly appear.
>
> >
> > Wouldn't it make sense to inject some WaitStrategy to allow the user
> > to choose what to do before parking (if parking at all)?
>
> I'm not a big fan of forcing users to make such decisions, since they
> can/will encounter the same issues as algorithmic solutions do, as in
> Carl's situation below.
>
> >
> > On Fri, Nov 9, 2018 at 00:18 Carl Mastrangelo via
> > Concurrency-interest <concurrency-interest at cs.oswego.edu> wrote:
> >
> >     Hi,
> >
> >     I am using ForkJoinPool as a means to avoid lock contention for
> >     submitting tasks to an executor.  FJP is much faster than before,
> >     but has some unexpected slowdown when there is not enough work.  In
> >     particular, a significant amount of time is spent parking and
> >     unparking threads when there is no work to do.  When there is no
> >     active work, it seems each worker scans the other work queues
> >     looking for work before going to sleep.
> >
> >     In my program I have a parallelism of 64, because when the work load
> >     is high, each of the threads can be active.  However, when work load
> >     is low, the workers spend too much time looking for more work.
>
> Besides keeping too many threads alive, the main performance issue here
> is reduced locality because related tasks are taken by different
> threads. So ...
>
> >
> >     One way to fix this (I think) is to lower the number of worker
> >     queues, but keep the same number of workers.   In my case, having 32
> >     or 16 queues rather than having exactly 64 might help, but I have no
> >     way of testing it out.   Has this ever been brought up, or is it
> >     possible for me to easily patch FJP to see if it would help?
>
> ... I don't think this will be a dramatic improvement, but if you are
> interested in experimenting, you could try building a jsr166.jar after
> changing ForkJoinPool line 703 from:
>     static final int SQMASK       = 0x007e;   // max 64 (even) slots
> to:
>     static final int SQMASK       = 0x001e;   // max 16 (even) slots
>
> -Doug
>
>
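
As a footnote on Doug's suggested patch: a small standalone illustration of
what shrinking that mask does to submission-slot selection (a deliberate
simplification, not the real ForkJoinPool code path):

    import java.util.concurrent.ThreadLocalRandom;

    public class SqmaskDemo {
        public static void main(String[] args) {
            // Stand-in for the submitting thread's probe hash.
            int probe = ThreadLocalRandom.current().nextInt();
            // Masking keeps only even indices; the mask width bounds how
            // many distinct submission queues can ever be hit.
            System.out.println("0x007e slot: " + (probe & 0x007e)); // 0..126, even
            System.out.println("0x001e slot: " + (probe & 0x001e)); // 0..30, even
        }
    }

With 0x001e, all external submissions land in at most 16 queues, so related
tasks are likelier to share a queue, which is the locality effect Doug
mentions above.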