[concurrency-interest] Re : Re : concurrent counter : incrementAndGet

Jeff Hain jeffhain at rocketmail.com
Sat Aug 20 06:17:54 EDT 2011

I hand-emulated ">" replies style, which is handier :)

>De : Doug Lea <dl at cs.oswego.edu>
>À : concurrency-interest at cs.oswego.edu
>Envoyé le : Jeu 18 août 2011, 22h 05min 59s
>Objet : Re: [concurrency-interest] Re : concurrent counter :  incrementAndGet
>On 08/14/11 07:03, Jeff Hain wrote:
>> That could work well for some cases indeed, but I'm working on a sort
>> of Disruptor, using the counter with modulo to pick up slots to write to in
>> a cyclic array, monotonically, or non-monotonically but for a short time
>> (for readers not to be blocked on a not-yet-written slot), and any writer
>> can stop to work anytime; in this case I don't see how that could apply
>>  easily.
>I don't think "non-monotonically but for a short time" makes
>this easier, since any violations are likely to be unbounded.

   Violations are indeed unbounded, which could make for example all producers
claim wrapping sequences (on entries which could not be written for the time),
and let consumers stuck on not-yet-written entries (*).
   Though, these violations are dealt with, as follows: when a consumer 
a writable entry (for "current round"), and that the counter already provided a 
sequence to a producer (hence there will be something to eat further), it tries 
CAS-it-up to being writable but for "next round" (i.e. sequence += buffer 
so that consumers can pass on it and not be stuck (a producer can write it just 
it doesn't hurt, consumers will still just pass on it). In the process, consumer 
also depletes
the counter of any lower  sequence, to lower violations, and so that no producer
subsequently tries to write on a CASed-up entry (if one does, i.e. entry doesn't
have the expected sequence, it just picks another sequence).
   At the time, on a core i7 X 980, when doing "no work" benches (i.e. consumers
do nothing), using a concurrent counter instead of AtomicLong is just a bit 
slower if
consumers are going faster than a few producers (due to the CASing-up going on I 
but if producers go faster, i.e. if consumers have something to eat most of the 
I've seen up to 30 percent improvement with 8 producers and 1 consumer, and 
20 percent with anywhere between 32 and 128 producers.
   (My code is still a mess of comments and obsolete code, or dead debugging 
so I can't attach it, but there's not much more in it than what I said.)

   (*) The "Disruptor" I use is "unicast", and producers and  consumers don't 
on each other (which doesn't scale well), but only look at entries sequence and 
(writable,being written,readable,being read), both stored in a same long.

>You might be able to live with thread local random number generators,
>so that in the long run on average you are balanced?

   I tried to use random (Marsaglia XorShift) in the concurrent counter,
but it was actually slower than just doing "++index" for the next cell to
CAS on (it's an array of sub-counters, each incremented by the number
of sub-counters each time). If CAS fails, I try another cell. Also, I actually
don't do "++index", but "index = index + 1 + nbrOfFailedCAS" (with mask),
which seems to help to get away from contention.
   Also, as I said, balancing is dealt with, but the counter also has
a method that ensures monotonicity (for the thread-local, or
non-thread-safe-view-instance-local, data structure that holds previous
index and highest returned value). It is a bit slower, but it tends
to lower unbalancing, since when it gets a value lower than the
highest previously  returned one, it tries to CAS again on the same cell,
making it catch-up.

>Otherwise, you have a version of the Counting problem described
>in Herlihy & Shavit, for which all known scalable solutions
>are expensive, complicated, and difficult to package as
>j.u.c components. Although there are some special cases like
>SNZI for detecting when counts hit the particular value of zero.

   Thanks, I didn't know about these. I've tried Shavit & others's
flat-combining method though for the counter, but it was too slow.

>(This seems to be the main bottleneck in Disruptor-like designs.)

   Yes, that's why I started to focus on it. I think it's the only part of this 
that is potentially proportionnal-or-worse to the number of threads (one could 
of ring buffer scanning by consumers too, but it should be possible to use
modulo so that each entry doesn't get  scanned by all consumers).



Concurrency-interest mailing list
Concurrency-interest at cs.oswego.edu
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://cs.oswego.edu/pipermail/concurrency-interest/attachments/20110820/107bc2cd/attachment-0001.html>

More information about the Concurrency-interest mailing list