[concurrency-interest] Concurrent indexed queue

Aleksey Shipilev aleksey.shipilev at gmail.com
Wed Aug 24 11:28:23 EDT 2011

Hi Rohit,

Are you looking for efficient CircularBuffer implementation then? Having one
of those, you can then have per-instrument CircularBuffer and poll each by
"key", which will provide essential indexing.


On Wed, Aug 24, 2011 at 6:58 PM, Rohit Reja <rreja2000 at yahoo.com> wrote:

> Hi,
> I am working on a project where I need advise on designing following
> functionality.
> Producer threads read quotes on a set of instruments (5000 instruments)
> from a message bus
> Which needs to be routed to a downstream system which can consume messages
> at a slower rate
> Than producers are producing messages. The catch is that only the latest
> quotes need to be published
> To the downstream system. So we can miss out on stale quotes. So
> essentially we dont need to slow down the producers as the event queue is
> bounded by the number of instruments.
> I want to keep the design flexible as to allow multiple producers and
> multiple consumers.
> So in short I need a concurrent indexed queue where i can update the
> entries in place. This is a latency sensitive application so we would want
> latency to be as low as possible.
> Please advise me for suitable data structures that can be of use.
> Regards,
> Rohit
>  ------------------------------
> * From: * Jeff Hain <jeffhain at rocketmail.com>;
> * To: * Doug Lea <dl at cs.oswego.edu>;
> * Cc: * <concurrency-interest at cs.oswego.edu>;
> * Subject: * [concurrency-interest] Re : Re : concurrent counter :
> incrementAndGet
> * Sent: * Sat, Aug 20, 2011 10:17:54 AM
>   I hand-emulated ">" replies style, which is handier :)
> ------------------------------
> *>De :* Doug Lea <dl at cs.oswego.edu>
> *>À :* concurrency-interest at cs.oswego.edu
> *>Envoyé le :* Jeu 18 août 2011, 22h 05min 59s
> *>Objet :* Re: [concurrency-interest] Re : concurrent counter :
> incrementAndGet
> >
> >On 08/14/11 07:03, Jeff Hain wrote:
> >>
> >> That could work well for some cases indeed, but I'm working on a sort
> >> of Disruptor, using the counter with modulo to pick up slots to write to
> in
> >> a cyclic array, monotonically, or non-monotonically but for a short time
> >> (for readers not to be blocked on a not-yet-written slot), and any
> writer
> >> can stop to work anytime; in this case I don't see how that could apply
> >> easily.
> >
> >I don't think "non-monotonically but for a short time" makes
> >this easier, since any violations are likely to be unbounded.
>    Violations are indeed unbounded, which could make for example all
> producers
> claim wrapping sequences (on entries which could not be written for the
> time),
> and let consumers stuck on not-yet-written entries (*).
>    Though, these violations are dealt with, as follows: when a consumer
> encounters
> a writable entry (for "current round"), and that the counter already
> provided a higher
> sequence to a producer (hence there will be something to eat further), it
> tries to
> CAS-it-up to being writable but for "next round" (i.e. sequence += buffer
> length),
> so that consumers can pass on it and not be stuck (a producer can write it
> just after,
> it doesn't hurt, consumers will still just pass on it). In the process,
> consumer also depletes
> the counter of any lower sequence, to lower violations, and so that no
> producer
> subsequently tries to write on a CASed-up entry (if one does, i.e. entry
> doesn't
> have the expected sequence, it just picks another sequence).
>    At the time, on a core i7 X 980, when doing "no work" benches (i.e.
> consumers
> do nothing), using a concurrent counter instead of AtomicLong is just a bit
> slower if
> consumers are going faster than a few producers (due to the CASing-up going
> on I guess),
> but if producers go faster, i.e. if consumers have something to eat most of
> the time,
> I've seen up to 30 percent improvement with 8 producers and 1 consumer, and
> about
> 20 percent with anywhere between 32 and 128 producers.
>    (My code is still a mess of comments and obsolete code, or dead
> debugging code,
> so I can't attach it, but there's not much more in it than what I said.)
>    (*) The "Disruptor" I use is "unicast", and producers and consumers
> don't spy
> on each other (which doesn't scale well), but only look at entries sequence
> and status
> (writable,being written,readable,being read), both stored in a same long.
> >You might be able to live with thread local random number generators,
> >so that in the long run on average you are balanced?
>    I tried to use random (Marsaglia XorShift) in the concurrent counter,
> but it was actually slower than just doing "++index" for the next cell to
> CAS on (it's an array of sub-counters, each incremented by the number
> of sub-counters each time). If CAS fails, I try another cell. Also, I
> actually
> don't do "++index", but "index = index + 1 + nbrOfFailedCAS" (with mask),
> which seems to help to get away from contention.
>    Also, as I said, balancing is dealt with, but the counter also has
> a method that ensures monotonicity (for the thread-local, or
> non-thread-safe-view-instance-local, data structure that holds previous
> index and highest returned value). It is a bit slower, but it tends
> to lower unbalancing, since when it gets a value lower than the
> highest previously returned one, it tries to CAS again on the same cell,
> making it catch-up.
> >Otherwise, you have a version of the Counting problem described
> >in Herlihy & Shavit, for which all known scalable solutions
> >are expensive, complicated, and difficult to package as
> >j.u.c components. Although there are some special cases like
> >SNZI for detecting when counts hit the particular value of zero.
>    Thanks, I didn't know about these. I've tried Shavit & others's
> flat-combining method though for the counter, but it was too slow.
> >
> >(This seems to be the main bottleneck in Disruptor-like designs.)
>    Yes, that's why I started to focus on it. I think it's the only part of
> this disruptor
> that is potentially proportionnal-or-worse to the number of threads (one
> could think
> of ring buffer scanning by consumers too, but it should be possible to use
> modulo so that each entry doesn't get scanned by all consumers).
> >
> >-Doug
> -Jeff
> _______________________________________________
> Concurrency-interest mailing list
> Concurrency-interest at cs.oswego.edu
> http://cs.oswego.edu/mailman/listinfo/concurrency-interest
> _______________________________________________
> Concurrency-interest mailing list
> Concurrency-interest at cs.oswego.edu
> http://cs.oswego.edu/mailman/listinfo/concurrency-interest
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://cs.oswego.edu/pipermail/concurrency-interest/attachments/20110824/b44958db/attachment-0001.html>

More information about the Concurrency-interest mailing list