[concurrency-interest] Single producer, single consumer: unexpected delays for producer

Jed Wesley-Smith jed at atlassian.com
Mon Aug 11 03:34:26 EDT 2008


Daniel, I have been doing some performance tests on SRSW algorithms 
recently and found that LinkedBlockingQueue can degrade surprisingly 
(although as David says, if the size is small then threads will collide 
around the same nodes causing quite a lot of work). I found something 
like the following to be a lot quicker in my particular tests (single 
writer creating values, single reader retrieving them, neither really 
doing much else:

class SRSWBlockingQueue {
    final BooleanLatch latch = new BooleanLatch();
    final AtomicReference<List<Integer>> ref = new 
AtomicReference<List<Integer>>();

    public void put(final Integer i) {
        List<Integer> list;
        do {
            list = ref.getAndSet(null);
            if (list == null) {
                list = new ArrayList<Integer>();
            }
            list.add(i);
        }
        while (!ref.compareAndSet(null, list));
        latch.release();
    }

    public List<Integer> drain() throws InterruptedException {
        List<Integer> list;
        do {
            latch.await();
            list = ref.getAndSet(null);
        }
        while (list == null);
        return Collections.unmodifiableList(list);
    }
}

In my tests, the above was 60% faster than LBQ on Java6 (MacOSX) with 
high contention. The above example uses a class called BooleanLatch that 
I have discussed in previous threads on this list (basically a reusable 
single count CountDownLatch), but I found that using Lock/Condition 
wasn't a hell of a lot slower, particularly under high contention, and 
the Lock version might have a couple of advantages in this particular 
case as you can drain while holding the lock.

Anyway, I found it is worth batching your exchange if you can to reduce 
the contention. It is also worth considering creating your own 
concurrent structure for SRSW as the java.util.concurrent classes are 
written to be correct under MRMW which is obviously more complex. The 
caveat is of course that the java.util.concurrent classes *do actually 
work* and writing your own concurrent classes is not trivial.

cheers,
jed.

Daniel Harvey wrote:
> David,
> I have timed the entire onMessage() processing for the producer (which 
> is called very often) and also just the time the producer spends in 
> calling queue.offer() (via sendThread.send, called about 1/5 as 
> often). What is so odd to me is that 90% of the total time spent in 
> onMessage() comes from time spent in queue.offer() despite the fact 
> that this is only called for 20% of the messages. If I have the 
> sendThread skip the socketChannel.write() command, than the situation 
> changes greatly: time spent in queue.offer drops by a literally a 
> factor of 6 or so.
>
> I haven't tracked the queue size over time, but do know that it isn't 
> filling up because I'm not seeing any error messages from 
> queue.offer() returning false. As long as the queue is below its fixed 
> max size, should its actual size affect the amount of work needed to 
> perform a queue insertion?
>
> Anyway, I will certainly try Sun jdk, and look into whether CentOS 
> provides the POSIX api's.
>
> I'll also give Carfield's suggestion a try (thanks), though in past 
> experiments I have done I haven't found the drainTo to be _that_ much 
> more efficient that repeated take()'s.
>
> -Dan
>
> On Aug 9, 2008, at 12:45 AM, David Holmes wrote:
>
>> Dan,
>>
>> Have you actually just timed the queue operations? You indicated that 
>> if the
>> consumer just did take() in a loop then there was a 6x speed-up in the
>> producer. That would seem to indicate that the queue itself is not the
>> problem. It could just be an unfortunate harmonic where the producer and
>> consumer always collide at the queue, but even so the delay shouldn't 
>> result
>> in a 6x slowdown. You might try tracking the queue size to see if 
>> grows and
>> shrinks as expected or whether it oscillates around being empty.
>>
>> AS for whether system calls are involved ... probably, but I don't 
>> know how
>> anything in JRockit is implemented so I can't say for sure. I'm 
>> assuming it
>> will use both of your processors.
>>
>> Can you download JDK 6 and try that for comparison? (You might also 
>> try an
>> evaluation download of Sun Java Real-time System if CentOS provides the
>> real-time POSIX API's - and presuming it's actually a real-time 
>> linux. :) )
>>
>> Cheers,
>> David Holmes



More information about the Concurrency-interest mailing list