[concurrency-interest] Single producer, single consumer: unexpected delays for producer

Daniel Harvey dharvey at tachyoncm.com
Mon Aug 11 14:23:58 EDT 2008


Jed,
thanks for the code snippet. I tried implementing this instead of the
LinkedBlockingQueue, but found it to be about 50% slower under my
particular scenario, for some reason.
Next step is to try different JVMs...
-Dan

On Aug 11, 2008, at 3:34 AM, Jed Wesley-Smith wrote:

> Daniel, I have been doing some performance tests on SRSW algorithms
> recently and found that LinkedBlockingQueue can degrade
> surprisingly (although, as David says, if the size is small then
> threads will collide around the same nodes, causing quite a lot of
> work). I found something like the following to be a lot quicker in
> my particular tests (single writer creating values, single reader
> retrieving them, neither really doing much else):
>
> import java.util.ArrayList;
> import java.util.Collections;
> import java.util.List;
> import java.util.concurrent.atomic.AtomicReference;
>
> class SRSWBlockingQueue {
>    // reusable single-count latch (see below); signals the reader
>    // that there is something to drain
>    final BooleanLatch latch = new BooleanLatch();
>    final AtomicReference<List<Integer>> ref = new AtomicReference<List<Integer>>();
>
>    public void put(final Integer i) {
>        List<Integer> list;
>        do {
>            // grab the current batch (if any), append, and put it back
>            list = ref.getAndSet(null);
>            if (list == null) {
>                list = new ArrayList<Integer>();
>            }
>            list.add(i);
>        }
>        // the reader only ever nulls the reference, so with a single
>        // writer this CAS does not retry in practice
>        while (!ref.compareAndSet(null, list));
>        latch.release();
>    }
>
>    public List<Integer> drain() throws InterruptedException {
>        List<Integer> list;
>        do {
>            latch.await();
>            // take the whole batch in one exchange
>            list = ref.getAndSet(null);
>        }
>        while (list == null);
>        return Collections.unmodifiableList(list);
>    }
> }
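
A driver for the above would look something like the following sketch; the
class name, message count and timing are placeholders, not Jed's actual test:

    import java.util.List;

    // Hypothetical single-writer/single-reader driver for Jed's queue;
    // the message count and timing are illustrative only.
    public class SRSWDriver {
        public static void main(String[] args) throws InterruptedException {
            final SRSWBlockingQueue q = new SRSWBlockingQueue();
            final int messages = 1000000;

            Thread reader = new Thread(new Runnable() {
                public void run() {
                    int seen = 0;
                    try {
                        while (seen < messages) {
                            // each drain() hands back a whole batch in one exchange
                            List<Integer> batch = q.drain();
                            seen += batch.size();
                        }
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                }
            });
            reader.start();

            long start = System.nanoTime();
            for (int i = 0; i < messages; i++) {
                q.put(i);  // the single writer
            }
            reader.join();
            System.out.println((System.nanoTime() - start) / 1000000 + " ms");
        }
    }
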
>
> In my tests, the above was 60% faster than LBQ on Java 6 (Mac OS X)
> with high contention. The above example uses a class called
> BooleanLatch that I have discussed in previous threads on this list
> (basically a reusable single-count CountDownLatch), but I found
> that using Lock/Condition wasn't a hell of a lot slower,
> particularly under high contention, and the Lock version might have
> a couple of advantages in this particular case, as you can drain
> while holding the lock.
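
Jed's BooleanLatch itself isn't shown in this thread; one plausible shape for
a reusable single-count latch like the one he describes, built on
AbstractQueuedSynchronizer, is the following sketch (an assumption, not his
actual class):

    import java.util.concurrent.locks.AbstractQueuedSynchronizer;

    // Sketch of a reusable single-count latch: release() opens it,
    // await() blocks until released and then re-arms it. This is a
    // guess at the semantics, not Jed's actual BooleanLatch.
    class BooleanLatch {
        private static class Sync extends AbstractQueuedSynchronizer {
            protected int tryAcquireShared(int ignore) {
                // consume the release so the latch re-closes behind the
                // single reader (making it reusable)
                return compareAndSetState(1, 0) ? 1 : -1;
            }
            protected boolean tryReleaseShared(int ignore) {
                setState(1);  // open the latch
                return true;
            }
        }
        private final Sync sync = new Sync();

        public void await() throws InterruptedException {
            sync.acquireSharedInterruptibly(1);
        }

        public void release() {
            sync.releaseShared(1);
        }
    }
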
>
> Anyway, I found it is worth batching your exchange if you can, to
> reduce the contention. It is also worth considering creating your
> own concurrent structure for SRSW, as the java.util.concurrent
> classes are written to be correct under MRMW, which is obviously
> more complex. The caveat, of course, is that the java.util.concurrent
> classes *do actually work*, and writing your own concurrent classes
> is not trivial.
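
The Lock/Condition version Jed mentions, where you drain while holding the
lock, might look roughly like this sketch (an illustration, not code from the
thread):

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.locks.Condition;
    import java.util.concurrent.locks.ReentrantLock;

    // Sketch of the Lock/Condition variant: the reader swaps out the
    // batch while holding the lock, so no CAS retry loop is needed.
    class LockedSRSWQueue {
        private final ReentrantLock lock = new ReentrantLock();
        private final Condition notEmpty = lock.newCondition();
        private List<Integer> list = new ArrayList<Integer>();

        public void put(Integer i) {
            lock.lock();
            try {
                list.add(i);
                notEmpty.signal();
            } finally {
                lock.unlock();
            }
        }

        public List<Integer> drain() throws InterruptedException {
            lock.lock();
            try {
                while (list.isEmpty()) {
                    notEmpty.await();
                }
                List<Integer> drained = list;
                list = new ArrayList<Integer>();  // start a fresh batch
                return drained;
            } finally {
                lock.unlock();
            }
        }
    }
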
>
> cheers,
> jed.
>
> Daniel Harvey wrote:
>> David,
>> I have timed the entire onMessage() processing for the producer
>> (which is called very often) and also just the time the producer
>> spends calling queue.offer() (via sendThread.send, called about
>> 1/5 as often). What is so odd to me is that 90% of the total time
>> spent in onMessage() comes from time spent in queue.offer(),
>> despite the fact that it is only called for 20% of the messages.
>> If I have the sendThread skip the socketChannel.write() command,
>> then the situation changes greatly: time spent in queue.offer()
>> drops by literally a factor of 6 or so.
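
Timing of that kind can be done with System.nanoTime() around the offer; a
rough sketch, where the queue and message types are stand-ins for Dan's
actual code:

    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.atomic.AtomicLong;

    // Rough sketch of per-call timing around offer(); the types and
    // the accumulating counter are stand-ins, not Dan's code.
    class OfferTimer {
        final AtomicLong offerNanos = new AtomicLong();

        boolean timedOffer(BlockingQueue<Object> queue, Object msg) {
            long t0 = System.nanoTime();
            boolean accepted = queue.offer(msg);
            offerNanos.addAndGet(System.nanoTime() - t0);
            return accepted;  // false means the bounded queue was full
        }
    }
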
>>
>> I haven't tracked the queue size over time, but do know that it  
>> isn't filling up because I'm not seeing any error messages from  
>> queue.offer() returning false. As long as the queue is below its  
>> fixed max size, should its actual size affect the amount of work  
>> needed to perform a queue insertion?
>>
>> Anyway, I will certainly try the Sun JDK, and look into whether
>> CentOS provides the POSIX APIs.
>>
>> I'll also give Carfield's suggestion a try (thanks), though in
>> past experiments I haven't found drainTo to be _that_ much more
>> efficient than repeated take()s.
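
For reference, a drainTo-based consumer loop looks something like this sketch
(the element type and the write() placeholder are assumptions, not Dan's
actual send loop):

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.BlockingQueue;

    // Sketch of batching with drainTo: block for the first element,
    // then empty the rest of the backlog in one call.
    class Drainer {
        void consume(BlockingQueue<Object> queue) throws InterruptedException {
            List<Object> batch = new ArrayList<Object>();
            for (;;) {
                batch.add(queue.take());  // wait for at least one element
                queue.drainTo(batch);     // then grab whatever else is queued
                for (Object msg : batch) {
                    write(msg);
                }
                batch.clear();
            }
        }

        void write(Object msg) {
            // placeholder for the real socketChannel.write(...) path
        }
    }
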
>>
>> -Dan
>>
>> On Aug 9, 2008, at 12:45 AM, David Holmes wrote:
>>
>>> Dan,
>>>
>>> Have you actually just timed the queue operations? You indicated
>>> that if the consumer just did take() in a loop then there was a
>>> 6x speed-up in the producer. That would seem to indicate that the
>>> queue itself is not the problem. It could just be an unfortunate
>>> harmonic where the producer and consumer always collide at the
>>> queue, but even so the delay shouldn't result in a 6x slowdown.
>>> You might try tracking the queue size to see if it grows and
>>> shrinks as expected or whether it oscillates around being empty.
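
Tracking the size can be as simple as sampling queue.size() from a daemon
thread; a sketch (the 100 ms period is arbitrary, not from the thread):

    import java.util.concurrent.BlockingQueue;

    // Sketch of a queue-size sampler along the lines David suggests.
    class SizeSampler {
        static void start(final BlockingQueue<?> queue) {
            Thread monitor = new Thread(new Runnable() {
                public void run() {
                    try {
                        for (;;) {
                            System.out.println("queue size: " + queue.size());
                            Thread.sleep(100);
                        }
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                }
            });
            monitor.setDaemon(true);
            monitor.start();
        }
    }
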
>>>
>>> As for whether system calls are involved... probably, but I don't
>>> know how anything in JRockit is implemented, so I can't say for
>>> sure. I'm assuming it will use both of your processors.
>>>
>>> Can you download JDK 6 and try that for comparison? (You might
>>> also try an evaluation download of the Sun Java Real-Time System,
>>> if CentOS provides the real-time POSIX APIs - and presuming it's
>>> actually a real-time Linux. :) )
>>>
>>> Cheers,
>>> David Holmes
>
> _______________________________________________
> Concurrency-interest mailing list
> Concurrency-interest at cs.oswego.edu
> http://cs.oswego.edu/mailman/listinfo/concurrency-interest


