[concurrency-interest] Concurrent tries as a cache
forax at univ-mlv.fr
Mon Dec 12 05:43:11 EST 2011
On 12/11/2011 07:41 PM, Rémi Forax wrote:
> On 12/11/2011 04:20 PM, Michael Barker wrote:
>> Hi Remi,
>> It would be worthwhile benchmarking your implementation against
>> ConcurrentHashMap. I did something similar a while back, creating a
>> ConcurrentMap using a Bagwell-style tree. The quantity of my
>> implementation notwithstanding, I was unable to bring the performance
>> up above ConcurrentHashMap. Profiling was inconclusive, but suggested
>> that a combination of memory throughput demands and garbage generated
>> by the array creation and copying when nodes were added was the main
>> performance cost. It outweighed the costs of the volatile/CAS
>> Details are here:
>> Michael Barker.,
> Hi Michael,
> I don't think my implementation is better/worst than ConcurrentHashMap
> because it's not a replacement of ConcurrentHashMap :)
> I need a data-structure that acts as a cache with no eviction (never !)
> and where calls to lookup/get outnumber by a wide margin calls to
> add/put (let say 10000/1).
> The way I test my implementations is to replace all method calls of
> the DaCapo
> benchmark by an invokedynamic that does the multi-dispatch
> (in fact some aren't replaced because the algorithm isn't finished).
> So I know if a data structure is better than the other
> for my use case, not in general.
> Now, reading your blog confirms my intuition,
> if there is a lot of read and few write, a concurrent tries is perhaps
> Also, about the possible virtual calls because of the different kinds
> of Node,
> there is no ListNode in my implementation, so there is only two
> of Node that fits in the bimorphic inlining caches used by the VM
> so the code is inlined :)
I've taken a look to the code written by Aleksander Prokopec (thanks
in Scala and he uses TCO (tail call optimization) done by the scala compiler
to avoid virtual calls (in fact there is no call anymore).
I've done exactly the same thing in Java  (so it's more readable :)
and it's clearly faster. Perhaps you should try the same optimization
on your tries.
I have also tried to merge the entry node and the array node in one
but perf was not great, I think it's because the memory consumption was
too big to fit in on level of the hardware cache (or cache line).
More information about the Concurrency-interest