[concurrency-interest] A concurrent, linked HashMap impl

Manik Surtani manik at jboss.org
Wed May 27 06:59:23 EDT 2009

On 27 May 2009, at 01:59, Ben Manes wrote:

> Similarly, I have a CLHM implementation.  However I didn't base it  
> off of an existing algorithm as this would not be very educational  
> and my goal was purely for fun and to learn something.  Instead I've  
> slowly been maturing it as I get a better grasp of how to write  
> complex, concurrent algorithms.  I expect to make it lock-free in  
> the next iteration where I'll be trying to understand how to get  
> concurrent operations to work together to an end state rather than  
> relying on mutual exclusion to block operations that could corrupt  
> the shared state.  So my approach is much more oriented towards  
> personal growth than pragmatically implementing an algorithm I don't  
> quite grok.
> > pure LRU because list-based approaches force sequential
> bottlenecks at the ends of the lists.
> For this exact reason I've adopted the Second Chance algorithm.
> Its efficiency is similar to, but worse than, pure LRU, yet it
> provides the concurrency characteristics of a FIFO.  This allows
> reads to be as cheap as a CHM's, and reads are roughly 80% of cache
> usage.
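For concreteness, a Second Chance eviction pass might be sketched like this (single-threaded, with illustrative names; Ben's actual implementation may well differ):

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.atomic.AtomicBoolean;

// Sketch of Second Chance (CLOCK) eviction over a FIFO of entries.
class SecondChance<K> {
    static final class Node<K> {
        final K key;
        final AtomicBoolean referenced = new AtomicBoolean(false);
        Node(K key) { this.key = key; }
    }

    private final Queue<Node<K>> fifo = new ArrayDeque<>();

    void recordInsert(Node<K> node) { fifo.add(node); }

    // On a read, just set the reference bit -- no list reordering,
    // which is why reads stay as cheap as a plain map lookup.
    void recordAccess(Node<K> node) { node.referenced.set(true); }

    // Evict the first node whose reference bit is clear; recently
    // read nodes get a "second chance" at the back of the FIFO.
    K evict() {
        Node<K> node;
        while ((node = fifo.poll()) != null) {
            if (node.referenced.compareAndSet(true, false)) {
                fifo.add(node);      // second chance
            } else {
                return node.key;     // victim
            }
        }
        return null;                 // queue empty
    }
}
```

The reference bit approximates recency without touching the list on reads, which is the FIFO-like concurrency property described above.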
> > Additionally, if you trigger LRU-based removal when hitting a size  
> threshold, then you end up calling size() a lot, which is also not  
> cheap.
> This is why I maintain the length of the list as a separate
> variable (an AtomicInteger).  Clients are made aware that the size
> of the cache may temporarily fluctuate above the "maximum capacity"
> and that size() is only approximate.  Due to this approach the
> length variable could, of course, temporarily be negative.  In that
> case I simply return zero.  It's an acceptable trade-off for the
> use-case of a cache.

This is useful if there isn't a stringent constraint on the accuracy  
of size().  Something to consider, good idea.
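A minimal sketch of the clamped counter described above (class and method names are illustrative):

```java
import java.util.concurrent.atomic.AtomicInteger;

// Weakly consistent size counter for a concurrent cache.  Racing
// removes can briefly drive the raw counter below zero, so size()
// clamps rather than report a negative size.
class WeakSize {
    private final AtomicInteger length = new AtomicInteger();

    void entryAdded()   { length.incrementAndGet(); }
    void entryRemoved() { length.decrementAndGet(); }

    // O(1) and lock-free, at the cost of being only approximate.
    int size() {
        return Math.max(length.get(), 0);  // clamp transient negatives
    }
}
```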

> > The diversity reflects the fact that there are a lot of tradeoffs  
> in hash table design.
> Whether to use a hash table is, by itself, a trade-off in usage.   
> For example, on the Ehcache forum a user reported a performance  
> problem with a cache of 60M entries.  With a 32-bit hashcode, there  
> is a 99% probability of a collision at 200k entries.  With CHM's  
> maximum of 2^16 segments, the average length of the link chain is  
> 915 entries.  The performance problem was most likely due to  
> traversing this list on every operation.  While I would argue that  
> 60M entries is an abusive use of the data structure, it shows that  
> CHM is not always the appropriate data store.  If instead the data  
> was stored in a ConcurrentSkipListMap then the height (# of  
> comparisons) is a mere 26.  There are other ways to solve it, but it  
> shows nicely how you must choose your data structures wisely.
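The arithmetic can be checked quickly (CacheMath is a made-up helper; "chain length" here is really average entries per segment, and skip-list comparisons are approximately log2 n):

```java
// Back-of-envelope numbers from the discussion above.
final class CacheMath {
    // Average entries per segment when keys spread evenly.
    static long avgChainLength(long entries, int segments) {
        return entries / segments;
    }

    // Expected comparisons in a skip list ~ floor(log2 n) + 1.
    static int skipListComparisons(long entries) {
        return 64 - Long.numberOfLeadingZeros(entries);
    }
}
```

With 60M entries over CHM's maximum of 2^16 segments this gives roughly 915 entries per segment, versus about 26 comparisons in a skip list.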
> On a side-note, I would argue two points in design.  The first is
> that for a JDK version this should be based on Doug's
> ConcurrentLinkedDeque.

I need to support efficient remove() as well, and traversing the
linked entry list to locate the node to be removed is O(n).  In my
impl I locate the entry to be removed via the hash table first, and
then unlink it.
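A single-threaded sketch of that remove() strategy (names are illustrative; the real impl of course uses concurrent structures):

```java
import java.util.HashMap;
import java.util.Map;

// Hash table pointing at doubly-linked nodes, so remove() is an
// O(1) lookup plus an unlink rather than an O(n) list scan.
class LinkedTable<K, V> {
    static final class Node<K, V> {
        final K key;
        final V value;
        Node<K, V> prev, next;
        Node(K key, V value) { this.key = key; this.value = value; }
    }

    private final Map<K, Node<K, V>> table = new HashMap<>();
    private Node<K, V> head, tail;

    void put(K key, V value) {
        Node<K, V> node = new Node<>(key, value);
        table.put(key, node);
        if (tail == null) { head = tail = node; }
        else { tail.next = node; node.prev = tail; tail = node; }
    }

    V remove(K key) {
        Node<K, V> node = table.remove(key);  // O(1), no traversal
        if (node == null) return null;
        if (node.prev != null) node.prev.next = node.next; else head = node.next;
        if (node.next != null) node.next.prev = node.prev; else tail = node.prev;
        return node.value;
    }

    K firstKey() { return head == null ? null : head.key; }
}
```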

>   The second is that from an application's architecture the need for  
> a CLHM indicates a fundamental flaw in its design.  A JVM-level  
> cache under heavy enough load to require such a map points to a  
> serious flaw in the usage pattern.

Why?  Maintaining a bounded map is not uncommon, and deciding which  
entries to evict from such a bounded map usually requires some sort of  
order being maintained.

> Instead the topology of caches and execution flows should be  
> redone.  For instance, thread-local caching should be leveraged  
> across an execution flow when applicable.  The cost of a local miss  
> should be fairly cheap.
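A rough sketch of the thread-local caching idea (FlowLocalCache and its loader parameter are hypothetical; real code must also bound and clear the per-thread maps):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Each thread keeps its own unsynchronized map for the duration of
// an execution flow, so lookups never contend across threads.
final class FlowLocalCache<K, V> {
    private final ThreadLocal<Map<K, V>> local =
            ThreadLocal.withInitial(HashMap::new);

    // A miss only costs one load plus a put into this thread's map.
    V get(K key, Function<K, V> loader) {
        return local.get().computeIfAbsent(key, loader);
    }

    void clear() { local.remove(); }  // call at end of the flow
}
```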

That makes many application level assumptions.  Thread-local caching  
may not be possible or useful - threads may only need a small subset  
of data, but several threads may access this same data and fetching/ 
generating repeatedly may be unnecessary/wasteful.  And a local miss  
is not necessarily cheap if it involves an expensive remote lookup or  
database hit.

> The requirement that the CLHM provide a noticeable performance gain  
> should be treated as buying time to solve the underlying  
> architectural problems.  While it is always nice to get a  
> performance gain, the change should be more theoretical than  
> practical.  Hence my major motivation for implementing a CLHM was  
> fun; for real-world purposes its need should be quite rare.

Essentially, anywhere a cache is needed, some sort of bounded map is  
useful.  (Perhaps you are arguing that caches aren't useful?  :)  And  
for a bounded map to make sense, some form of eviction ordering or  
preference among entries should be maintained.  As I mentioned in my  
response to Doug, this order doesn't need to be precise, but there  
should at least be some bias towards removing older entries over  
newer ones.
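The JDK already offers the simplest form of this via LinkedHashMap's access order and its removeEldestEntry hook; a minimal (unsynchronized) bounded map might look like:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Bounded LRU-ish map: access-ordered LinkedHashMap that drops its
// eldest entry once the cap is exceeded.  Not thread-safe; wrap with
// Collections.synchronizedMap or external locking for concurrent use.
class BoundedLruMap<K, V> extends LinkedHashMap<K, V> {
    private final int maxCapacity;

    BoundedLruMap(int maxCapacity) {
        super(16, 0.75f, true);      // true = access order
        this.maxCapacity = maxCapacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxCapacity; // evict least-recently-used
    }
}
```

This is exactly the synchronized-LinkedHashMap baseline Doug mentions below: simple and correct, but every access mutates the list, so it serializes under contention.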

> Cheers,
> Ben
> From: Doug Lea <dl at cs.oswego.edu>
> To: Manik Surtani <manik at jboss.org>
> Cc: concurrency-interest at cs.oswego.edu
> Sent: Tuesday, May 26, 2009 4:45:35 PM
> Subject: Re: [concurrency-interest] A concurrent, linked HashMap impl
> Manik Surtani wrote:
> > I've implemented a concurrent, linked map-like structure [1] [2]
> Sorry for the delay! (I've mostly been playing volunteer sysadmin
> for a bunch of reconfigurations here.)
> We've been considering things like this for the several-times-delayed
> but still upcoming CustomConcurrentHashMap (CCHM) allowing weak/soft
> entries, identity comparisons and (too :-) much more.
> I gather that the main motivation for your approach is to support
> LRU replacement. Considering that many concurrent maps are used
> as caches, this seems to be a common desire. But we've been resisting
> supporting pure LRU because list-based approaches
> force sequential bottlenecks at the ends of the lists. So even
> a lock-free approach to maintaining an access-ordered list may
> encounter enough contention to negate the advantages of cheap
> get()'s and usually-uncontended modifications. Additionally,
> if you trigger LRU-based removal when hitting a size threshold,
> then you end up calling size() a lot, which is also not cheap.
> All in all, it is not clear when or by how much a concurrent hash
> map approach will have better performance for strict LRU caching
> over a simple synchronized LinkedHashMap.
> There are a few ways to get LRU-like replacement that
> don't impose bottlenecks. Hardware L1 etc caches, facing
> the same issues, tend to use various "pseudo-LRU" (google it)
> techniques that don't have very good software analogs.
> Among the leading candidates is a random sampling approach
> (similar to one in the thesis by Konstantinos Psounis,
> http://www-rcf.usc.edu/~kpsounis/thesis.html) triggered
> by local per-segment threshold overflows. Hopefully
> I'll get something using it ready for preliminary release soon.
> While I'm at it... There are several alternative concurrent
> hash maps in various stages of development, including...
> * Moran Tzafrir's Hopscotch hashing:
> http://groups.google.com/group/hopscotch-hashing
> * Cliff Click's state-based table:
> http://sourceforge.net/projects/high-scale-lib/
> * Amino project cbbs dictionary based on Shalev & Shavit
> http://amino-cbbs.wiki.sourceforge.net/
> The diversity reflects the fact that there are a lot of
> tradeoffs in hash table design. Some of these are probably
> better than j.u.c.ConcurrentHashMap in some contexts, but
> it is still not clear if any of them are or can become
> better in enough contexts to replace the current algorithms.
> -Doug
> >
> > [1]http://fisheye.jboss.org/browse/Infinispan/trunk/core/src/main/java/org/infinispan/container/FIFODataContainer.java?r=236
> > [2]http://fisheye.jboss.org/browse/Infinispan/trunk/core/src/main/java/org/infinispan/container/LRUDataContainer.java?r=219
> > [3] http://www.md.chalmers.se/~tsigas/papers/Lock-Free-Deques-Doubly-Lists-JPDC.pdf
> _______________________________________________
> Concurrency-interest mailing list
> Concurrency-interest at cs.oswego.edu
> http://cs.oswego.edu/mailman/listinfo/concurrency-interest

Manik Surtani
manik at jboss.org
Lead, Infinispan
Lead, JBoss Cache
