[concurrency-interest] A concurrent, linked HashMap impl

Ben Manes ben_manes at yahoo.com
Wed May 27 12:15:18 EDT 2009


> I need to support efficient remove() as well.  And traversing the linked
> entry list to locate the node to be removed is O(n).  In my impl I locate
> the entry to be removed using the hash table first, and then unlink.

Since you have been aware of my CLHM for at least 3 months given your previous comments, I'd expect that.  Given that it's an LHM, I'd expect that as well (that's rather the point).  My point is that Doug has an excellent lock-free deque implementation that has undergone performance testing and was considered for JDK 6.  If the JDK were to have a CLHM, it seems only natural that it would be based on his CHM and CLD.

> Why?  Maintaining a bounded map is not uncommon, and deciding which
> entries to evict from such a bounded map usually requires some sort of
> order being maintained.

Yes, but an application having a critical need for a 2% in-VM performance gain is uncommon.



> That makes many application level assumptions.  Thread-local caching
> may not be possible or useful - threads may only need a small subset of
> data, but several threads may access this same data and
> fetching/generating repeatedly may be unnecessary/wasteful.  And a local
> miss is not necessarily cheap if it involves an expensive remote lookup
> or database hit.

The point is that you should use the right tool for the job.  The wrong thing is to assume that all data should be shared and that a faster algorithm solves fundamental design flaws.  If a local cache miss is not cheap, then having a faster map doesn't make it much better.  Using a memoizer, a la Ehcache's SelfPopulatingCache (lock striping) or JCiP's future-based approach, provides a per-entry lock and allows concurrent cache operations.  In real-world usage there is very little value in squeezing out the last 2% of in-VM performance when the bottleneck is making a remote call (100x+ slower).  This is why a synchronized LHM is good enough in most applications: the miss penalty far outweighs the speedup of faster access on a cache hit.  The few cases where I've seen this not hold have tended to be due to obvious design flaws that were fairly easy to fix.
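
For illustration, here is a minimal sketch of the future-based approach
(adapted from JCiP's Memoizer; the class and method names here are
illustrative):

    import java.util.concurrent.Callable;
    import java.util.concurrent.CancellationException;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentMap;
    import java.util.concurrent.ExecutionException;
    import java.util.concurrent.Future;
    import java.util.concurrent.FutureTask;

    // Each key is guarded by its own Future, so concurrent callers for
    // the same key block on a single computation while operations on
    // other keys proceed unhindered.
    final class Memoizer<K, V> {
        private final ConcurrentMap<K, Future<V>> cache =
            new ConcurrentHashMap<K, Future<V>>();

        public V compute(K key, Callable<V> loader)
                throws InterruptedException, ExecutionException {
            while (true) {
                Future<V> f = cache.get(key);
                if (f == null) {
                    FutureTask<V> ft = new FutureTask<V>(loader);
                    f = cache.putIfAbsent(key, ft);  // first caller wins
                    if (f == null) { f = ft; ft.run(); }
                }
                try {
                    return f.get();          // per-entry blocking only
                } catch (CancellationException e) {
                    cache.remove(key, f);    // clean up and retry
                }
            }
        }
    }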

> Essentially, anywhere a cache is needed, some sort of bounded map is
> useful.  (Perhaps you are arguing that caches aren't useful?  :)  And
> for a bounded map to make sense, some form of eviction ordering or
> preference of entries should be maintained.  As I mentioned in my
> response to Doug, this order doesn't need to be precise, but at least
> some form of favour of removing older entries over newer ones should be
> there.

Your comments give the (probably wrong) impression that you are starting to believe that an LHM-based cache is original rather than already the norm, that others haven't solved this problem prior to your attempt, and that a faster version solves a major bottleneck.  The argument has been that when you begin to see that a CLHM is required to solve a noticeable performance problem, you need to take a step back and understand that 99% of the time you're doing something wrong.



________________________________
From: Manik Surtani <manik at jboss.org>
To: Ben Manes <ben_manes at yahoo.com>
Cc: Doug Lea <dl at cs.oswego.edu>; concurrency-interest at cs.oswego.edu
Sent: Wednesday, May 27, 2009 3:59:23 AM
Subject: Re: [concurrency-interest] A concurrent, linked HashMap impl



On 27 May 2009, at 01:59, Ben Manes wrote:

Similarly, I have a CLHM implementation.  However, I didn't base it on an existing algorithm, as that would not have been very educational and my goal was purely for fun and to learn something.  Instead I've slowly been maturing it as I get a better grasp of how to write complex, concurrent algorithms.  I expect to make it lock-free in the next iteration, where I'll be trying to understand how to get concurrent operations to work together toward an end state rather than relying on mutual exclusion to block operations that could corrupt the shared state.  So my approach is much more oriented towards personal growth than pragmatically implementing an algorithm I don't quite grok.

> pure LRU because list-based approaches force sequential bottlenecks at the ends of the lists.

For this exact reason, I've adopted the Second Chance algorithm.  Its efficiency is similar to, though somewhat worse than, pure LRU, yet it provides the concurrency characteristics of a FIFO.  This allows reads to be as cheap as a CHM's, and reads are roughly 80% of a cache's usage.
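
In rough outline, the idea looks like this (a simplified sketch with
illustrative names, not my actual code):

    import java.util.Queue;
    import java.util.concurrent.ConcurrentLinkedQueue;
    import java.util.concurrent.atomic.AtomicBoolean;

    // A read only sets a per-entry bit -- no list reordering -- so hits
    // stay as cheap as a CHM get; the eviction scan gives recently read
    // entries one extra pass through the FIFO.
    final class SecondChanceQueue<K> {
        static final class Node<K> {
            final K key;
            final AtomicBoolean marked = new AtomicBoolean(false);
            Node(K key) { this.key = key; }
        }

        private final Queue<Node<K>> fifo =
            new ConcurrentLinkedQueue<Node<K>>();

        void onRead(Node<K> node) { node.marked.set(true); }
        void onAdd(Node<K> node)  { fifo.offer(node); }

        K evict() {
            Node<K> n;
            while ((n = fifo.poll()) != null) {
                if (n.marked.compareAndSet(true, false)) {
                    fifo.offer(n);   // second chance: recycle to the tail
                } else {
                    return n.key;    // not read since last pass: victim
                }
            }
            return null;             // queue empty
        }
    }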

> Additionally, if you trigger LRU-based removal when hitting a size threshold, then you end up calling size() a lot, which is also not cheap.

This is why I maintain the length of the list as a separate variable (an AtomicInteger).  The clients are made aware that the size of the cache may temporarily fluctuate above the "maximum capacity" and that the reported size is slightly more variable.  With this approach the length variable could, of course, temporarily be negative; in that case I simply return zero.  It's an acceptable trade-off for the use-case of a cache.
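
Concretely, the counter amounts to something like this (a sketch; names
are illustrative):

    import java.util.concurrent.atomic.AtomicInteger;

    // Writers adjust the counter outside any lock, so it can transiently
    // overshoot the maximum capacity or dip below zero; size() clamps.
    final class LengthCounter {
        private final AtomicInteger length = new AtomicInteger();

        void onPut()    { length.incrementAndGet(); }
        void onRemove() { length.decrementAndGet(); }

        int size() {
            int n = length.get();
            return (n < 0) ? 0 : n;  // hide transient negatives
        }
    }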

This is useful if there isn't a stringent constraint on the accuracy of size().  Something to consider, good idea.

> The diversity reflects the fact that there are a lot of tradeoffs in hash table design.

Whether to use a hash table at all is itself a trade-off.  For example, on the Ehcache forum a user reported a performance problem with a cache of 60M entries.  With a 32-bit hashcode, there is a 99% probability of a collision by 200k entries.  With CHM's maximum of 2^16 segments, the average length of the link chain is 915 entries.  The performance problem was most likely due to traversing this list on every operation.  While I would argue that 60M entries is an abusive use of the data structure, it shows that CHM is not always the appropriate data store.  If instead the data were stored in a ConcurrentSkipListMap then the height (# of comparisons) would be a mere 26.  There are other ways to solve it, but it shows nicely that you must choose your data structures wisely.
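
For reference, the arithmetic behind those figures:

    P(collision) ~= 1 - e^(-n^2 / (2 * 2^32)) ~= 0.99 for n = 200,000
    60,000,000 entries / 2^16 segments ~= 915 entries per segment
    skip list height ~= log2(60,000,000) ~= 26 comparisons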

On a side-note, I would argue two points of design.  The first is that a JDK version should be based on Doug's ConcurrentLinkedDeque.

I need to support efficient remove() as well.  And traversing the linked entry list to locate the node to be removed is O(n).  In my impl I locate the entry to be removed using the hash table first, and then unlink.
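
In rough form (a simplified sketch with illustrative names, not the
actual Infinispan code):

    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentMap;

    // The hash table gives O(1) access to the node, and the doubly
    // linked pointers make the unlink itself O(1) as well.
    final class LinkedEntries<K, V> {
        static final class Node<K, V> {
            final K key;
            final V value;
            Node<K, V> prev, next;   // guarded by the list lock
            Node(K key, V value) { this.key = key; this.value = value; }
        }

        private final ConcurrentMap<K, Node<K, V>> index =
            new ConcurrentHashMap<K, Node<K, V>>();

        V remove(K key) {
            Node<K, V> node = index.remove(key);  // no list traversal
            if (node == null) return null;
            synchronized (this) {    // head/tail bookkeeping elided
                if (node.prev != null) node.prev.next = node.next;
                if (node.next != null) node.next.prev = node.prev;
                node.prev = node.next = null;
            }
            return node.value;
        }
    }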

The second is that, from the perspective of an application's architecture, the need for a CLHM indicates a fundamental flaw in its design.  A JVM-level cache under such heavy load as to require such a map is a serious flaw in the usage pattern.
Why?  Maintaining a bounded map is not uncommon, and deciding which entries to evict from such a bounded map usually requires some sort of order being maintained. 


Instead the topology of caches and execution flows should be redone.  For instance, thread-local caching should be leveraged across an execution flow when applicable.  The cost of a local miss should be fairly cheap.

That makes many application level assumptions.  Thread-local caching may not be possible or useful - threads may only need a small subset of data, but several threads may access this same data and fetching/generating repeatedly may be unnecessary/wasteful.  And a local miss is not necessarily cheap if it involves an expensive remote lookup or database hit.

The requirement that the CLHM provide a noticeable performance gain should be treated as buying a buffer in which to solve the underlying architectural problems.  While it is always nice to get a performance gain, the change should be more theoretical than practical.  Hence, my major motivation for implementing a CLHM was for fun, as from a real-world perspective its need should be quite rare.

Essentially, anywhere a cache is needed, some sort of bounded map is useful.  (Perhaps you are arguing that caches aren't useful?  :)  And for a bounded map to make sense, some form of eviction ordering or preference of entries should be maintained.  As I mentioned in my response to Doug, this order doesn't need to be precise, but at least some form of favour of removing older entries over newer ones should be there.



Cheers,
Ben





________________________________
From: Doug Lea <dl at cs.oswego.edu>
To: Manik Surtani <manik at jboss.org>
Cc: concurrency-interest at cs.oswego.edu
Sent: Tuesday, May 26, 2009 4:45:35 PM
Subject: Re: [concurrency-interest] A concurrent, linked HashMap impl

Manik Surtani wrote:
> I've implemented a concurrent, linked map-like structure [1] [2] 

Sorry for the delay! (I've mostly been playing volunteer sysadmin
for a bunch of reconfigurations here.)

We've been considering things like this for the several-times-delayed
but still upcoming CustomConcurrentHashMap (CCHM) allowing weak/soft
entries, identity comparisons and (too :-) much more.

I gather that the main motivation for your approach is to support
LRU replacement. Considering that many concurrent maps are used
as caches, this seems to be a common desire. But we've been resisting
supporting pure LRU because list-based approaches
force sequential bottlenecks at the ends of the lists. So even
a lock-free approach to maintaining an access-ordered list may
encounter enough contention to negate the advantages of cheap
get()'s and usually-uncontended modifications. Additionally,
if you trigger LRU-based removal when hitting a size threshold,
then you end up calling size() a lot, which is also not cheap.
All in all, it is not clear when or by how much a concurrent hash
map approach will have better performance for strict LRU caching
over a simple synchronized LinkedHashMap.
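
For concreteness, that baseline is the classic bounded LinkedHashMap
recipe:

    import java.util.Collections;
    import java.util.LinkedHashMap;
    import java.util.Map;

    // An access-ordered LinkedHashMap with removeEldestEntry(),
    // wrapped in a synchronized view.
    final class LruCaches {
        static <K, V> Map<K, V> lruCache(final int maxEntries) {
            return Collections.synchronizedMap(
                new LinkedHashMap<K, V>(16, 0.75f, true) { // access order
                    @Override
                    protected boolean removeEldestEntry(
                            Map.Entry<K, V> eldest) {
                        return size() > maxEntries;
                    }
                });
        }
    }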

There are a few ways to get LRU-like replacement that
don't impose bottlenecks. Hardware L1 etc caches, facing
the same issues, tend to use various "pseudo-LRU" (google it)
techniques that don't have very good software analogs.
Among the leading candidates is a random sampling approach
(similar to one in the thesis by Konstantinos Psounis,
http://www-rcf.usc.edu/%7Ekpsounis/thesis.html) triggered
by local per-segment threshold overflows. Hopefully
I'll get something using it ready for preliminary release soon.
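
To give the flavor of the sampling approach (purely illustrative, not
the actual CCHM code): when a segment overflows, examine a few randomly
chosen entries and evict the one with the oldest access stamp -- no
global list, hence no sequential bottleneck, at the cost of only
approximate LRU.

    import java.util.List;
    import java.util.Random;

    final class SampledEviction<K> {
        static final class Entry<K> {
            final K key;
            volatile long lastAccess;   // stamped on each get()
            Entry(K key, long now) { this.key = key; this.lastAccess = now; }
        }

        private final Random random = new Random();

        // Choose a victim by sampling `samples` entries of the
        // overflowing segment.
        K pickVictim(List<Entry<K>> segmentEntries, int samples) {
            if (segmentEntries.isEmpty()) return null;
            Entry<K> victim = null;
            for (int i = 0; i < samples; i++) {
                Entry<K> e = segmentEntries.get(
                    random.nextInt(segmentEntries.size()));
                if (victim == null || e.lastAccess < victim.lastAccess) {
                    victim = e;
                }
            }
            return victim.key;
        }
    }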

While I'm at it... There are several alternative concurrent
hash maps in various stages of development, including...

* Moran Tzafrir's Hopscotch hashing:
http://groups.google.com/group/hopscotch-hashing

* Cliff Click's state-based table:
http://sourceforge.net/projects/high-scale-lib/

* Amino project cbbs dictionary based on Shalev & Shavit
http://amino-cbbs.wiki.sourceforge.net/

The diversity reflects the fact that there are a lot of
tradeoffs in hash table design. Some of these are probably
better than j.u.c.ConcurrentHashMap in some contexts, but
it is still not clear if any of them are or can become
better in enough contexts to replace the current algorithms.

-Doug

> 
> [1]http://fisheye.jboss.org/browse/Infinispan/trunk/core/src/main/java/org/infinispan/container/FIFODataContainer.java?r=236 
> [2]http://fisheye.jboss.org/browse/Infinispan/trunk/core/src/main/java/org/infinispan/container/LRUDataContainer.java?r=219 
> [3] http://www.md.chalmers.se/%7Etsigas/papers/Lock-Free-Deques-Doubly-Lists-JPDC.pdf 
_______________________________________________
Concurrency-interest mailing list
Concurrency-interest at cs.oswego.edu
http://cs.oswego.edu/mailman/listinfo/concurrency-interest




--
Manik Surtani
manik at jboss.org
Lead, Infinispan
Lead, JBoss Cache
http://www.infinispan.org
http://www.jbosscache.org


      