[concurrency-interest] A concurrent, linked HashMap impl

Ben Manes ben_manes at yahoo.com
Thu May 28 21:14:28 EDT 2009


I may be wrong in my reading of how CHM works, so it could be a misunderstanding on my part.  If you look at the implementation it does walk a link chain.

static final class Segment<K,V> extends ReentrantLock implements Serializable {
    // ...
    V get(Object key, int hash) {
        if (count != 0) { // read-volatile
            HashEntry<K,V> e = getFirst(hash);
            while (e != null) {
                if (e.hash == hash && key.equals(e.key)) {
                    V v = e.value;
                    if (v != null)
                        return v;
                    return readValueUnderLock(e); // recheck
                }
                e = e.next;
            }
        }
        return null;
    }

In regards to collisions and the number of segments, a segment is chosen by:

    final Segment<K,V> segmentFor(int hash) {
        return (Segment<K,V>) segments[(hash >>> segmentShift) & segmentMask];
    }

The number of segments to chose from, the likelihood of a hash collision, and the number of elements to store should all effect the number of entries in a segment.  In the 60M entries example this could result is ~900 comparisons.

Of course, the experts are right here and can correct me if I'm wrong. :-)

Cheers!
Ben





________________________________
From: Jim Andreou <jim.andreou at gmail.com>
To: Ben Manes <ben_manes at yahoo.com>
Cc: Doug Lea <dl at cs.oswego.edu>; Manik Surtani <manik at jboss.org>; concurrency-interest at cs.oswego.edu
Sent: Thursday, May 28, 2009 4:40:46 PM
Subject: Re: [concurrency-interest] A concurrent, linked HashMap impl

I'm not sure I'm reading this correctly, but...

2009/5/27 Ben Manes <ben_manes at yahoo.com>:
>
> Whether to use a hash table is, by itself, a trade-off in usage.  For
> example, on the Ehcache forum a user reported a performance problem with a
> cache of 60M entries.  With a 32-bit hashcode, there is a 99% probability of
> a collision at 200k entries.  With CHM's maximum of 2^16 segments, the
> average length of the link chain is 915 entries.

If you have 2^16 segments, then indeed each segment averages to 915
entries. But a segment itself is a hash table, not a single link
chain. (Unless all 915 entries would conflict! Kind of unlikely...).
In any case, I fail to see how the number of segments affect the
number of collisions.

Do I miss something here?

> The performance problem
> was most likely due to traversing this list on every operation.  While I
> would argue that 60M entries is an abusive use of the data structure, it
> shows that CHM is not always the appropriate data store.  If instead the
> data was stored in a ConcurrentSkipListMap then the height (# of
> comparisons) is a mere 26.  There are other ways to solve it, but it shows
> nicely how you must chose your data structures wisely.

Given the above, I think that a CSLM would actually be slower than
CHM, not faster.

Dimitris



      
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://cs.oswego.edu/pipermail/concurrency-interest/attachments/20090528/88e09cc5/attachment-0001.html>


More information about the Concurrency-interest mailing list