[concurrency-interest] A concurrent, linked HashMap impl

Jim Andreou jim.andreou at gmail.com
Fri May 29 05:19:16 EDT 2009


Yes, it's a link chain of the elements hashed _to the same bucket_.

The default load factor is 0.75. That's the ratio of entries to buckets at
which the table resizes, so it's also (roughly) the average number of
elements that each bucket contains.

It would take a really, really bad hashCode() impl to get 915 elements
in a single bucket chain.
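To illustrate (a quick sketch of my own, not code from the JDK; BadKey and
BadHashDemo are made-up names): a hypothetical key class whose hashCode()
returns the same value for every instance collapses the whole map into one
bucket chain, so every lookup degenerates into a linear scan of that chain
(in the pre-tree-bin HashMap this thread is about):

    import java.util.HashMap;
    import java.util.Map;

    // Hypothetical key type: every instance hashes to the same bucket.
    final class BadKey {
        private final int id;
        BadKey(int id) { this.id = id; }
        public int hashCode() { return 42; }            // all keys collide
        public boolean equals(Object o) {
            return o instanceof BadKey && ((BadKey) o).id == id;
        }
    }

    public class BadHashDemo {
        public static void main(String[] args) {
            Map<BadKey, Integer> map = new HashMap<BadKey, Integer>();
            for (int i = 0; i < 915; i++)
                map.put(new BadKey(i), i);              // 915 entries, one chain
            // Every lookup now walks the single chain, calling equals()
            // until it finds a match (or falls off the end).
            System.out.println(map.get(new BadKey(0)));
        }
    }

With any sane hashCode() the entries spread across the buckets and the chains
stay short.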

Btw, I see this comment in the implementation:
>(The average length [of the bucket chain] is less than two for the default load factor threshold.)

I'm not sure why it says "less than two" and not "less than one".
While the former is certainly true, it isn't the tightest bound possible,
unless what I said above is inaccurate (that the load factor equals the
average bucket chain length), at least for this hash table implementation.
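As a back-of-the-envelope check (my own sketch, not CHM code; it just assumes
the table doubles whenever its size would exceed capacity * 0.75, the way
java.util.HashMap resizes), the ratio of entries to buckets never exceeds the
load factor:

    public class LoadFactorSketch {
        public static void main(String[] args) {
            int capacity = 16;
            double maxRatio = 0.0;
            for (int size = 1; size <= 60000000; size++) {
                if (size > capacity * 0.75)
                    capacity <<= 1;                     // double, as HashMap does
                // size / capacity is the average chain length, counting
                // empty buckets as chains of length zero.
                maxRatio = Math.max(maxRatio, size / (double) capacity);
            }
            System.out.println("max size/capacity = " + maxRatio);   // 0.75
        }
    }

So under those assumptions the average chain length stays at or below 0.75,
which is why "less than one" looks like the tighter bound to me.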

Dimitris

2009/5/29 Ben Manes <ben_manes at yahoo.com>:
> I may be wrong in my reading of how CHM works, so it could be a
> misunderstanding on my part.  If you look at the implementation it does walk
> a link chain.
>
> static final class Segment<K,V> extends ReentrantLock
>         implements Serializable {
>     // ...
>     V get(Object key, int hash) {
>         if (count != 0) { // read-volatile
>             HashEntry<K,V> e = getFirst(hash);
>             while (e != null) {
>                 if (e.hash == hash && key.equals(e.key)) {
>                     V v = e.value;
>                     if (v != null)
>                         return v;
>                     return readValueUnderLock(e); // recheck
>                 }
>                 e = e.next;
>             }
>         }
>         return null;
>     }
>     // ...
> }
>
> In regards to collisions and the number of segments, a segment is chosen by:
>
>     final Segment<K,V> segmentFor(int hash) {
>         return (Segment<K,V>) segments[(hash >>> segmentShift) & segmentMask];
>     }
>
> The number of segments to choose from, the likelihood of a hash collision,
> and the number of elements to store should all affect the number of entries
> in a segment.  In the 60M entries example this could result in ~900
> comparisons.
>
> Of course, the experts are right here and can correct me if I'm wrong. :-)
>
> Cheers!
> Ben
>
>
> ________________________________
> From: Jim Andreou <jim.andreou at gmail.com>
> To: Ben Manes <ben_manes at yahoo.com>
> Cc: Doug Lea <dl at cs.oswego.edu>; Manik Surtani <manik at jboss.org>;
> concurrency-interest at cs.oswego.edu
> Sent: Thursday, May 28, 2009 4:40:46 PM
> Subject: Re: [concurrency-interest] A concurrent, linked HashMap impl
>
> I'm not sure I'm reading this correctly, but...
>
> 2009/5/27 Ben Manes <ben_manes at yahoo.com>:
>>
>> Whether to use a hash table is, by itself, a trade-off in usage.  For
>> example, on the Ehcache forum a user reported a performance problem with a
>> cache of 60M entries.  With a 32-bit hashcode, there is a 99% probability
>> of
>> a collision at 200k entries.  With CHM's maximum of 2^16 segments, the
>> average length of the link chain is 915 entries.
>
> If you have 2^16 segments, then indeed each segment averages ~915
> entries. But a segment itself is a hash table, not a single link
> chain. (Unless all 915 entries happened to collide, which is kind of
> unlikely...) In any case, I fail to see how the number of segments
> affects the number of collisions.
>
> Do I miss something here?
>
>> The performance problem
>> was most likely due to traversing this list on every operation.  While I
>> would argue that 60M entries is an abusive use of the data structure, it
>> shows that CHM is not always the appropriate data store.  If instead the
>> data were stored in a ConcurrentSkipListMap then the height (# of
>> comparisons) would be a mere 26.  There are other ways to solve it, but it
>> shows nicely how you must choose your data structures wisely.
>
> Given the above, I think that a CSLM would actually be slower than
> CHM, not faster.
>
> Dimitris
>
>
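(Putting rough numbers on the comparison above, in a sketch of my own: a skip
list lookup costs about log2(n) key comparisons, while a CHM lookup is one
hash plus a walk of a short chain inside the chosen segment, since each
segment is itself a resizing hash table.)

    public class LookupCostSketch {
        public static void main(String[] args) {
            long n = 60000000L;
            // Skip list / balanced tree: ~log2(n) key comparisons per get().
            double treeComparisons = Math.log(n) / Math.log(2);
            // Hash table held at load factor 0.75: the average bucket chain
            // is shorter than one entry, so a get() costs one hash plus a
            // very short walk, independent of n.
            System.out.println("log2(60M) = " + Math.round(treeComparisons)
                    + " comparisons vs. a chain of < 1 entry on average");
        }
    }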


