[concurrency-interest] A concurrent, linked HashMap impl
Ben Manes
ben_manes at yahoo.com
Thu May 28 21:14:28 EDT 2009
I may be wrong in my reading of how CHM works, so it could be a misunderstanding on my part. If you look at the implementation it does walk a link chain.
static final class Segment<K,V> extends ReentrantLock implements Serializable {
// ...
V get(Object key, int hash) {
if (count != 0) { // read-volatile
HashEntry<K,V> e = getFirst(hash);
while (e != null) {
if (e.hash == hash && key.equals(e.key)) {
V v = e.value;
if (v != null)
return v;
return readValueUnderLock(e); // recheck
}
e = e.next;
}
}
return null;
}
In regards to collisions and the number of segments, a segment is chosen by:
final Segment<K,V> segmentFor(int hash) {
return (Segment<K,V>) segments[(hash >>> segmentShift) & segmentMask];
}
The number of segments to chose from, the likelihood of a hash collision, and the number of elements to store should all effect the number of entries in a segment. In the 60M entries example this could result is ~900 comparisons.
Of course, the experts are right here and can correct me if I'm wrong. :-)
Cheers!
Ben
________________________________
From: Jim Andreou <jim.andreou at gmail.com>
To: Ben Manes <ben_manes at yahoo.com>
Cc: Doug Lea <dl at cs.oswego.edu>; Manik Surtani <manik at jboss.org>; concurrency-interest at cs.oswego.edu
Sent: Thursday, May 28, 2009 4:40:46 PM
Subject: Re: [concurrency-interest] A concurrent, linked HashMap impl
I'm not sure I'm reading this correctly, but...
2009/5/27 Ben Manes <ben_manes at yahoo.com>:
>
> Whether to use a hash table is, by itself, a trade-off in usage. For
> example, on the Ehcache forum a user reported a performance problem with a
> cache of 60M entries. With a 32-bit hashcode, there is a 99% probability of
> a collision at 200k entries. With CHM's maximum of 2^16 segments, the
> average length of the link chain is 915 entries.
If you have 2^16 segments, then indeed each segment averages to 915
entries. But a segment itself is a hash table, not a single link
chain. (Unless all 915 entries would conflict! Kind of unlikely...).
In any case, I fail to see how the number of segments affect the
number of collisions.
Do I miss something here?
> The performance problem
> was most likely due to traversing this list on every operation. While I
> would argue that 60M entries is an abusive use of the data structure, it
> shows that CHM is not always the appropriate data store. If instead the
> data was stored in a ConcurrentSkipListMap then the height (# of
> comparisons) is a mere 26. There are other ways to solve it, but it shows
> nicely how you must chose your data structures wisely.
Given the above, I think that a CSLM would actually be slower than
CHM, not faster.
Dimitris
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://cs.oswego.edu/pipermail/concurrency-interest/attachments/20090528/88e09cc5/attachment-0001.html>
More information about the Concurrency-interest
mailing list