[concurrency-interest] A concurrent, linked HashMap impl

Jim Andreou jim.andreou at gmail.com
Thu May 28 19:40:46 EDT 2009


I'm not sure I'm reading this correctly, but...

2009/5/27 Ben Manes <ben_manes at yahoo.com>:
>
> Whether to use a hash table is, by itself, a trade-off in usage.  For
> example, on the Ehcache forum a user reported a performance problem with a
> cache of 60M entries.  With a 32-bit hashcode, there is a 99% probability of
> a collision at 200k entries.  With CHM's maximum of 2^16 segments, the
> average length of the link chain is 915 entries.

If you have 2^16 segments, then indeed each segment averages to 915
entries. But a segment itself is a hash table, not a single link
chain. (Unless all 915 entries would conflict! Kind of unlikely...).
In any case, I fail to see how the number of segments affect the
number of collisions.

Do I miss something here?

> The performance problem
> was most likely due to traversing this list on every operation.  While I
> would argue that 60M entries is an abusive use of the data structure, it
> shows that CHM is not always the appropriate data store.  If instead the
> data was stored in a ConcurrentSkipListMap then the height (# of
> comparisons) is a mere 26.  There are other ways to solve it, but it shows
> nicely how you must chose your data structures wisely.

Given the above, I think that a CSLM would actually be slower than
CHM, not faster.

Dimitris



More information about the Concurrency-interest mailing list