[concurrency-interest] HashSet performance

Mark Thornton mthornton at optrak.co.uk
Tue Jun 10 03:48:24 EDT 2008

Osvaldo Pinali Doederlein wrote:
> This remembers me from an old pet peeve, the fact that java.util.HashSet 
> is implemented as a wrapper over java.util.HashMap. This sucks 
> performance-wise, due to the additional indirections everywhere, and 
> even more important, because every entry single object is larger than 
> necessary - it contains a key and a value fields, but a single field 
> would be necessary for HashSet. If you have a Set with millions of 
> entries, the overhead of one useless pointer per entry is significant. 

When I have millions of entries I usually use an implementation with 
open addressing and double hashing. This eliminates all the Entry 
objects and thus saves a LOT more space. It is tempting to implement 
Map's in the same way, except for the need to present Map.Entry objects 
in the interface. I haven't compared performance for a while, but when I 
did there wasn't much difference in time (those indirections aren't 
particularly expensive). Space use is reduced and for large maps this 
has an indirect impact on speed.

Mark Thornton

More information about the Concurrency-interest mailing list